Author: Betsy Ladyzhets

  • Teachers can get vaccinated in every state, but we don’t know how many are

    As of this past Monday, K-12 teachers in every state are now eligible for vaccination. Teachers were already prioritized in most of the country, but Biden directed the remaining states to adjust their priority lists last week. The federal government also pulled teachers into the federal pharmacy program, previously used for long-term care facilities.

    This is great news, of course—teachers should get vaccinated ASAP so that they can safely return to their classrooms, allowing schools to reopen in person with much lower risk. Vaccinations have become a stipulation for reopening, in fact, in some states like Oregon, even though the CDC has said this should not be a requirement.

    But there’s one big problem: we have no idea how many teachers have actually been inoculated. Sarah wrote about why we need occupational data on vaccinations a few weeks ago:

    For example, NYC has included “in-person college instructors” in eligibility for the vaccine since January 11. Wouldn’t it be nice to know just how many in-person professors have gotten vaccinated? It’d sure be helpful if Barnard ever decides to do in-person classes again. Or what about taxi drivers? Again in NYC, because that’s where I live, they became eligible for vaccination on February 2. From a personal standpoint, I’d like to know if I could send my taxi driver to the hospital if my mask slips.

    The data situation hasn’t improved since February. New York’s report of vaccine coverage among state hospital workers is still the closest thing we have to occupation reporting. A recent article from EdWeek sheds some light on the issue, citing privacy concerns and a lack of data from vaccine administration sites themselves:

    Some state agencies and districts have said privacy concerns prevent them from tracking or publishing teacher vaccination data. Others say vaccine administration sites are not tracking recipients’ occupations and they are not in position to survey employees themselves.

    It appears that state and local public health departments were even less prepared to track occupations of vaccine patients than they were to track those patients’ race and ethnicity. But without these numbers, it may take even longer for students to return to classrooms, as evidenced by this quote from Megan Collins, co-director of the Johns Hopkins Consortium for School-Based Health Solutions:

    “We’re seeing a substantial disconnect. There are states not prioritizing teachers for vaccine that are fully open for in-person instruction, and others that are prioritizing teachers for vaccines, but aren’t open at all,” Collins said. “If states are going to use teacher vaccinations as a part of the process for safely returning to classrooms, it’s very important then to be able to communicate that information so people know that teachers are actually getting vaccines.”

    Biden’s administration has also given schools more money for testing, allocating $650 million in grants to help public schools get access to tests, testing supplies, and logistical assistance. But of course, school testing isn’t being tracked either. New York continues to be the only state that reports detailed data in this area; see our K-12 school data annotations for more info.

  • Where are we most likely to catch COVID-19?

    This week, I wrote a story for Popular Science that goes over what we know (and don’t know) about the most common settings for COVID-19 infection.

    Most of the main points will probably be familiar to CDD readers, but it’s still useful to compile this info in one concise article. In brief: Outdoor events are always safer. Surfaces are not a common transmission source. Communal living facilities and factories tend to be hotspots. Indoor dining and similar settings carry a lot of risk. Essential workers are called essential for a reason. And don’t rule out small gatherings, even though such events are safer for those of us who’ve been vaccinated.

    This story gave me an excuse to revisit one of my favorite COVID-19 datasets: the Superspreading Events Database, a project that compiles superspreading events from media reports, scientific papers, and public health dashboards. I interviewed Koen Swinkels, the project’s lead, for the CDD back in November.

    At that time, the database had about 1,600 events; now, it includes over 2,000. All of the patterns I wrote about in November still hold true now, though. Notably, no event in the database took place solely outside (though Swinkels told me he’s seen some events with both an indoor and outdoor component). And the vast majority of events in the database took place in the U.S.

    For those U.S. events, the most common superspreading settings are prisons (166,000 cases), nursing homes (30,000 cases), rehabilitation/medical centers (24,000 cases), and meat processing plants (13,000 cases). By this database’s definition, a superspreading event may comprise a sustained outbreak at one location over a long period of time, and prisons have been continuous hotspots since last spring.

    You can check out the U.S. superspreading events in the database below. I made this visualization in November and updated it this past week.

    One of the reasons why I like the Superspreading Events Database is that Swinkels and his collaborators are extremely clear on the project’s limitations. If you load the database’s public Google sheet, you’ll see a prominent note at the top reading, “Note that the database is NOT a representative sample of superspreading events. Please read this article for more information about the limitations of the database.” The article, a post on Swinkels’ Medium blog, goes in-depth on the biases associated with the database. It’s easier to identify superspreading events in institutional settings, for example, since many of them employ frequent testing. Still, I think that—when carefully caveated—this database is an incredibly useful resource for identifying patterns in COVID-19 spread.

    Swinkels additionally pointed me to another great source for exposure data: the state of Colorado publishes outbreak data in weekly reports. A few other states publish similar info, but Colorado’s data are highly detailed and complete. In this past week’s report, released on March 10, the state says that 6,900 out of a total 28,000 cases in active outbreaks are linked to state prisons. 3,900 more cases are linked to jails.

    I’ve visualized the March 10 Colorado outbreak data below. As you may notice, the next-biggest outbreak setting after prisons and jails is higher education—colleges and universities represent 6,700 active outbreak cases. Colorado’s dataset does not specify how many of those cases are linked to the mask-less University of Colorado party that drew wide criticism last weekend… but we can assume that party was no small player.
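    To put the Colorado figures on a common scale, here is a minimal sketch that converts the case counts quoted above into shares of all active outbreak cases. The counts are the rounded numbers cited in this post, so the resulting percentages are approximate; the variable and function names are illustrative only.

```python
# Rounded figures quoted above from Colorado's March 10 outbreak report.
TOTAL_ACTIVE_OUTBREAK_CASES = 28_000
CASES_BY_SETTING = {
    "state prisons": 6_900,
    "jails": 3_900,
    "colleges and universities": 6_700,
}


def share_of_total(cases: int, total: int = TOTAL_ACTIVE_OUTBREAK_CASES) -> float:
    """Return a setting's share of all active outbreak cases, in percent."""
    return cases / total * 100


shares = {setting: share_of_total(n) for setting, n in CASES_BY_SETTING.items()}
# Prisons and jails together, the grouping used in the paragraph above.
corrections_share = shares["state prisons"] + shares["jails"]

for setting, pct in shares.items():
    print(f"{setting}: {pct:.1f}%")
print(f"prisons and jails combined: {corrections_share:.1f}%")
```

    By this rough math, prisons and jails together account for well over a third of active outbreak cases, with higher education close behind state prisons.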

    Finally, this PopSci story also gave me an excuse to revisit one of my favorite COVID-19 data gripes: the lack of contact tracing info we have in the U.S. I’ve written about this issue in the CDD before; I surveyed state dashboards in October, and drew connections from the Capitol invasion in January. But it was still disheartening to find that now, in March, we continue to be largely in the dark about how many contact tracers are actively employed in most states and how many people they’re reaching.

    Here’s a clip from the story:

    In the US, though, the practice is done unevenly, if at all. Most states and local jurisdictions, struggling from years of underfunded public health departments leading up to the pandemic, have not been able to hire and train the contact tracers needed to keep tabs on every case.

    Many states have attempted to supplement their limited contact tracing workforces with exposure notification apps, which are theoretically able to notify users when they’ve come into contact with someone who tested positive. Though these apps became more widespread in the US this past winter, they’re still not used widely enough to provide useful information. New Jersey, one state that provides data on its app use, reports that about 574,000 state residents have downloaded the app as of March 6—out of a population of 8.9 million.

    This situation is not likely to improve much in the coming months as Americans aren’t about to change their perspectives on privacy any time soon. But if you have the opportunity to download an exposure notification app for your state, do it! The more data we have on where people are getting exposed to COVID-19, the better we can understand this virus.

  • National Numbers, March 14

    In the past week (March 6 through 12), the U.S. reported about 382,000 new cases, according to the CDC. This amounts to:

    • An average of 55,000 new cases each day
    • 117 total new cases for every 100,000 Americans
    • 1 in 858 Americans getting diagnosed with COVID-19 in the past week
    Nationwide COVID-19 metrics as of March 12, sourcing data from the CDC and HHS. Posted on Twitter by Conor Kelly.
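    The three figures in the list above all come from the same weekly case count. A minimal sketch of the arithmetic, assuming a U.S. population of roughly 328 million (a number not stated in this post, chosen because it is consistent with the ratios quoted); small differences from the published figures presumably reflect rounding in the inputs:

```python
# Assumption: U.S. population of roughly 328 million (not given in the post).
US_POPULATION = 328_000_000


def weekly_summary(new_cases: int, population: int = US_POPULATION) -> dict:
    """Convert a weekly case count into the three figures listed above."""
    return {
        "daily_average": new_cases / 7,                # average new cases per day
        "per_100k": new_cases / population * 100_000,  # weekly cases per 100,000 people
        "one_in": population / new_cases,              # "1 in N" people diagnosed
    }


summary = weekly_summary(382_000)
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

    The same function works for the hospitalization and death counts; only the input number changes.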

    Last week, America also saw:

    • 34,200 new COVID-19 patients admitted to hospitals (10 for every 100,000 people)
    • 10,300 new COVID-19 deaths (3.1 for every 100,000 people)
    • An average of 2.5 million vaccinations per day (per Bloomberg)

    Note: Since the COVID Tracking Project ceased data collection last Sunday, March 7, I am now using CDC data for these updates. I’m primarily relying on the agency’s COVID Data Tracker Weekly Review reports.

    The Johnson & Johnson vaccine has fully dropped—not just in the arms of millions of Americans, but also on state vaccine dashboards. When I updated the CDD’s vaccination data annotations yesterday, I noticed that several states had switched from labeling their shots as “first dose” and “second dose” to labeling them as “first dose” and “completed series,” or something similar. Since the J&J vaccine is only one dose, a single shot from this manufacturer could launch you right into that “completed series” category.

    However you label them, the U.S. is now vaccinating about 2.5 million people per day. One in four adults has received at least their first shot. And we crossed the 100-million dose mark on Friday, far earlier than President Biden’s 100-day goal. Meanwhile, cases, deaths, and hospitalizations continue to decline.

    These numbers have inspired some guarded optimism, at least on the part of the federal government. On Thursday, Biden announced that he’s directing all states, tribes, and territories to make all adults eligible for vaccination by May 1. Alaska became the first state to reach that milestone this week.

    Of course, there’s a big difference between making people eligible and actually getting shots in arms. But vaccine hesitancy is reportedly dropping, as Americans see their family members and friends safely get inoculated. One new poll from NPR/PBS NewsHour/Marist shows that 73% of Black people and 70% of white people said they’re either planning to get vaccinated or have received a shot already.

    Variants also continue to be a concern (see Sarah’s section later in this issue). But it’s hard to argue with the fact that millions of our family members, friends, and neighbors are now protected from COVID-19, with more people getting vaccinated every day.

  • COVID source callout: Age brackets

    As of yesterday, 45 states and D.C. are reporting vaccinations by race and ethnicity. (See the CDD’s full annotations here.) This is great—with five more states, we’ll have national coverage. But the lack of standardization in how states report these figures leaves much to be desired.

    One of the newest states to start reporting race/ethnicity vaccination data is Minnesota. At a glance, the Race/Ethnicity tab of the state’s vaccine data dashboard looks comprehensive: it includes demographic data stratified by age, as well as a bar chart that compares the population that’s been vaccinated to Minnesota’s population overall.

    Race/Ethnicity tab on Minnesota’s vaccine dashboard, showing percent comparisons.

    But a closer examination shows that the age groups reported on this Race/Ethnicity tab (15-44, 45-64, 65+) don’t match the age groups used to report vaccinations by age on a different tab (16-17, 18-49, 50-64, 65+). So if a journalist or researcher were trying to analyze Minnesota’s vaccine demographics, they wouldn’t be able to derive whole numbers from these percentages.
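    The mismatch can be checked mechanically. The sketch below takes the two sets of age brackets listed above and asks whether each bracket on the Race/Ethnicity tab can be rebuilt exactly by adding up brackets from the age tab. The function name and the tuple representation are illustrative assumptions, not anything Minnesota publishes.

```python
from math import inf

# Age brackets as (lowest age, highest age); inf marks an open-ended bracket.
RACE_TAB_BRACKETS = [(15, 44), (45, 64), (65, inf)]
AGE_TAB_BRACKETS = [(16, 17), (18, 49), (50, 64), (65, inf)]


def can_rebuild(target, sources):
    """True if some source brackets tile the target bracket exactly, with no gaps."""
    t_lo, t_hi = target
    # Only source brackets that sit entirely inside the target can contribute.
    inside = sorted(s for s in sources if s[0] >= t_lo and s[1] <= t_hi)
    next_age = t_lo
    for lo, hi in inside:
        if lo != next_age:  # gap before this bracket
            return False
        next_age = hi + 1
    return next_age == t_hi + 1  # reached the top of the target


comparable = {b: can_rebuild(b, AGE_TAB_BRACKETS) for b in RACE_TAB_BRACKETS}
for bracket, ok in comparable.items():
    print(bracket, "comparable" if ok else "not comparable")
```

    Only the 65+ bracket lines up; the two younger groups straddle bracket boundaries on the other tab, which is why counts cannot be carried from one tab to the other.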

    This is one small example of a common issue across state vaccine demographic reporting—and demographic reporting in general. When categories don’t match, it’s difficult to make comparisons, and age brackets are particularly heinous. We need the CDC to start providing vaccine demographics by state, like, last December.

  • Featured sources, March 7

    • Coronavirus variant data from USA TODAY: The CDC doesn’t publish a time series of its counts of COVID-19 cases caused by variants. So, USA TODAY journalists have set up a program to scrape these data whenever the CDC publishes an update and store the data in a CSV, including variant counts for every U.S. state. The time series goes back to early January.
    • Documenting COVID-19: This repository is one of several great resources brought to my attention during this past week’s NICAR conference. It’s a database of documents related to the pandemic, obtained through state open-records laws and the Freedom of Information Act (FOIA). 246 records are available as of February 26.
    • VaccinateCA API: California readers, this one’s for you. The community-driven project VaccinateCA, aimed at helping Californians get vaccinated, has made its underlying data available for researchers. The API includes data on vaccination sites and their availability across the state.

  • Diving into COVID-19 data #3: Workshop recap

    Our final Diving into COVID-19 data workshop happened this week!

    Ann Choi and Will Welch, two data reporters at THE CITY (a local, nonprofit newsroom in NYC) discussed making COVID-19 numbers accessible and meaningful for New Yorkers. Júlia Ledur, a graphics reporter at the Washington Post and former data visualization lead at the COVID Tracking Project, walked us through several visualization projects. And Christie Aschwanden, a freelance science journalist, discussed communicating uncertainty in COVID-19 reporting. Thank you to everyone who attended!

    For those who couldn’t make it live, you can watch the recording of the session below and check out the slides here. A brief recap of the workshop follows.

    Making NYC data accessible

    Ann Choi and Will Welch shared a few strategies they used to communicate COVID-19 data for a NYC audience.

    First, Ann Choi walked through a few examples of THE CITY’s written articles, going from early uncertainty during a time of limited government guidance to a present focus on who is getting vaccinated.

    • One early story compares two socioeconomically similar neighborhoods in Queens, Flushing and Corona, which had different pandemic responses. For example, Flushing residents, many of whom are East Asian immigrants, did not need to be told to wear masks because they had past experience with these public health measures. Choi said she was inspired to do this comparison by thinking about how New York City measures up against Seoul, South Korea, where she has relatives. The cities have similarly-sized populations, though Seoul is much more dense; yet NYC has had over 29,000 COVID-19 deaths while Seoul has had fewer than 400.
    • A January story on vaccination rates emphasizes the “if”s by extrapolating out from NYC’s data. “If city residents were getting vaccinated at the statewide average,” Choi and Welch wrote, over 100,000 more city residents would already have gotten a shot. This tangible comparison emphasizes how the city is lagging behind.
    • A story on the Washington Heights Armory drew attention to vaccination disparities in the city. THE CITY reporter Josefa Velasquez went to this site and did what Choi called “very old-school data reporting,” talking to a sample of people waiting in line. Although the governor had touted the Armory as an example of a vaccination site in a predominantly Hispanic/Latino neighborhood, the majority of those Velasquez spoke to were white suburbanites. After this story was published, appointments at the Armory were restricted to prioritize NYC residents.
    • Other vaccination stories Choi’s worked on have drawn further distinctions between which neighborhoods were most impacted by the pandemic—and which neighborhoods are getting the most shots. ZIP code data released by the city allowed her to drill down into local patterns and find both examples of inequity and examples of how communities fought against it.

    In assessing socioeconomic traits of a neighborhood, Choi recommends using the Census Reporter website or the tidycensus package in R.

    Will Welch then discussed THE CITY’s COVID-19 tracker, Coronavirus in New York City. It’s updated on a daily basis through a combination of scraping and hand data collection; the tracker includes data on cases, deaths, test positivity, vaccinations, and more.

    “Our first iteration of this tracker was taking the data from the city and trying to put it into a more accessible format,” Welch said. City Hall emailed data out to reporters starting in mid-March, before developing a public dashboard.

    Later in the spring, NYC began publishing data on a public GitHub, allowing Welch and Choi to evolve their tracker beyond simply making numbers easier to find. One example of complexity built into the tracker: NYC was one of the first jurisdictions to distinguish “confirmed” deaths (associated with positive tests) and “probable” deaths (associated with death certificates). At first, daily updates of these numbers led to confusion as probable deaths would sometimes be reassigned as “confirmed.” But when the city published a full time series on deaths, THE CITY was able to make their own time series graphic, showing the long-term arc of the pandemic.

    Informed by watching this time series, Welch put together a graphic to show how early counts of COVID-19 fatalities in NYC by date of death are often incomplete. Deaths may be retroactively assigned to an earlier date of death after they were reported, which put some summer dates that Governor Cuomo bragged had seen “no new deaths” into a different light.

    Welch additionally discussed how the tracker switched, in the summer, from relying on city data to relying on state data for testing numbers. State data were being used to calculate reopening indicators, and THE CITY wanted their tracker to display the information that was having the greatest impact on people’s lives. The publication also fought for the release of ZIP code level data, which were used for specific local policy restrictions starting in the fall.

    Pandemic impact on New Yorkers’ lives has been an enduring theme for THE CITY’s coverage over the past year. This ranges from mapping the test positivity numbers behind restrictions to highlighting the share of NYC residents that have gotten at least one dose of a vaccine. But one particularly notable project is Missing Them, a searchable memorial of New Yorkers whose lives have been lost to COVID-19. The memorial has compiled nearly 1,000 names so far, though this is less than 5% of the total number of lives lost.

    “Behind every number, behind every case, there is a vulnerable person whose life was cut short,” Choi said. She expressed hope that reporters will take lessons from covering COVID-19 to other public health crises, and remember to tell the stories behind each data point.

    Visualizing COVID-19 data

    Júlia Ledur talked through several of her COVID-19 visualization projects, including graphics at both the COVID Tracking Project and the Washington Post. She shared how her approach to visualizing these data has changed as the data have become increasingly complicated, now that we’re contending with many test types, vaccinations, and more.

    Early COVID Tracking Project data, collected and visualized in March 2020.

    A few examples:

    • An early graphic (above) from March 2020, calling attention to data gaps in the CDC’s reporting. The agency had reported far fewer COVID-19 tests than the COVID Tracking Project had compiled from states at that time.
    • A scrollytelling explainer that calls attention to just how complicated test positivity can get, by showing how this metric gets calculated.
    • Another test positivity post, this one focusing on inconsistencies between states. This post uses a fictional character called Bob to show how Bob’s testing experience might be counted in three different ways.
    • Graphics explaining tests per positive, another metric which may be used to show complications in COVID-19 testing, though, as Ledur said, this metric did not end up taking off in the same way that test positivity did.
    • Trading cards and other graphics exploring different test types, to accompany an explainer on how the tests compare. Ledur worked with CTP Science Communication lead Jessica Malaty Rivera to ensure her graphics were accurately representing the science of each test while keeping readers engaged.
    • Two Washington Post stories on Manaus, a city in the Amazon Rainforest that faced major supply challenges during a COVID-19 surge driven by a new variant.
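    The test positivity examples in this list turn on a single ratio whose denominator depends on what gets counted as a “test.” Here is a minimal sketch of that idea. The three counting units (specimens, test encounters, unique people), the second patient, and all dates and results are illustrative assumptions, not data from Ledur’s graphics.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Specimen:
    person: str
    collected: date
    positive: bool


# Hypothetical week: Bob is swabbed twice at one visit, then once more later.
SPECIMENS = [
    Specimen("Bob", date(2021, 3, 1), False),
    Specimen("Bob", date(2021, 3, 1), False),  # second swab at the same visit
    Specimen("Bob", date(2021, 3, 4), True),
    Specimen("Alice", date(2021, 3, 2), False),
]


def count_tests(specimens, unit):
    """Count tests under one of three (assumed) reporting units."""
    if unit == "specimens":
        return len(specimens)
    if unit == "encounters":  # one per person per day
        return len({(s.person, s.collected) for s in specimens})
    if unit == "people":
        return len({s.person for s in specimens})
    raise ValueError(f"unknown unit: {unit}")


# Count each person with any positive result once.
positives = len({s.person for s in SPECIMENS if s.positive})

for unit in ("specimens", "encounters", "people"):
    tests = count_tests(SPECIMENS, unit)
    print(f"{unit}: positivity {positives / tests:.0%}, tests per positive {tests / positives:.0f}")
```

    The same week of testing yields three different positivity rates, which is the kind of inconsistency between states that the graphics above set out to explain. Tests per positive is simply the inverse of the same ratio.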

    Ledur also shared a few of her best practices for visualizing COVID-19 data:

    • Be clear about what you’re showing and what you’re not showing. Include clear labels, methodology, and specific notes on what you aren’t including.
    • Don’t avoid uncertainty. Instead, highlight it. Tell your audience what’s going on.
    • Add notes and/or annotations with caveats explaining data anomalies.
    • Put the data in context. Show how one point in time compares to others, or how one place compares to others.
    • Check your science. Work with those who have the right expertise to make sure your work is correctly reflecting the issue.
    • Have empathy. Put yourself in the shoes of your reader; think about what questions you might have, how you can answer them, and where you can’t answer them.

    “You have an idea of what it’s like to be affected by this, because you certainly are,” Ledur said. Leaning into this experience as you communicate data, remembering how the numbers make you feel, will make your reporting better.

    Addressing uncertainty

    “This pandemic has been a time when the process of science has been on public display,” Christie Aschwanden said to start her talk. She called this period an incredible, but overwhelming opportunity to convey the process of science in an accurate way.

    And at the forefront of the process of science is the idea of uncertainty. We might think numbers are neutral, Aschwanden said (“we measured this and therefore it’s true”), but in fact they are subject to biases, which must be communicated along with the numbers.

    These biases are why political statements such as “we follow the science” may be harmful. “Science is a method of understanding, it’s not an answer,” Aschwanden said; we should prepare people to see how a number or a conclusion may change, so that when it inevitably does change, they don’t feel betrayed.

    One high-profile example of this trend is mask-wearing. Public health experts such as Dr. Anthony Fauci said in early spring 2020 that masks were not necessary for the general public. This statement reflected scientific knowledge at the time—but when this knowledge changed as the virus’ tendency to spread through air became better understood, many Americans held up the change as evidence of a conspiracy.

    This trend also occurs with medical treatments. Convalescent plasma, one COVID-19 treatment, held promise early on but has not shown benefits in more recent randomized controlled trials. It’s important to communicate the uncertainties and shortcomings of provisional studies, so that people don’t put all their hopes into a treatment that later is shown to be less viable.

    “We need to get from this magic wand idea about science,” Aschwanden said. Science is not a singular path towards truth; it’s a “process of uncertainty reduction.” Pointing out and discussing the uncertainty, whether that’s through annotations on a chart or caveats in a news story, will help readers understand how decisions may change based on new evidence.

    Here are Aschwanden’s tips for discussing scientific data with the public:

    • Emphasize that uncertainty is okay (a feature, not a bug)
    • Explain that knowledge is provisional
    • Promote openness to new evidence, even as we become more certain
    • Beware motivated reasoning and cognitive biases
    • Convey that data aren’t neutral

    One article that effectively follows these guidelines is a story Aschwanden wrote in the fall for Scientific American. She walks readers through the data on COVID-19 deaths in order to debunk the rumor that death counts are inflated by doctors. The story uses three lines of evidence to explain that there’s a lot of uncertainty about the precise number of COVID-19 deaths, but that there’s not much uncertainty about the magnitude of these deaths.

    Aschwanden encourages any reporters working on COVID-19 coverage to join a free listserv run by the National Association of Science Writers, which she moderates. To join the list, send an email to nasw-COVID19mods@nasw.org with your name, email, and the publication(s) for which you are covering COVID-19, either on staff or freelance.

    • Goodnight, COVID Tracking Project

      The COVID Tracking Project’s homepage on March 7, 2021.

      A couple of hours after I send today’s newsletter, I will do my final shift of data entry work on the COVID Tracking Project’s Testing and Outcomes dataset. Then, later in the evening, I will do my final shift on the COVID Racial Data Tracker. And then I will probably spend another hour or two bothering my fellow volunteers on Slack because I don’t want it to be over quite yet.

      In case you aren’t fully embroiled in the COVID-19 data world, here’s some context. Last spring, a few journalists and other data-watchers realized that the U.S.’s national public health agencies weren’t doing a very good job of reporting COVID-19 tests. Alexis Madrigal and Rob Meyer (of The Atlantic) compiled their own count from state public health agencies. Jeff Hammerbacher (of Related Sciences) had independently compiled his own count, also from state agencies. And, as the About page on the website goes: “The two efforts came together March 7 and made a call for volunteers, our managing editor, Erin Kissane joined up, and the COVID Tracking Project was born.”

      Now, one year after that formal beginning of the Project’s test-counting efforts, the team is ending data collection work. Erin Kissane and Alexis Madrigal provided some background for that decision in a blog post published on February 1. I recommend reading the piece in full, if you haven’t yet, but the TL;DR is that a. this data collection work should be done by federal public health agencies, not a motley group of researchers and volunteers, and b. the federal agencies have greatly improved their own data collection and reporting efforts in recent months.

      The Project’s core Testing and Outcomes dataset formally ceases updates today, along with the Racial Data Tracker and Long-Term Care Data Tracker. But the Project has provided a lot of documentation and guidance for data users who want to keep tracking the pandemic, along with analysis that will be useful for months (if not years) to come. The rest of this post shares the highlights from those resources, along with a few personal reflections.

      Where to find your COVID-19 data now

      So, you’re a journalist who’s relied on the COVID Tracking Project’s tweets to illuminate pandemic trends for the past year. Or you’re a researcher who’s linked the Project’s API to your own tracking dashboard. Or you’re a concerned reader who’s checked up on your state regularly, watching the time series charts and annotations. Where do you go for your data now?

      Through a series of analysis posts and webinars over the past few weeks, Project staff have made their recommendation clear: go to the federal government. In recent months, the CDC and the HHS have built up data collection practices and public dashboards that make these data easier to work with.

      Here are a few highlights:

      • For daily updates at all geographic levels, use the Community Profile Reports. After months of private updates sent from the White House COVID-19 task force to governors, the data behind these in-depth reports were made public in December. The PDF reports themselves were made public in January, after Biden took office. The reports include detailed data on cases, deaths, tests, and hospitalizations for states, counties, and metropolitan areas. I’ve written more about the reports here.
      • For weekly updates, use the COVID Data Tracker Weekly Review. As I mentioned in a National Numbers post two weeks ago: the CDC is doing weekly updates now! These updates include reports on the national trends for cases, deaths, hospitalizations, vaccinations, and SARS-CoV-2 variants. They may be drier than CTP blog posts, but they’re full of data. You can also sign up to receive the updates as a newsletter, sent every Friday afternoon—the CDC has really moved into the 21st-century media landscape.
      • For state-specific updates, use the State Profile Reports. Similarly to the Community Profile Reports, these documents provide many major state-level metrics in one place, along with local data and color-coding to show areas of concern. They’re released weekly, and can be downloaded either state-by-state or in one massive federal doc.
      • For case and death data, use the CDC’s state-by-state dataset. This dataset compiles figures reported by states, territories, and other jurisdictions. It matches up pretty closely to CTP’s data, though there are some differences due to definitions that don’t match and other discrepancies; here’s an analysis post on cases, and here’s a post on deaths. You can also see these data in the CDC’s COVID Data Tracker and reports.
      • For testing data, use the HHS PCR testing time series. This dataset includes results of PCR tests from over 1,000 labs, hospitals, and other testing locations. Unlike CTP, the federal government can mandate how states report their tests, so this dataset is standardized in a way that the Project’s couldn’t be. Kara Schechtman has written more about where federal testing data come from and how to use them here. The HHS isn’t (yet) publishing comprehensive data on antibody or antigen tests, as these test types are even more difficult to standardize.
      • For hospitalization data, use the HHS hospitalization dataset. I’ve reported extensively on this dataset, as has CTP. After a rocky start in the summer, the HHS has shown that it can compile a lot of data points from a lot of hospitals, get them standardized, and make them public. HHS data for current hospitalizations are “usually within a few percentage points” of corresponding data reported by states themselves, says a recent CTP post on the subject. Find the state-level time series here and the facility-level dataset here.
      • For long-term care data, use the CMS nursing home dataset. The Centers for Medicare & Medicaid Services are responsible for overseeing all federally-funded nursing homes. Since last spring, this responsibility has included tracking COVID-19 in those nursing homes—including cases and deaths among residents and staff, along with equipment, testing availability, and other information. The CMS dataset accounts for fewer overall cases than CTP’s long-term care dataset because nursing homes only account for one type of long-term care facility. But, like any federal dataset, it’s more standardized and more detailed. Here’s an analysis post with more info.
      • For race and ethnicity data, there are a couple of options. The CDC’s COVID Data Tracker includes national figures on total cases and deaths by race and ethnicity, at least for the 52% of cases and 74% of deaths where demographic information is available. More detailed information (such as state-by-state data) is available on deaths by race and ethnicity via the CDC’s National Center for Health Statistics. A blog post with more information on substitutes for the COVID Racial Data Tracker is forthcoming.

      The COVID Tracking Project’s federal data webinars concluded this past Thursday with a session on race and ethnicity and long-term care facilities. Slides and recordings from these sessions haven’t been publicly posted yet, but you can look out for them on the Project’s website.
      Also, for the more technical data nerds among you: COVID Act Now has written up a Covid Tracking Migration Guide for users of the CTP API, and the Johns Hopkins Coronavirus Resource Center announced that it will begin providing state testing data.

      Analysis and update posts to re-read

      It took a lot of self-control for me to not just link every single CTP article in here. But I’ll give you just a few of my favorites, listed in no particular order.

      What the COVID Tracking Project gave me

      I joined the COVID Tracking Project as a volunteer in early April, 2020. I actually searched back through my calendar to find exactly when I did a data entry training—it was Thursday, April 2.

      At the time, I wanted to better understand the numbers I kept seeing, in tweets and news stories and Cuomo’s PowerPoints. But more than that, I wanted to do something. I sat, cooped up in my little Brooklyn apartment, listening to the endless sirens screaming by. I ran to the park and wanted to yell at every person I saw walking without a mask. I donated to mutual aid funds, but even that felt empty, almost impersonal.

      The Project put out a call for volunteers, and I thought, okay, data entry. I can do data entry. I can do spreadsheets. I know spreadsheets.

      Well, I know spreadsheets much better now, almost a year later. I know how to navigate through a state dashboard, find all its data definitions, and puzzle through its update time. But beyond all the technical stuff, volunteering for CTP gave me a sense of purpose and community. No matter how tired or angry the world made me feel, I knew that, for a few hours a week, I’d be contributing to something bigger than myself. My work played a small part in making data accessible, bringing information to a wider audience.

      Much ink has been spilled about how mutual aid groups have helped neighbors find each other, especially during that period of spring 2020 when everything seemed so bleak. I have seen the Project as another form of mutual aid. I’ve given countless hours to CTP over the past year in the form of data entry shifts, analysis, writing, and custom emojis—but those hours have also been given back to me, in everything from Tableau tricks to playlist recommendations. My fellow volunteers, the vast majority of whom I’ve never met in person, are my neighbors. We live in the same spreadsheets and Slack channels; we see the world in the same way. 

      I am beginning to understand how journalism, or something like journalism, can work when it is led by a community. By community, I mean: a group of people united in one mission. And by mission, I mean: bringing information to the public. Accessibility and accountability are common buzzwords right now, I think, but CTP approaches the truth of these principles, whether it’s by doing shifts through Christmas or by writing out detailed process notes on how to navigate Wyoming’s dashboard(s).

      I know why the Project’s data collection efforts are ending. The federal government can compile—and is compiling—data on a far more detailed and standardized level than a group of researchers and volunteers ever could. But I am grateful to have been part of this beautiful thing, so much bigger than myself. It is the bar by which I will measure every organization I join from here on out.

      If you’ve ever read the About page on the COVID-19 Data Dispatch website, you may have noticed a disclaimer stating that, while I volunteer for CTP, this publication is an entirely separate project that reflects my own reporting and explanations. This is true; I’m careful to keep this project distinct. But of course, the COVID-19 Data Dispatch has been influenced by what I’ve learned volunteering for CTP. I have attempted to carry forward those values, accessibility and accountability. I’ll keep carrying them forward. Feedback is always welcome.

      To all my neighbors in the CTP Slack: thank you. And to everyone who has followed the data: there is work still to be done.

      More federal data posts

      • National numbers, March 7

        In the past week (February 28 through March 6), the U.S. reported about 417,000 new cases, according to the COVID Tracking Project. This amounts to:

        • An average of 60,000 new cases each day—comparable to the seven-day average for daily cases in early August
        • 127 total new cases for every 100,000 Americans
        • 1 in 786 Americans getting diagnosed with COVID-19 in the past week
        Nationwide COVID-19 metrics published in the COVID Tracking Project’s daily update on March 6. This will be the final week we use Project data for these updates.

        Last week, America also saw:

        • 41,400 people now hospitalized with COVID-19 (13 for every 100,000 people)
        • 12,100 new COVID-19 deaths (3.7 for every 100,000 people)
        • An average of 2.2 million vaccinations per day (per Bloomberg)
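
The per-capita figures in the two lists above are simple ratios, and a short sketch can show how they might be reproduced. The population denominator here is an assumption (roughly 328.2 million, a 2019 Census Bureau estimate); the post does not say which denominator was used, and the function names are made up for illustration. Because the weekly case total is rounded to "about 417,000," the "1 in N" result may land a point or so away from the published figure.

```python
# Assumption: U.S. population of about 328.2 million (2019 Census Bureau
# estimate). The newsletter does not state its denominator.
US_POPULATION = 328_200_000


def per_100k(count, population=US_POPULATION):
    """Express a raw count as a rate per 100,000 residents."""
    return count / population * 100_000


def one_in_n(count, population=US_POPULATION):
    """Return N such that roughly 1 in N residents falls within the count."""
    return population / count


def daily_average(weekly_total, days=7):
    """Average a weekly total over the number of days in the period."""
    return weekly_total / days


# Figures quoted in the post (rounded by the author).
weekly_cases = 417_000
hospitalized = 41_400
weekly_deaths = 12_100

print(f"Average new cases per day: {daily_average(weekly_cases):,.0f}")
print(f"New cases per 100,000: {per_100k(weekly_cases):.0f}")
print(f"Roughly 1 in {one_in_n(weekly_cases):.0f} people diagnosed")
print(f"Hospitalized per 100,000: {per_100k(hospitalized):.0f}")
print(f"Deaths per 100,000: {per_100k(weekly_deaths):.1f}")
```

Swapping in a different population estimate, or the unrounded weekly totals, would shift these results slightly, which is worth keeping in mind when comparing per-capita numbers across sources.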

        The U.S. recorded fewer new daily cases this week than last week, finally dropping to a level lower than the summer surge. We saw fewer hospitalized COVID-19 patients and deaths from the disease this week as well. But the possibility of a plateau, or even a variant-driven fourth surge, is worrying some experts. CDC Director Dr. Rochelle Walensky has cited this concern in press briefings over the past week, urging Americans to “double down on prevention measures.”

        Dr. Walensky’s assertion is backed up by a new CDC report that links mask mandates and dining restrictions to reduced community spread. (We knew this already, of course, but it’s always nice to have a CDC report you can cite.)

        Variants, meanwhile, continue to spread. We’re up to 2,600 reported B.1.1.7 cases, though this and other variant counts are likely significantly underreported. Nature’s Ewen Callaway calls attention to variant reporting issues in a recent story: despite national efforts to ramp up sequencing, the practice is still highly decentralized in the U.S., with well-resourced states like New York and California sequencing thousands of genomes while other states collect far fewer. And “homegrown” variants of concern, such as the variant reportedly spreading through New York City, don’t even appear on the CDC’s dashboard yet.

        But vaccinations give us one place to be optimistic. More than two million Americans are now getting a dose each day, per Bloomberg, with the first Johnson & Johnson shots landing on the market this week. After the announcement of a cross-pharma partnership (Merck giving J&J a manufacturing boost), President Biden said that the U.S. will have enough COVID-19 vaccine doses for every adult by the end of May. How quickly—and how equitably—those doses get administered will be another battle. 

        Finally, a sad acknowledgment: with the COVID Tracking Project concluding data collection today, I will be switching my source for these updates starting next week. I plan to use CDC and HHS data, relying heavily on the CDC’s new COVID Data Tracker Weekly Reviews. More on filling the CTP-shaped hole in your data in the next section.

      • COVID source shout-out: Oklahoma

        As of yesterday, 42 states and D.C. are reporting vaccinations by race and ethnicity. You can see the CDD’s full annotations of state vaccination data here.

        One of those 42 states is Oklahoma. Oklahoma wasn’t listed as reporting any demographic data in our annotations until yesterday—but in fact, this state has been reporting vaccinations by race, ethnicity, age, and gender since January. I missed this information in previous weeks because the state has been reporting these data in its Weekly Epidemiology and Surveillance Reports, rather than on its main COVID-19 dashboard where the totals are reported.

        So, this week, the COVID source shout-out section is also a public apology to the good state of Oklahoma. I’m sorry I missed your vaccination demographics. You’re doing great.

      • Featured sources and federal data updates, Feb. 28

        We’re sneaking a few more federal updates into the source section this week.

        • CDC changed its methodology for state vaccination counts: Last Saturday, February 20, the CDC made two major changes to how it reports vaccination data. First, instead of simply reporting doses administered by federal agencies (the Department of Defense, Indian Health Service, etc.) as fully separate counts, the agency started reporting these doses in the states where they were administered. Second, the CDC started reporting vaccinations in the state where someone is counted as a resident, rather than where they received the shot. Both of these changes make state-reported counts and CDC-reported counts less directly comparable, since states typically don’t track federal agency doses and count doses based on where they were administered. You can read more about these changes on Bloomberg’s vaccine tracker methodology and analysis blog; Bloomberg now uses only CDC data for its U.S. figures.
        • VaccineFinder is open for COVID-19 vaccines: As of Wednesday, Americans can use this national tool to find COVID-19 vaccine appointments. Just put in your ZIP code and select a radius (1 mile, 10 miles, 50 miles, etc.), and the tool will show you providers nearby. For each provider, the tool provides contact information—and, crucially, whether this location actually has vaccines in stock. Unlike many other federal dashboards, VaccineFinder isn’t a new tool; it was developed during the H1N1 flu outbreak in 2009. STAT’s Katie Palmer provides more history and context on the site here.
        • Government Accountability Office may push for more data centralization: The Government Accountability Office (or GAO), a watchdog agency that does auditing and evaluations for Congress, has been investigating the federal government’s COVID-19 data collection and is finding this collection “inconsistent and confusing,” according to a report by POLITICO’s Erin Banco. While the GAO’s report won’t be finalized and made public until March, the agency is expected to recommend that data be more standardized. It could call for the CDC to make changes to its data collection on cases, deaths, and vaccines similar to how the HHS revamped collection for testing and hospitalization data in summer 2020. CDC officials are wary of these potential changes; it’ll definitely be a big data story to follow this spring.
        • Global.health is ready for research: Back in January, I wrote about Global.health, a data science initiative aiming to bring anonymized case data to researchers on a global scale. The initiative’s COVID-19 dataset is now online, including over 10 million individual case records from dozens of countries. 10 million case records! Including demographic and outcomes data! If you’d like to better understand why this dataset is a pretty big deal, read this article in Nature or this one in STAT. I plan on digging into the dataset next week, and may devote more space to it in a future issue.
        • NIH COVID-19 treatment guidelines: In one of the White House COVID-19 press briefings this week, Dr. Fauci referenced this National Institutes of Health (NIH) website intended to provide both physicians and researchers with the latest guidance on how to treat COVID-19 patients. The website acts as a living medical document, featuring an interactive table of contents and a text search tool. Follow @NIHCOVIDTxGuide on Twitter for updates.
        • Burbio’s K-12 School Opening Tracker: Burbio, a digital platform for community events, is actively monitoring over 1,200 school districts to determine which schools are currently using virtual, in-person, and hybrid models. The sample size includes the 200 largest districts in the U.S. and other districts with a mix of sizes and geographies, in order to reflect local decision-making across the U.S. See more methodology details here.
        • COVID-19’s impact on LGBTQ+ communities: The Journalist’s Resource at Harvard Kennedy School has compiled a list of recent research on how the coronavirus pandemic impacted LGBTQ+ Americans. In many cases, the pandemic furthered disproportionate poverty and poor health outcomes in this community; LGBTQ+ Americans shouldn’t be ignored in COVID-19 coverage.
        • The Accountability Project: A repository of public data run by the Investigative Reporting Workshop, the Accountability Project reached 1 billion records last week. The Project includes several COVID-19-related datasets, including a dataset of Paycheck Protection Program loans and data on hospitals and nursing homes.