I couldn’t decide which of these news items to focus on for a short post this week, so I wrote blurbs for all five. This title and format are inspired by Rob Meyer’s Weekly Planet newsletter.
HHS added vaccinations to its facility-level hospitalization dataset: Last week, I discussed the HHS’s addition of COVID-19 patient admissions by age to its state-level hospitalization dataset. This week, the HHS followed that up with new fields in its facility-level dataset, reflecting vaccinations among hospital staff and patients. You can find the dataset here and read more about the new fields in the FAQ here (starting on page 14). It’s crucial to note that these are optional fields, meaning hospitals can submit their other COVID-19 numbers without any vaccination reporting. Only about 3,200 of the total 5,000 facilities in the HHS dataset have opted in—so don’t sum these numbers to draw conclusions about your state or county. Still, this is the most detailed occupational data I’ve seen for the U.S. thus far.
A new IHME analysis suggests the global COVID-19 death toll may be double reported counts: 3.3 million people have died from COVID-19 worldwide as of May 8, according to the World Health Organization. But a new modeling study from the University of Washington's Institute for Health Metrics and Evaluation (IHME) suggests that the true death toll is closer to 6.9 million. Under-testing and overburdened healthcare systems may contribute to reporting systems missing COVID-19 deaths, though the reasons—and the undercount's magnitude—differ from country to country. In the U.S., IHME estimates about 900,000 deaths, while the CDC counts 562,000. Read STAT's Helen Branswell for more context on this study.
The NYT published a dangerous misrepresentation of vaccine hesitancy (then quietly corrected it): A New York Times story on herd immunity garnered a lot of attention (and Twitter debate) earlier this week. One specific aspect of the story stuck out to some COVID-19 data experts, though: a U.S. map entitled, “Uneven Willingness to Get Vaccinated Could Affect Herd Immunity.” The map, based on HHS estimates, claims to display vaccine confidence at the county level. But the estimates are really more reflective of state averages, and moreover, the NYT originally double-counted the people who are strongly opposed to vaccines, leading to a map that made the U.S. look much more hesitant than it actually is. Biologist Carl Bergstrom has a thread detailing the issue, including original and corrected versions of the map.
We still need better demographic data: A poignant article in The Atlantic from Ibram Kendi calls attention to gaps in COVID-19 data collection that continue to loom large, more than a year into the pandemic. The story primarily discusses race and ethnicity data, citing the COVID Racial Data Tracker (which I worked on), but Kendi also highlights other underreported populations. For example: “The only available COVID-19 data on undocumented immigrants come from Immigration and Customs Enforcement detention centers.”
NIH college student trial is having a hard time recruiting: If you, like me, have been curious about how that big NIH trial to study vaccine effectiveness in college students has progressed since it was announced last March, I recommend this story from U.S. News reporter Chelsea Cirruzzo. The study aimed to recruit 12,000 students at a select number of colleges, but because the vaccine rollout has progressed faster than expected, researchers are having a hard time finding not-yet-vaccinated students to enroll. (1,000 are enrolled so far.) Now, students at all higher ed institutions can join.
Since mid-December, the Department of Health and Human Services has published a dataset on how the pandemic is impacting individual hospitals across the country. (You can read the CDD’s detailed description of that dataset here.) One of the most useful—and, in my opinion, most under-utilized—aspects of this facility dataset is that it provides COVID-19 hospital admissions broken out by age, allowing data users to discern which age groups are getting hardest hit by severe COVID-19 cases in different parts of the country.
This week, the HHS made it much easier to do that analysis. The agency added hospital admissions by age to its state-level hospitalization dataset. Now, if you want to see a patient breakdown for your state, you can simply look at the state-level info already compiled by HHS data experts, rather than summing up numbers from the facility-level info yourself.
Besides that convenience factor, there are two big advantages of the state-level info:
The state-level dataset is updated daily, while the facility-level dataset is updated weekly. More frequent data updates allow for more specific time series analysis.
Low patient numbers aren’t suppressed. In the facility-level dataset, patient numbers between 1 and 4 are suppressed with an error value (-999999) to protect patient privacy. In the age data, this happens at a lot of facilities, so it’s impossible for an outside data user to calculate accurate totals for a given city, county, or state. On the other hand, with HHS experts doing the aggregation in the state-level dataset, no values need to be obscured—basically, these state-level figures are much more accurate.
The age groups in the state-level dataset match those available in the facility-level dataset: pediatric COVID-19 patients, patients age 18-19, patients in ten-year age ranges from 20 to 79, and patients age 80 or older. HHS also splits the patient counts into those who have confirmed COVID-19 cases (meaning their diagnosis is verified by a PCR test) and those who have suspected cases (meaning the patients have COVID-19 symptoms or a positive result on a non-PCR test.)
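If you'd like to pull these new fields yourself, here's a minimal sketch in Python with pandas. The dataset ID and the column-name pattern reflect my reading of the HHS data dictionary, so treat both as assumptions and verify them against the live documentation before relying on this.

```python
import pandas as pd

# State-level hospitalization time series on healthdata.gov. The dataset ID
# below is an assumption; confirm it on the HHS page before using.
URL = "https://healthdata.gov/api/views/g62h-syeh/rows.csv?accessType=DOWNLOAD"

df = pd.read_csv(URL)
df["date"] = pd.to_datetime(df["date"])

# Confirmed-admission fields are broken out by age bracket (18-19, ten-year
# ranges through 70-79, and 80+), per the data dictionary. Verify the exact
# column names against the live file.
age_cols = [
    c for c in df.columns
    if c.startswith("previous_day_admission_adult_covid_confirmed_")
]

# Recent confirmed adult admissions by age for one state. No suppressed
# values to worry about here, since HHS experts did the aggregation.
ny = df[df["state"] == "NY"].sort_values("date")
print(ny[["date"] + age_cols].tail())
```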
Also, Conor Kelly, COVID Tracking Project volunteer and COVID-19 visualizer extraordinaire, has added these new data to his COVID-19 Tableau dashboard. (See “Hosp. Admissions Over Time,” then “Admissions by Age.”) Highly recommend checking out that dashboard and exploring the trends for your state.
(Finally, it is possible I’m a little annoyed that the HHS made this lovely update immediately after I turned in an assignment in which I did this analysis the long way, with the facility-level dataset. Look out for that story early next week.)
The COVID Tracking Project’s homepage on March 7, 2021.
A couple of hours after I send today’s newsletter, I will do my final shift of data entry work on the COVID Tracking Project’s Testing and Outcomes dataset. Then, later in the evening, I will do my final shift on the COVID Racial Data Tracker. And then I will probably spend another hour or two bothering my fellow volunteers on Slack because I don’t want it to be over quite yet.
In case you aren’t fully embroiled in the COVID-19 data world, here’s some context. Last spring, a few journalists and other data-watchers realized that the U.S.’s national public health agencies weren’t doing a very good job of reporting COVID-19 tests. Alexis Madrigal and Rob Meyer (of The Atlantic) compiled their own count from state public health agencies. Jeff Hammerbacher (of Related Sciences) had independently compiled his own count, also from state agencies. And, as the About page on the website goes: “The two efforts came together March 7 and made a call for volunteers, our managing editor, Erin Kissane joined up, and the COVID Tracking Project was born.”
Now, one year after that formal beginning of the Project's test-counting efforts, the team is ending data collection work. Erin Kissane and Alexis Madrigal provided some background for that decision in a blog post published on February 1. I recommend reading the piece in full, if you haven't yet, but the TL;DR is that (a) this data collection work should be done by federal public health agencies, not a motley group of researchers and volunteers, and (b) the federal agencies have greatly improved their own data collection and reporting efforts in recent months.
The Project’s core Testing and Outcomes dataset formally ceases updates today, along with the Racial Data Tracker and Long-Term Care Data Tracker. But the Project has provided a lot of documentation and guidance for data users who want to keep tracking the pandemic, along with analysis that will be useful for months (if not years) to come. The rest of this post shares the highlights from those resources, along with a few personal reflections.
Where to find your COVID-19 data now
So, you’re a journalist who’s relied on the COVID Tracking Project’s tweets to illuminate pandemic trends for the past year. Or you’re a researcher who’s linked the Project’s API to your own tracking dashboard. Or you’re a concerned reader who’s checked up on your state regularly, watching the time series charts and annotations. Where do you go for your data now?
Through a series of analysis posts and webinars over the past few weeks, Project staff have made their recommendation clear: go to the federal government. In recent months, the CDC and the HHS have built up data collection practices and public dashboards that make these data easier to work with.
Here are a few highlights:
For daily updates at all geographic levels, use the Community Profile Reports. After months of private updates sent from the White House COVID-19 task force to governors, the data behind these in-depth reports were made public in December. The PDF reports themselves were made public in January, after Biden took office. The reports include detailed data on cases, deaths, tests, and hospitalizations for states, counties, and metropolitan areas. I’ve written more about the reports here.
For weekly updates, use the COVID Data Tracker Weekly Review. As I mentioned in a National Numbers post two weeks ago: the CDC is doing weekly updates now! These updates include reports on the national trends for cases, deaths, hospitalizations, vaccinations, and SARS-CoV-2 variants. They may be drier than CTP blog posts, but they’re full of data. You can also sign up to receive the updates as a newsletter, sent every Friday afternoon—the CDC has really moved into the 21st-century media landscape.
For state-specific updates, use the State Profile Reports. Similarly to the Community Profile Reports, these documents provide many major state-level metrics in one place, along with local data and color-coding to show areas of concern. They’re released weekly, and can be downloaded either state-by-state or in one massive federal doc.
For case and death data, use the CDC's state-by-state dataset. This dataset compiles figures reported by states, territories, and other jurisdictions. It matches up pretty closely with CTP's data, though there are some differences due to mismatched definitions and other discrepancies; here's an analysis post on cases, and here's a post on deaths. You can also see these data in the CDC's COVID Data Tracker and reports.
For testing data, use the HHS PCR testing time series. This dataset includes results of PCR tests from over 1,000 labs, hospitals, and other testing locations. Unlike CTP, the federal government can mandate how states report their tests, so this dataset is standardized in a way that the Project’s couldn’t be. Kara Schechtman has written more about where federal testing data come from and how to use them here. The HHS isn’t (yet) publishing comprehensive data on antibody or antigen tests, as these test types are even more difficult to standardize.
For hospitalization data, use the HHS hospitalization dataset. I’ve reported extensively on this dataset, as has CTP. After a rocky start in the summer, the HHS has shown that it can compile a lot of data points from a lot of hospitals, get them standardized, and make them public. HHS data for current hospitalizations are “usually within a few percentage points” of corresponding data reported by states themselves, says a recent CTP post on the subject. Find the state-level time series here and the facility-level dataset here.
For long-term care data, use the CMS nursing home dataset. The Centers for Medicare & Medicaid Services are responsible for overseeing all federally-funded nursing homes. Since last spring, this responsibility has included tracking COVID-19 in those nursing homes—including cases and deaths among residents and staff, along with equipment, testing availability, and other information. The CMS dataset accounts for fewer overall cases than CTP's long-term care dataset because nursing homes are only one type of long-term care facility. But, like any federal dataset, it's more standardized and more detailed. Here's an analysis post with more info.
For race and ethnicity data, there are a couple of options. The CDC's COVID Data Tracker includes national figures on total cases and deaths by race and ethnicity—at least, for the 52% of cases and 74% of deaths where demographic information is available. More detailed information (such as state-by-state data) is available on deaths by race and ethnicity via the CDC's National Center for Health Statistics. A blog post with more information on substitutes for the COVID Racial Data Tracker is forthcoming.
The COVID Tracking Project’s federal data webinars concluded this past Thursday with a session on race and ethnicity and long-term care facilities. Slides and recordings from these sessions haven’t been publicly posted yet, but you can look out for them on the Project’s website. Also, for the more technical data nerds among you: COVID Act Now has written up a Covid Tracking Migration Guide for users of the CTP API, and the Johns Hopkins Coronavirus Resource Center announced that it will begin providing state testing data.
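For the API users specifically, the switch might look something like the sketch below. The COVID Tracking Project endpoint and field name come from its documented v1 API; the CDC Socrata resource ID and field names are my assumptions, so check the migration guide for the authoritative mappings.

```python
import requests

# Old source: COVID Tracking Project v1 API. The Project has said its
# historical data will remain available after collection ends.
ctp = requests.get(
    "https://api.covidtracking.com/v1/states/ca/daily.json"
).json()

# New source: the CDC's state-by-state cases/deaths dataset on data.cdc.gov.
# The resource ID and field names here are assumptions; see the migration
# guide for exact equivalents.
cdc = requests.get(
    "https://data.cdc.gov/resource/9mfq-cb36.json",
    params={"state": "CA", "$order": "submission_date DESC", "$limit": 7},
).json()

# Field names differ between sources (CTP's positiveIncrease vs. the CDC's
# new_case), so every metric needs an explicit mapping, not just a new URL.
print(ctp[0]["positiveIncrease"], cdc[0]["new_case"])
```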
Analysis and update posts to re-read
It took a lot of self-control for me not to just link every single CTP article here. But I'll give you a few of my favorites, listed in no particular order.
Test Positivity in the US Is a Mess (Oct. 8): This piece was one of my “20 best COVID-19 data stories of 2020.” To repeat what I wrote then: “Out of the many informative blog posts produced by the COVID Tracking Project since last spring, this is the one I’ve shared most widely. Project Lead Erin Kissane and Science Communication Lead Jessica Malaty Rivera clearly explain how COVID-19 test positivity—what should be a simple metric, the share of tests conducted in a given region that return a positive result—can be calculated in several different ways.”
Daily COVID-19 Data Is About to Get Weird (Nov. 24): I probably linked to this piece in newsletters once a week from the end of November until early January. Erin Kissane clearly explains the patterns around holiday data and how to interpret them—an explanation that continues to prove useful, as we saw a kind-of holiday reporting pattern take place after winter storms hit the South.
The “Good” Metric Is Pretty Bad: Why It’s Hard to Count the People Who Have Recovered from COVID-19 (Jan. 13): The “recovered” number that many states publish is even more inconsistent than other metrics. Some states use hospital discharges to calculate this number, while others simply re-categorize cases as “recovered” if a worse outcome hasn’t occurred in a designated timeframe. All definitions exclude long haulers, who face symptoms months after getting sick.
Counting COVID-19 Tests: How States Do It, How We Do It, and What’s Changing (Aug. 13): You know how the federal testing dataset is more standardized than what CTP can do? This is because there are three (3!) different units for COVID-19 test reporting, and every state does something slightly different. Kara Schechtman gets into all of the weeds here.
What We’ve Learned About the HHS’s Hospitalization Data (Dec. 4): I love this post because it says, in no uncertain terms, look, we did all this analysis, and we’re telling you, you can trust the HHS hospitalization data now. It’s an early indication of CTP recognizing increased transparency from the federal agency, weeks before the facility-level dataset would be released.
Florida’s COVID-19 Data: What We Know, What’s Wrong, and What’s Missing (July 8): This one is essentially a feature-length, detailed, clearly written callout post for the state of Florida. Many of the issues described in this post have now been resolved, but one of the most important developments occurred just days after it was published: Florida started publishing a count of current COVID-19 patients.
The State of State Antigen Test Reporting (Oct. 19): I co-wrote this one with Quang Nguyen. The post explains what we know about antigen tests, those useful-for-surveillance-but-pesky-for-data rapid tests, and calls attention to potential underreporting.
I joined the COVID Tracking Project as a volunteer in early April 2020. I actually searched back through my calendar to find exactly when I did a data entry training—it was Thursday, April 2.
At the time, I wanted to better understand the numbers I kept seeing, in tweets and news stories and Cuomo’s powerpoints. But more than that, I wanted to do something. I sat, cooped up in my little Brooklyn apartment, listening to the endless sirens screaming by. I ran to the park and wanted to yell at every person I saw walking without a mask. I donated to mutual aid funds, but even that felt empty, almost impersonal.
The Project put out a call for volunteers, and I thought, okay, data entry. I can do data entry. I can do spreadsheets. I know spreadsheets.
Well, I know spreadsheets much better now, almost a year later. I know how to navigate through a state dashboard, find all its data definitions, and puzzle through its update time. But beyond all the technical stuff, volunteering for CTP gave me a sense of purpose and community. No matter how tired or angry the world made me feel, I knew that, for a few hours a week, I’d be contributing to something bigger than myself. My work played a small part in making data accessible, bringing information to a wider audience.
Much ink has been spilled about how mutual aid groups have helped neighbors find each other, especially during that period of spring 2020 when everything seemed so bleak. I have seen the Project as another form of mutual aid. I’ve given countless hours to CTP over the past year in the form of data entry shifts, analysis, writing, and custom emojis—but those hours have also been given back to me, in everything from Tableau tricks to playlist recommendations. My fellow volunteers, the vast majority of whom I’ve never met in person, are my neighbors. We live in the same spreadsheets and Slack channels; we see the world in the same way.
I am beginning to understand how journalism, or something like journalism, can work when it is led by a community. By community, I mean: a group of people united in one mission. And by mission, I mean: bringing information to the public. Accessibility and accountability are common buzzwords right now, I think, but CTP approaches the truth of these principles, whether it’s by doing shifts through Christmas or by writing out detailed process notes on how to navigate Wyoming’s dashboard(s).
I know why the Project’s data collection efforts are ending. The federal government can compile—and is compiling—data on a far more detailed and standardized level than a group of researchers and volunteers ever could. But I am grateful to have been part of this beautiful thing, so much bigger than myself. It is the bar by which I will measure every organization I join from here on out.
If you’ve ever read the About page on the COVID-19 Data Dispatch website, you may have noticed a disclaimer stating that, while I volunteer for CTP, this publication is an entirely separate project that reflects my own reporting and explanations. This is true; I’m careful to keep this project distinct. But of course, the COVID-19 Data Dispatch has been influenced by what I’ve learned volunteering for CTP. I have attempted to carry forward those values, accessibility and accountability. I’ll keep carrying them forward. Feedback is always welcome.
To all my neighbors in the CTP Slack: thank you. And to everyone who has followed the data: there is work still to be done.
Hospital facilities visualization by the COVID Tracking Project: Last month, the Department of Health and Human Services (HHS) released an extensive dataset showing how COVID-19 patients are impacting hospitals at the individual facility level. (See my Dec. 13 post for more information on this dataset.) The COVID Tracking Project has produced an interactive visualization from this dataset, allowing users to zoom in to individual facilities or search for hospitals in a particular city or ZIP code. I contributed some copy to this page.
Therapeutics distribution (from HHS): The HHS is posting a list of locations that have received monoclonal antibody therapies, for the purpose of treating COVID-19. Bamlanivimab, one such therapy, received EUA from the FDA in early November. The HHS page notes that this is not a complete list: “Although monoclonal antibody therapeutic treatments have been shipped nationwide, shipment locations are displayed for those States that have opted to have their locations displayed on this public website.”
Hospital discharge summaries (from the Healthcare Cost and Utilization Project): This project, under the HHS umbrella, posts time series data on U.S. hospital patients. The site recently posted summaries on patients from April to June 2020, including datasets specific to COVID-19, flu, and other viral respiratory infections. As epidemiologist Jason Salemi explains in a summary Twitter thread, the data don't provide new information but may be useful for a researcher looking to dig into spring and summer hospitalization trends.
In the interest of giving credit to the HHS where credit is due: the agency updated its new facility-level hospitalization dataset right on schedule this past Monday.
This dataset allows Americans to see exactly how COVID-19 is impacting individual hospitals across the country. In last week’s issue, I explained why I was excited about this dataset and what researchers and reporters could do with it. (The highlights: hyperlocal data that can be aggregated to different geographies, a time series back to August, demographic information on COVID-19 patients, and HHS transparency.)
Last week, I used this hospitalization dataset—along with the HHS’s state-level hospitalization data—to build several visualizations showing how COVID-19 has hit hospitals at the individual, county, and state levels.
I also wrote a brief article on COVID-19 hospitalizations for Stacker, hosting visualizations and highlighting some major insights. The article was sent out to local journalists across the country via a News Direct press release. (If your outlet wants to repurpose Stacker’s article, get in touch with my coworker Mel at melanie@thestacker.com!)
A few national statistics:
Nearly 700 hospitals are at over 90% inpatient capacity, as of the most recent HHS data. 750 hospitals are at over 90% capacity in their ICUs.
The states with the highest rates of occupied beds are Maryland (79.8% of all beds occupied), Washington D.C. (80.0%), and Rhode Island (85.2%).
States with the highest shares of their populations hospitalized with COVID-19 are Arizona (53 patients per 100,000 population), Pennsylvania (55 per 100K), and Nevada (67 per 100K).
19% of hospitals in the nation are facing critical staffing shortages, while 24% anticipate such a shortage within the next week.
Staffing shortages are highest in Arkansas (33.6% of hospitals in the state), Wisconsin (35.6%), and North Dakota (42.0%).
Meanwhile, The Accountability Project has developed a Datasette version of this hospitalization dataset. With a bit of code, you can query the data to access metrics for a specific hospital, city, county, or state. The Project has provided example queries to help you get started.
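Because Datasette instances expose a JSON API that accepts read-only SQL, you can script these queries rather than running them in the browser. Here's a rough sketch; the base URL, table, and column names are placeholders, so grab the real ones from The Accountability Project's example queries.

```python
import requests

# Base URL, table, and column names below are placeholders; substitute the
# real ones from The Accountability Project's Datasette instance.
BASE = "https://example-datasette-instance.org/hospitals.json"

sql = """
    SELECT hospital_name, state, inpatient_beds_used_7_day_avg
    FROM facilities
    WHERE state = 'MD'
    ORDER BY inpatient_beds_used_7_day_avg DESC
    LIMIT 10
"""

# Datasette's _shape=array parameter returns rows as a plain JSON list.
rows = requests.get(BASE, params={"sql": sql, "_shape": "array"}).json()
for row in rows:
    print(row["hospital_name"], row["inpatient_beds_used_7_day_avg"])
```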
When the Department of Health and Human Services (HHS) started reporting hospitalization data at the state level back in July, I wistfully told a friend that I wished the agency would report facility-level numbers. Another federal agency had recently started reporting this type of data for nursing homes, and I appreciated the flexibility and granularity with which I was able to analyze how the pandemic was impacting nursing home patients and staff. I wanted to see the pandemic’s impact on hospitals in the same way.
At the time, I considered this a pipe dream. The HHS was already facing major challenges: implementing a new data pipeline across the country, navigating bureaucratic issues with state public health departments, and working with individual hospitals to help them report more accurately and more often. Plus, transparency issues and political scandals plagued the agency. Making more data public seemed to be the least of its priorities.
But I’m happy to say that this week, my pipe dream came true. On Monday, the HHS published a new hospitalization dataset including capacity, new admissions, and other COVID-19-related numbers—for over 4,000 individual facilities across America.
This is, as I put it in a COVID Tracking Project blog post analyzing the dataset, a big deal. Project lead Alexis Madrigal called it “probably the single most important data release that we’ve seen from the Federal government.” I, in somewhat less professional terms, texted my girlfriend:
Please appreciate the level of self-control it took for me to not actually title this issue “HHS queen shit.”
Let me explain why this new dataset is so exciting—not just for a nerd like me, but for any American following the pandemic. I’m drawing on a COVID Tracking Project blog post unpacking the dataset, to which I contributed some explanatory copy.
Hyperlocal data: At a time when hospitals are overwhelmed across the nation, it is incredibly useful to see precisely which hospitals are the worst off and how COVID-19 is impacting them. Data scientists can pinpoint specific patterns and connections between regions. National aid groups can determine where to send PPE and other supplies. Journalists can see which hospitals should be the focus of local stories. The stories that can be told with this dataset are endless.
Aggregating to different geographies: The individual facility is the most detailed possible level of reporting for COVID-19 hospitalizations. But this HHS dataset also includes the state, county, and ZIP code for each hospital, along with unique codes that identify hospitals in the Medicare and Medicaid system. The data for specific facilities can thus be combined to make comparisons on a variety of geographic levels. I tried out a county-level visualization, for example; some counties are not represented, but you can still see a much more granular picture of hospital capacity than you would in a state-level map. (See the code sketch after these highlights for one way to do this kind of roll-up.)
Time series back to August: HHS didn’t just provide data on how hospitals are coping with COVID-19 right now. They provided a full time series going back to the first week of August, with data starting shortly after the HHS began collecting information from hospitals. These historical data allow researchers to make more detailed comparisons between the nation’s last major COVID-19 peak and our current outbreak. There are some reporting errors from hospitals in the early weeks of the dataset; COVID Tracking Project analysis has shown that these errors become less significant in the week of August 28.
Includes coverage details: The dataset includes fields that can help researchers check the quality of an individual hospital’s reporting. These fields, called “coverage” numbers, show the number of days in a given week on which data were reported. A value of six for total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_coverage, for example, indicates that this hospital reported how many adult COVID-19 patients it was treating on six of seven days in the past week. Many hospitals are now reporting all major metrics on six or seven days a week—HHS has really stepped up to encourage this level of reporting in recent months. For more information on hospital reporting coverage, see HHS Protect.
Admissions broken out by age: The HHS began reporting hospital COVID-19 admissions, or new COVID-19 patients entering the hospital, at the state level in November. The new dataset includes this information, at the facility level, for every week going back to August, and breaks out those new patients by age group. You can see exactly who is coming to the hospital with COVID-19 in age brackets of 18-19, ten-year ranges from 20 to 79, and 80+. Several other metrics in the dataset are also broken out for adult and pediatric patients.
New fields: This dataset reports counts of emergency department visits, including both total visits for any reason and visits specifically related to COVID-19. (The HHS data dictionary defines this as “meets suspected or confirmed definition or presents for COVID diagnostic testing.”) These figures allow researchers to calculate the share of emergency department visits at a given hospital that are COVID-related, a new metric that wasn’t available from previous HHS reporting.
Signifies major effort from the HHS: When it comes to reporting hospitalization data, this agency has come a long way from the errors and transparency questions of the summer. Last week, the COVID Tracking Project published an analysis finding that HHS counts of COVID-19 patients are now in close proximity to similar counts reported by state public health departments—signifying that the federal data may be a useful, reliable complement to state data. (I discussed this analysis in last week’s issue.) The new facility-level dataset indicates that HHS data scientists understand the needs of COVID-19 researchers and communicators, and are working to make important data public. I will continue to carefully watch this agency, as will many of my fellow reporters. But I can’t deny that this data release was a major step for transparency and trust.
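As promised above, here's a minimal sketch (in Python, with pandas) of rolling facility data up to the county level while respecting the suppression and coverage fields. The column names follow the HHS data dictionary as I understand it, and the four-day coverage cutoff is my own arbitrary choice; verify both against the dataset before relying on this.

```python
import numpy as np
import pandas as pd

# Local download of the facility-level file. Column names follow the HHS
# data dictionary but should be verified against the live dataset.
df = pd.read_csv("hhs_facility_level.csv")

ADM = "total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_sum"
COV = "total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_coverage"

# Counts between 1 and 4 are suppressed as -999999 for privacy; treat them
# as missing, so any total built from this file is a floor, not an exact figure.
df[ADM] = df[ADM].replace(-999999, np.nan)

# Keep facility-weeks that reported on at least 4 of 7 days (my own cutoff).
reliable = df[df[COV] >= 4]

# Roll facilities up to counties using each hospital's FIPS code.
county_totals = (
    reliable.groupby(["fips_code", "collection_week"])[ADM]
    .sum(min_count=1)
    .reset_index()
)
print(county_totals.tail())
```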
To get started with this dataset, you can zoom in to look at your community on this Tableau dashboard I made, visualizing the most recent week of data. (That most recent week of data reflects November 27 through December 3. As the dataset was first published last Monday, December 7, I’m anticipating an update tomorrow.)
Or, if you'd like to see more technical details on how to use the dataset, check out this community FAQ page created by data journalists and researchers at CareSet Systems, the University of Minnesota, COVID Exit Strategy, and others.
Finally, for more exploration of the research possibilities I outlined above, you can read the COVID Tracking Project’s analysis. The post includes some pretty striking comparisons from summer outbreaks to now.
In July, the Department of Health and Human Services (HHS) took over collecting and reporting data on how COVID-19 is impacting America’s hospital systems. This takeover from the CDC—which had reported hospitalization data since the start of the pandemic—sparked a great deal of political and public health concern. Some healthcare experts worried that a technology switch would put undue burden on already-tired hospital workers, while others worried that the White House may influence the HHS’s data.
Since that data responsibility switch, I’ve spent a lot of time with that HHS dataset. In August, I wrote a blog post for the COVID Tracking Project which compared HHS’s counts of hospitalized COVID-19 patients to the Project’s counts (compiled from states). At the time, my co-author Rebecca Glassman and I observed discrepancies between the datasets, which we attributed in part to differences in definitions and reporting pipelines. For example: some states only report those hospital patients whose cases of COVID-19 have been confirmed with PCR tests, while HHS reports all patients (including those with confirmed and suspected cases).
I’ve covered the HHS hospitalization dataset several times in this newsletter since, including its investigation by journalists at ProPublica and Science Magazine and its expansion to include new metrics. The dataset has gone from a basic report of hospital capacity in every state to a comprehensive picture of how the pandemic is hitting hospitals. It includes breakdowns of patients with confirmed and suspected cases of COVID-19, patients in the intensive care unit (ICU), and patients who are adults and children. As of November, it also includes newly admitted patients and staffing shortages. At the same time, HHS officials have worked to resolve technical issues and get more hospitals reporting accurately in the system.
A new analysis, published this past Friday by the COVID Tracking Project, highlights how reliable the HHS dataset has become. The analysis compares HHS’s counts of hospitalized COVID-19 patients to the Project’s counts, compiled from states. Unlike the analysis I worked on in August, however, this recent work benefits from HHS’s expanded metrics and more thorough documentation from both the federal agency and states. If a state reports only confirmed cases, for example, this number can now be compared directly to the corresponding count of confirmed cases from the HHS.
Here’s how the two datasets line up, as of November 29:
The COVID Tracking Project and HHS counts of hospitalized patients closely match in September, October, and November.
Since November 8, in fact, the two datasets are within two percent of each other when adjusting for definitional differences.
The blog post also discusses how patient counts match in specific states. In 41 of 52 jurisdictions (including the District of Columbia and Puerto Rico), the two datasets are in close alignment. And even in the states where hospitalization numbers match less precisely, the two datasets generally follow the same trends. In other words: there may be differences in how the HHS and individual states are collecting and reporting their numbers, but both datasets tell the same story about how COVID-19 is impacting American hospitals.
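If you're curious how such a comparison works mechanically, here's a rough sketch. CTP's hospitalizedCurrently field is documented in its API; the HHS column names are my assumptions based on the state time series data dictionary, so check both before reusing this.

```python
import pandas as pd

# CTP's full state history (documented v1 API) and a local download of the
# HHS state time series. The HHS column names here are assumptions.
ctp = pd.read_csv("https://api.covidtracking.com/v1/states/daily.csv")
ctp["date"] = pd.to_datetime(ctp["date"], format="%Y%m%d")

hhs = pd.read_csv("hhs_state_timeseries.csv", parse_dates=["date"])

# HHS splits confirmed and suspected patients; summing adult and pediatric
# confirmed counts approximates a confirmed-only definition, which is the
# kind of adjustment needed for states that report confirmed cases only.
hhs["confirmed_patients"] = (
    hhs["total_adult_patients_hospitalized_confirmed_covid"]
    + hhs["total_pediatric_patients_hospitalized_confirmed_covid"]
)

merged = ctp.merge(hhs, on=["date", "state"])
merged["pct_diff"] = (
    (merged["hospitalizedCurrently"] - merged["confirmed_patients"]).abs()
    / merged["hospitalizedCurrently"]
    * 100
)
print(merged.groupby("state")["pct_diff"].mean().sort_values().round(1))
```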
I recommend giving the full blog post a read, if you’d like all the nerdy details. Alexis Madrigal also wrote a great summary thread on Twitter:
This new COVID Tracking Project analysis comes several days after an investigation in Science Magazine called the HHS dataset into question. The investigation is based on a CDC comparison of these same two datasets, which doesn't account for the reporting differences I've discussed.
Charles Piller, the author of this story, raises important questions about HHS's transparency and the burden that its system places on hospitals. It's true that HHS's new data reporting system was rolled out quickly, faced technical challenges, and caused a great deal of confusion for national reporters and local hospital administrators alike. The HHS dataset deserves the careful scrutiny it has received.
But now that this careful scrutiny has been conducted—and the two datasets appear to tell the same story—I personally feel comfortable about using the HHS dataset in my reporting. In fact, I produced a Stacker story based on these data just last week: States with the highest COVID-19 hospitalization rates.
How many people in the U.S. are currently hospitalized with COVID-19? As of yesterday, 83,200.
This question calls attention to the people deeply impacted by the pandemic—people in hospital beds, on ventilators, struggling to breathe. But it is also a deeply practical question. Public health experts and policymakers need to know where hospitals are becoming overwhelmed with patients in order to distribute supplies where they are most needed. Researchers and data nerds like myself, meanwhile, can use hospitalization metrics to track the pandemic’s impact on different communities: reported cases may be an unreliable metric, challenged by inadequate testing and uneven reporting guidelines, but it’s hard to miss a person in the hospital.
Longtime readers may remember that this newsletter started because of hospitalization data. Back in July, when hospitalization data moved from the purview of the CDC to the HHS, I wanted to explain why these data are so important and how the change in control impacted the numbers themselves. In the months since, the HHS has increased both the number of hospitals reporting to its system and the volume of information that is publicly released about those hospitals.
I’m returning to the topic now because the HHS has made two major upgrades to its hospitalization dataset in the past week: it now includes new admissions and staffing shortages for every state. The metrics are only available at the state level; I’m hoping that county- and even individual hospital-level numbers may be released in the coming weeks.
New admissions are a useful metric because they provide a clear picture of where outbreaks are worsening, and to what degree. Patients may stay in the hospital (and be counted in a "current hospitalizations" figure) for weeks on end; isolating the number of incoming patients allows public health researchers to see how the burden on hospitals is growing.
Across the U.S., over 10,000 patients with confirmed cases of COVID-19 are now being admitted each day.
New COVID-19 admissions rose from about 6,000 per day in late October to over 10,000 per day in mid-November. Full-size chart available here.
Staffing shortages, meanwhile, are a useful metric because they demonstrate where in the country healthcare systems are hardest hit. The HHS specifically asks hospitals to report when their staffing shortages are critical, meaning that these facilities are in serious danger of being unable to operate as normal. Staffing shortages may be the result of healthcare workers feeling burnt out, quitting, or becoming sick with COVID-19 themselves.
As of November 19, the most recent date these data are available, 18% of hospitals are currently facing a critical shortage—that’s about 1,100 out of the 6,100 hospitals reporting. 200 more hospitals report that they will be facing a critical shortage in the next week.
In North Dakota, Wisconsin, Missouri, and New Mexico, over one third of hospitals are facing a critical staffing shortage. Full-size chart available here.
Finally, here’s a look at the nation’s current hospital capacity—that is, how many hospital beds are currently occupied with sick people. As of November 19, about 600,000 of the nation’s 980,000 hospital beds are full (61%). 88,000 of those people have been diagnosed with COVID-19 (9%). These numbers will grow in the coming weeks as thousands of recently diagnosed Americans become sicker.
Across the Midwest and South, several states have over three quarters of hospital beds occupied. Full-size chart available here.
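As a quick sanity check on those percentages, using the rounded figures above:

```python
# Rounded HHS figures from the paragraph above, as of November 19.
total_beds = 980_000
occupied_beds = 600_000
covid_patients = 88_000

print(f"Overall occupancy: {occupied_beds / total_beds:.0%}")            # ~61%
print(f"COVID-19 share of all beds: {covid_patients / total_beds:.0%}")  # ~9%
```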
For more context on these hospitalization data and what they mean for the exhausted, terrified healthcare workers serving patients, check out:
This past week, the Department of Health and Human Services (HHS) opened up a new area of data reporting for hospitals around the country. In addition to their numbers of COVID-19 patients and supply needs, hospitals are now asked to report their numbers of influenza patients, including flu patients in the ICU and those diagnosed with both flu and COVID-19.
The new reporting fields were announced in an HHS directive on October 6. They became “available for optional reporting” this past Monday, October 19; but HHS intends to make the flu data fields mandatory in the coming weeks. The move makes sense, broadly speaking—as public health experts worry about double flu and COVID-19 outbreaks putting incredible pressure on hospital systems, collecting data on both diseases at once can help the federal public health agencies quickly identify and get aid to the hospitals which are struggling.
However, it seems likely that the new fields have caused both blips in HHS data and challenges for the state public health departments that rely upon HHS for their own hospitalization figures. As the COVID Tracking Project (and this newsletter) reported over the summer, any new reporting requirement is likely to strain hospitals that are understaffed or underprepared with their in-house data systems. Such challenges at the hospital level can cause delays and inaccuracies in the data reported at both state and federal levels.
This week, the COVID Tracking Project’s weekly update called attention to gaps in COVID-19 hospitalization data reported by states. Missouri’s public health department specifically linked their hospitalization underreporting to “data changes from the US Department of Health and Human Services.” Five other states—Kansas, Wisconsin, Georgia, Alabama, and Florida—also reported significant decreases or partial updates to their hospitalization figures. These states didn’t specify reasons for their hospitalization data issues, but based on what I saw over the summer, I believe it is a reasonable hypothesis to connect them with HHS’s changing requirements.
Jim Salter of the Associated Press built on the COVID Tracking Project's observations by interviewing state public health department officials. He reported that, in Missouri, some hospitals lost access to HHS's TeleTracking data portal:
Missouri Hospital Association Senior Vice President Mary Becker said HHS recently implemented changes; some measures were removed from the portal, others were added or renamed. Some reporting hospitals were able to report using the new measures, but others were not, and as a result, the system crashed, she said.
“This change is impacting hospitals across the country,” Becker said in an email. “Some states collect the data directly and may not yet be introducing the new measures to their processes. Missouri hospitals use TeleTracking and did not have control over the introduction of the changes to the template.”
As the nation sets COVID-19 records and cases spike in the Midwest, the last thing that public health officials should be worrying about right now is inaccurate hospitalization data. And yet, here we are.
This past week, two outlets published major investigations of the Centers for Disease Control & Prevention (CDC). The first story, by Science’s Charles Piller, focuses on White House Coronavirus Task Force Coordinator Dr. Deborah Birx and her role in the hospitalization data switch from the CDC to the Department of Health and Human Services (HHS). The second story, by ProPublica’s James Bandler, Patricia Callahan, Sebastian Rotella, and Kristen Berg, provides a broader view of internal CDC dynamics and challenges since the start of the pandemic.
These stories do not focus on data specifically, but I wanted to foreground them this week as crucial insights into how the work of science and public health experts is endangered when powerful leaders prioritize their own narratives. Both stories describe how Dr. Birx disrespected and overrode CDC experts. She wanted data from every hospital in the country, every day, and failed to understand why the CDC could not deliver. The ProPublica story quotes an anonymous CDC scientist:
Birx expected “every hospital to report every piece of data every day, which is in complete defiance of statistics,” a CDC data scientist said. “We have 60% [of hospitals] reporting, which was certainly good enough for us to have reliable estimates. If we got to 80%, even better. A hundred percent is unnecessary, unrealistic, but that’s part of Birx’s dogma.”
As I explained in this newsletter’s very first issue, in July, the CDC’s hospital data reporting system was undercut in favor of a new system, built by the software company TeleTracking and managed by the HHS. Hospitals were told to stop reporting to the CDC’s system and start using TeleTracking instead. The two features published this week tie that data switch inexorably to Dr. Birx’s frustration with the CDC and her demand for more frequent data at any cost.
Public health experts across the country worried that already-overworked hospital staff would face significant challenges in switching to a new data system, from navigating bureaucracy to, in some cases, manually entering numbers into a form with 91 categories. Initial data reported by the new HHS system in July were fraught with errors—such as a report of 118% hospital beds occupied in Rhode Island—and inconsistencies when compared to the hospital data reported out by state public health departments. I co-wrote an analysis of these issues for the COVID Tracking Project.
But at least, I thought at the time, the HHS system was getting more complete data. The HHS system quickly increased the number of hospitals reporting to the federal government by about 1,500, and at an October 6 press briefing, Dr. Birx bragged that 98% of hospitals were reporting at least weekly. As Piller's story in Science describes, however, such claims fail to mention that the bar for a hospital to be included in that 98% is very low:
At a 6 October press briefing, Birx said 98% of hospitals were reporting at least weekly and 86% daily. In its reply to Science, HHS pegged the daily number at 95%. To achieve that, the bar for “compliance” was set very low, as a single data item during the prior week. A 23 September CDC report, obtained by Science, shows that as of that date only about 24% of hospitals reported all requested data, including protective equipment supplies in hand. In five states or territories, not a single hospital provided complete data.
Piller goes on to describe how HHS's TeleTracking data system allows errors—such as typos entered by overworked hospital staff—to "flow into [the] system" and then (theoretically) be fixed later. This approach further undermines the trustworthiness of HHS's data for the public health researchers using it to track the pandemic. The agency is working on improvements, certainly, and public callouts of the hospital capacity numbers have slowed since TeleTracking's rollout in July. Still, the initial political and media storm created by this hospitalization data switch, combined with the details about the switch revealed by these two new features, has led me to be much warier of future data releases from both the HHS and the CDC than I was before 2020.
Just as the White House boasted, “Our staffers get tested every day,” in response to critiques of President Trump’s flaunting of public health measures, the head of the White House Coronavirus Task Force wanted to boast, “We collect data every day,” in response to critiques of the country’s overburdened healthcare system. But testing and collecting data should both be only small parts of the national response to COVID-19. When scientists see their expertise ignored in favor of recommendations that will fit a chosen political narrative, public trust is lost in the very institutions they represent. And rebuilding that trust will take a long time.