HHS’s hospitalization data are good, actually

In July, the Department of Health and Human Services (HHS) took over collecting and reporting data on how COVID-19 is impacting America’s hospital systems. This takeover from the CDC, which had reported hospitalization data since the start of the pandemic, sparked a great deal of political and public health concern. Some healthcare experts worried that a technology switch would put an undue burden on already-tired hospital workers, while others worried that the White House might influence HHS’s data.

Since that switch in data responsibility, I’ve spent a lot of time with the HHS dataset. In August, I wrote a blog post for the COVID Tracking Project which compared HHS’s counts of hospitalized COVID-19 patients to the Project’s counts (compiled from states). At the time, my co-author Rebecca Glassman and I observed discrepancies between the datasets, which we attributed in part to differences in definitions and reporting pipelines. For example: some states report only those hospital patients whose cases of COVID-19 have been confirmed with PCR tests, while HHS reports all patients (those with confirmed cases and those with suspected cases).

I’ve covered the HHS hospitalization dataset several times in this newsletter since, including its investigation by journalists at ProPublica and Science Magazine and its expansion to include new metrics. The dataset has gone from a basic report of hospital capacity in every state to a comprehensive picture of how the pandemic is hitting hospitals. It includes breakdowns of patients with confirmed and suspected cases of COVID-19, patients in the intensive care unit (ICU), and patients who are adults and children. As of November, it also includes newly admitted patients and staffing shortages. At the same time, HHS officials have worked to resolve technical issues and get more hospitals reporting accurately in the system.

A new analysis, published this past Friday by the COVID Tracking Project, highlights how reliable the HHS dataset has become. The analysis compares HHS’s counts of hospitalized COVID-19 patients to the Project’s counts, compiled from states. Unlike the analysis I worked on in August, however, this recent work benefits from HHS’s expanded metrics and more thorough documentation from both the federal agency and states. If a state reports only confirmed cases, for example, this number can now be compared directly to the corresponding count of confirmed cases from HHS.

Here’s how the two datasets line up, as of November 29:

Line chart showing hospitalization data from states (CTP) and from HHS. When the correct definitions are used, and the HHS data are offset by a single day, the two lines match almost exactly.
The COVID Tracking Project and HHS counts of hospitalized patients closely match in September, October, and November.

Since November 8, in fact, the two datasets have been within two percent of each other after adjusting for definitional differences.
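For readers who want to try this kind of check themselves, here is a minimal sketch of the comparison described above: line up two daily series, shift the HHS series by one day, and compute the percent difference. This is not the COVID Tracking Project's actual code. The function names, the direction of the one-day shift, and the numbers in the example are all assumptions made up for illustration, and the sketch assumes both series already use matching definitions (for example, confirmed cases only).

```python
from datetime import date, timedelta


def percent_difference(state_counts, hhs_counts, hhs_offset_days=1):
    """Percent difference of HHS counts relative to state counts, by date.

    Both inputs are dicts mapping a date to a patient count.
    Assumption: a state figure dated D is compared with the HHS figure
    dated D minus `hhs_offset_days`. The direction of the shift is a guess
    for illustration; flip the sign if the data call for it.
    """
    offset = timedelta(days=hhs_offset_days)
    differences = {}
    for day, state_value in state_counts.items():
        hhs_value = hhs_counts.get(day - offset)
        # Skip days with no matching HHS figure or a zero state count.
        if hhs_value is None or state_value == 0:
            continue
        differences[day] = 100.0 * (hhs_value - state_value) / state_value
    return differences


def within_tolerance(differences, tolerance_percent=2.0):
    """True if every compared day falls within the given percent tolerance."""
    return all(abs(value) <= tolerance_percent for value in differences.values())


# Made-up numbers, NOT real hospitalization data.
state_example = {date(2020, 11, 9): 1000, date(2020, 11, 10): 1100}
hhs_example = {date(2020, 11, 8): 1010, date(2020, 11, 9): 1089}

example_differences = percent_difference(state_example, hhs_example)
print(example_differences)
print(within_tolerance(example_differences))
```

Keeping the offset and the tolerance as parameters makes it easy to test how sensitive the match is to either choice.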

The blog post also discusses how patient counts match in specific states. In 41 of 52 jurisdictions (including the District of Columbia and Puerto Rico), the two datasets are in close alignment. And even in the states where hospitalization numbers match less precisely, the two datasets generally follow the same trends. In other words: there may be differences in how HHS and individual states are collecting and reporting their numbers, but both datasets tell the same story about how COVID-19 is impacting American hospitals.

I recommend giving the full blog post a read, if you’d like all the nerdy details. Alexis Madrigal also wrote a great summary thread on Twitter.

This new COVID Tracking Project analysis comes several days after an investigation in Science Magazine called the HHS dataset into question. The investigation is based on a CDC comparison of these same two datasets that doesn’t account for the reporting differences I’ve discussed.

Charles Piller, the author of this story, raises important questions about HHS’s transparency and the burden that its system places on hospitals. It’s true that HHS’s new data reporting system was rolled out quickly, faced technical challenges, and caused a great deal of confusion for national reporters and local hospital administrators alike. The HHS dataset deserves the careful scrutiny it has received.

But now that this careful scrutiny has been conducted, and the two datasets appear to tell the same story, I personally feel comfortable using the HHS dataset in my reporting. In fact, I produced a Stacker story based on these data just last week: States with the highest COVID-19 hospitalization rates.
