Tag: Hospitalization

  • National numbers, September 5

    About 90,000 Americans are currently in the hospital with COVID-19; this has been the daily average for the past two weeks. Chart via the CDC, screenshot taken on September 4.

    In the past week (August 28 through September 3), the U.S. reported about one million new cases, according to the CDC. This amounts to:

    • An average of 153,000 new cases each day
    • 327 total new cases for every 100,000 Americans
    • 5% more new cases than last week (August 21-27)
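    For readers who want to check the math, here is a minimal sketch of how figures like these relate to one another. The population value is a round assumption, not the CDC's exact denominator, so the per-capita result only roughly matches the figure above.

```python
def per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    return count / population * 100_000


def pct_change(current, previous):
    """Percent change from one week's total to the next."""
    return (current - previous) / previous * 100


daily_average = 153_000            # daily average quoted above
weekly_cases = daily_average * 7   # about 1.07 million for the week

# Assumption: a round U.S. population figure, used only for illustration.
US_POPULATION = 330_000_000

weekly_rate = per_100k(weekly_cases, US_POPULATION)
print(f"{weekly_cases:,} cases is about {weekly_rate:.0f} per 100,000 people")
```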

    Last week, America also saw:

    • 85,000 new COVID-19 patients admitted to hospitals (26 for every 100,000 people)
    • 7,300 new COVID-19 deaths (2.2 for every 100,000 people)
    • 99% of new cases now Delta-caused (as of August 28)
    • An average of one million vaccinations per day (per Bloomberg)

    Nationally, the current COVID-19 surge appears to be in a plateau. The number of new cases rose by just 5% this week, after a 3% rise last week. Hospitalizations are in a similar position: the number of patients in the hospital with COVID-19 has held steady at about 90,000 for the past two weeks.

    Among the COVID-19 experts I follow, I’ve seen some speculation that this could be the start of a Delta decline, similar to what we’ve seen in other countries, like India and the U.K. At the same time, others are noting that the U.K. saw a brief case decline followed by another rebound. If Delta does the same thing here, it would coincide with more schools starting their fall semesters and colder weather, neither of which bodes well for transmission.

    And there are already a lot of children in hospitals right now. According to the COVID-NET surveillance system, there were about 14 children (under age 18) hospitalized with COVID-19 for every one million kids in the U.S. during the week ending August 28. For children under age 5, that number is 20 for every million—higher than at any other point in the pandemic.

    Thanks to COVID-19 and other diseases (like RSV, another virus that’s impacting many kids right now), pediatric intensive care units are overwhelmed, especially in the South. To understand what that means, I recommend this powerful op-ed by health equity expert Dr. Uché Blackstock. (Recent CDC research suggests that higher child hospitalization numbers are due to Delta’s high transmission, not because it impacts children more intensely. More on that later in the issue.)

    Meanwhile, high test positivity rates indicate that many COVID-19 cases are probably not being caught—especially those breakthrough cases in vaccinated people which may be mild, but can still spread the virus to others. At the national level, our test positivity rate is about 10% right now. In several states—South Dakota, Tennessee, Oklahoma, Alabama—positivity is over 20%, meaning we probably are not getting a clear picture of the surges in these locations.

    The U.S. is now seeing over 1,000 COVID-19 deaths a day, a level that we had not hit since the winter surge. Almost all of these deaths are preventable. This will continue for as long as the Delta surge lasts.

  • One data researcher’s journey through South Carolina’s COVID-19 reporting

    By Philip Nelson

    COVID-19 hospitalizations in South Carolina, as of August 26. Posted on Twitter by Philip Nelson.

    If you post in the COVID-19 data Twitter-sphere, you’re likely familiar with Philip Nelson, a computer science student at Winthrop University—and an expert in navigating and sharing data from the state of South Carolina. Philip posts regular South Carolina updates including the state’s case counts, hospitalizations, test positivity, and other major figures, and contributes to discussions about data analysis and accessibility.

    I invited Philip to contribute a post this week after reading his Tweets about his ongoing challenges in accessing his state’s hospitalization data. Basically, after Philip publicized a backend data service that enabled users to see daily COVID-19 patient numbers by individual South Carolina hospital, the state restricted this service’s use—essentially making the data impossible for outside researchers to analyze.

    To me, his story speaks to broader issues with state COVID-19 data, such as: agencies adding or removing data without explanation, a lack of clear data documentation, failure to advertise data sources to the public, and mismatches between state and federal data sources. These issues are, of course, tied to the systematic underfunding of state and local public health departments across the country, making them unequipped to respond to the pandemic.

    South Carolina seems to be particularly arduous to deal with, however, as Philip describes below.


    I’ve been collecting and visualizing South Carolina-related COVID-19 data since April 2020. I’m a computer science major at Winthrop University, so naturally I like to automate things, but collecting and aggregating data from constantly-changing data sources proved to be far more difficult than I anticipated.

    At the beginning of the pandemic, I had barely opened Excel and had never used the Python library pandas, but I knew how to program and I was interested in tracking COVID-19 data. So, in early March 2020, I watched very closely as the South Carolina Department of Health and Environmental Control (DHEC) reported new cases.

    During the early days of the pandemic, DHEC provided a single chart on their website with their numbers of negative and positive tests; I created a small spreadsheet tracking these cases. After a few days, DHEC transitioned to a dashboard that shared county-level data.

    On March 23, I noticed an issue with the new dashboard. Apparently, someone had misconfigured authentication on something in the backend. (When data sources are put behind authentication, anyone outside of the organization providing that source loses access.) The issue was quickly fixed and I carried on with my manual entry, but this was not the last time I’d have to think about authentication.

    Initially, I manually entered the number of cases and deaths that DHEC reported. I thought I might be able to use the New York Times’ COVID-19 dataset, but after comparing it to the DHEC’s data, I decided that I’d have to continue my own manual entry.

    South Carolina’s REST API

    In August 2020, I encountered some other programmers on Twitter who had discovered a REST API on DHEC’s website. REST is a widely used architectural style for APIs that makes it easier for developers to use services on the web. In this case, I was able to make simple requests to the server and receive data as a response. After starting a database fundamentals course during the fall 2020 semester, I figured out how to query the service: I could use the data in the API to get cases and deaths for each county by day.

    This API gave me the ability to automate all of my update processes. By further exploring the ArcGIS REST API website, I realized that DHEC had other data services available. In addition to county-level data, the agency also provided an API for cases by ZIP code. I used these data to create custom ZIP code-level graphs upon request, and another person I encountered built a ZIP code map of cases.
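    To give a sense of what "simple requests" to a service like this can look like, here is a minimal sketch, not the scripts described in this post. The layer URL and the field names (County, Date, Cases) are hypothetical placeholders, since the real DHEC service addresses are not given here. The query parameters (where, outFields, returnGeometry, f) are standard ArcGIS REST query options.

```python
import json
from urllib.parse import urlencode

# Hypothetical layer URL: a stand-in, not DHEC's actual service address.
LAYER_URL = "https://example.com/arcgis/rest/services/covid_cases/FeatureServer/0"


def build_query_url(layer_url, where="1=1", out_fields="*"):
    """Build a query URL for an ArcGIS feature layer, asking for JSON rows."""
    params = {
        "where": where,             # SQL-like filter; "1=1" matches every row
        "outFields": out_fields,    # comma-separated field names, or "*" for all
        "returnGeometry": "false",  # only the attribute table is needed
        "f": "json",                # response format
    }
    return f"{layer_url}/query?{urlencode(params)}"


def rows_from_response(body):
    """Pull the attribute dictionaries out of a feature-layer JSON response."""
    payload = json.loads(body)
    if "error" in payload:
        # A service placed behind authentication answers with an error object.
        raise RuntimeError(payload["error"].get("message", "query failed"))
    return [feature["attributes"] for feature in payload.get("features", [])]


# Hypothetical field names, for illustration only.
url = build_query_url(LAYER_URL, where="County = 'York'", out_fields="Date,Cases")
print(url)
```

    Fetching the URL (for example with urllib.request.urlopen) and passing the response text to rows_from_response would yield one dictionary per record. Keeping the URL building separate from the parsing makes each piece easy to check without touching the network.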

    During August 2020, the CDC stopped reporting hospitalization data and the federal government shifted to using data collected by the Department of Health and Human Services (HHS) and TeleTracking. DHEC provided a geoservice for hospitalizations, based on data provided to DHEC by TeleTracking on behalf of the HHS. I did some exploration of the hospitalization REST API and found that the data in this API was facility-level (individual hospitals), updated daily. I aggregated the numbers in the API based on the report date in order to provide data for my hospitalization graph. At the time, I didn’t know that the federal government does not provide daily facility-level data to the public.
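    The aggregation step described above can be sketched as follows. This is an illustration under assumptions: the field names report_date and covid_patients are hypothetical, and the sample rows are made up rather than taken from DHEC's service.

```python
from collections import defaultdict


def totals_by_date(rows, date_field="report_date", value_field="covid_patients"):
    """Sum a facility-level value across all hospitals for each report date."""
    totals = defaultdict(int)
    for row in rows:
        value = row.get(value_field)
        if value is None:
            continue  # skip facilities that did not report this value
        totals[row[date_field]] += value
    # ISO-formatted dates sort correctly as plain strings.
    return dict(sorted(totals.items()))


# Made-up rows, one per hospital per day, for illustration only.
facility_rows = [
    {"hospital": "A", "report_date": "2021-08-25", "covid_patients": 40},
    {"hospital": "B", "report_date": "2021-08-25", "covid_patients": 15},
    {"hospital": "A", "report_date": "2021-08-26", "covid_patients": 42},
    {"hospital": "B", "report_date": "2021-08-26", "covid_patients": None},
]

statewide = totals_by_date(facility_rows)
print(statewide)
```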

    In October 2020, DHEC put their ZIP code-level API behind authentication. I voiced my displeasure publicly. In late December 2020, DHEC put the API that contained county-level cases and deaths behind authentication. At this point, I began to get frustrated with DHEC for putting things behind authentication without warning, but I kind of gave up on getting the deaths data out of an API. Thankfully, DHEC still provided an API for confirmed cases, so I switched my scripts to scrape death data from PDFs provided by DHEC each day. I didn’t like using the PDFs because, unlike the API, they did not capture deaths that were retroactively moved from one date to another.

    I ran my daily updates until early June 2021, when DHEC changed their reporting format to a weekday-only schedule. I assumed that we’d seen the last wave of the pandemic and that, thanks to readily available vaccines, we had relegated the virus to a containable state. Unfortunately, that was not the case, and by mid-July I had resumed my daily updates.

    Hospitalization data issues

    In August 2021, people in my Twitter circle became interested in pediatric data. I decided to return to exploring the hospitalization API because I knew it had pediatric-related attributes. It was during that exploration that I realized I had access to daily facility-level data that the federal government was not providing to the public; the federal government provides weekly facility-level data. My first reaction was to build a Tableau dashboard that let people look at the numbers of adult and pediatric patients with COVID-19 at the facility level in South Carolina over time.

    After posting that dashboard on Twitter, I kept hearing that people wanted a replacement for DHEC’s hospitalization dashboard which, at the time, only updated on Tuesdays. So, I made a similar dashboard that provided more information and allowed users to filter down to specific days and individual hospitals, then I tweeted it at DHEC. Admittedly, this probably wasn’t the smartest move.

    I kept exploring the hospitalization data and found that it contained COVID-19-related emergency department visits by day, another data point provided weekly by HHS. After plotting out the total number of visits each day and reading the criteria for this data point, I decided I needed to make another dashboard for this. A day after I posted the dashboard to Twitter, DHEC put the API I was using behind authentication. Again, I tweeted my frustration.

    A little while later, DHEC messaged me on Twitter and told me that they were doing repairs to the API. I was later informed that the API was no longer accessible, and that I would have to use DHEC’s dashboard or HHS data. The agency’s dashboard does not allow data downloads, making it difficult for programmers to use it as a source for original analysis and visualization.

    I asked for information on why the API was no longer operational; DHEC responded that they had overhauled their hospitalization dashboard, resulting in changes to how they ingest data from the federal government. This response did not make it clear why DHEC needed to put authentication on the daily facility-level hospitalization data.

    Meanwhile, DHEC’s hospital utilization dashboard has started updating daily again. But after examining several days’ worth of data, I cannot figure out how the numbers on DHEC’s dashboard correlate to HHS data. I’ve tried matching columns from a range of dates to the data displayed, but haven’t been able to find a date where the numbers are equal. DHEC says on its dashboard that the data is sourced from HHS’ TeleTracking system, but it’s not immediately clear to me why the numbers do not match. I’ve asked DHEC for an explanation, but haven’t received a response.

    Lack of transparency from DHEC

    I’ve recently started to get familiar with the process of using FOIA requests. In the past week, I got answers on requests that I submitted to DHEC for probable cases by county per day. This data is publicly accessible (but not downloadable) via a Tableau dashboard, but there are over 500 days’ worth of data for 46 counties. The data DHEC gave to me through the FOIA process are heavily suppressed and, in my opinion, not usable.

    This has been quite a journey for me, especially in learning how to communicate and collect data. It’s also been a lesson in how government agencies don’t always do what we want them to with data. I’ve learned that government agencies don’t always explain (or publicize) the data they provide, and so the job of finding and understanding the data is left to the people who know how to pull the data from these sources.

    It’s also been eye-opening to understand that sometimes, I’m not going to be able to get answers on why a state-level agency is publishing data that doesn’t match a federal agency’s data. Most of all, it’s been a reminder that we always need to press government-operated public health agencies to be as transparent as possible with public health data.

  • Three more COVID-19 data points, August 15

    The number of children hospitalized with COVID-19 has shot up in recent weeks. Chart from the CDC COVID Data Tracker.

    A few additional items from this week’s COVID-19 headlines:

    • 1,900 children now hospitalized with COVID-19 in the U.S.: More kids are now seriously ill with COVID-19 than at any other time in the pandemic. The national total hit 1,902 on Saturday, according to HHS data. Asked about this trend at a press briefing on Thursday, Dr. Anthony Fauci explained that, thanks to Delta’s highly contagious properties, we’re now seeing more children get sick with COVID-19 just as we are seeing more adults get it. The vast majority of kids who contract the virus have mild cases, but this is still a worrying trend as schools reopen with, in many cases, limited safety measures. For more on this issue, I recommend Katherine J. Wu’s recent article in The Atlantic.
    • 2.7% of Americans now eligible for a third vaccine dose: Both the FDA and the CDC have now given the go-ahead for cancer patients, organ transplant recipients, and other immunocompromised Americans to get additional vaccine doses. There are about 7 million Americans eligible, comprising 2.7% of the population. Studies have shown that two Pfizer or Moderna doses do not provide these patients with sufficient COVID-19 antibodies to protect against the virus, while three doses bring the patients up to the same immune system readiness that a non-immunocompromised person would get out of two doses. Still, this move goes against the World Health Organization’s push for wealthy nations to stop giving out boosters until the rest of the world has received more shots.
    • 203 cases so far linked to Lollapalooza, out of 385,000 attendees: Chicago residents and public health experts worried that Lollapalooza, a massive music festival held in the city in late July, would become a superspreader event. Two weeks out from the festival, however, local public health officials are seeing no evidence of superspreading, with a low number of cases identified in attendees. Lollapalooza may thus be an indicator that large events can still be held safely during the Delta surge—if events are held outdoors and the vast majority of attendees are vaccinated. (Officials estimated that 90% of the Lollapalooza crowd had gotten their shots.)

  • Featured sources, June 20

    • CDC adds more data on Delta: The CDC formally classified the Delta variant (B.1.617.2) as a Variant of Concern this week, and updated its Variant Proportions tracker page accordingly. This means data are now available on the variant’s state-by-state and regional prevalence—though the state-by-state figures are as of May 22 due to data lag.
    • AMA survey on doctor vaccinations: The American Medical Association (AMA) recently released survey data showing that 96% of U.S. physicians have been fully vaccinated against COVID-19, as of June 8. The 14-page report includes demographic data and other details.
    • Rural hospital closures: The North Carolina Rural Health Research Program at the University of North Carolina tracks hospitals in rural areas that close or otherwise stop providing in-patient care. The database includes 181 hospitals that have closed between 2005 and 2021, available in both an interactive map and a downloadable Excel file.
    • Health Security Net: This is a public repository including over 1,200 pandemic-related documents—research, hearings, government papers, and more—from the decades leading up to 2020, compiled by Georgetown’s Center for Global Health Science and Security. It’s built for scholars, journalists, and other researchers to analyze past and present responses to public health crises. 

  • HHS makes it easier to compare hospitalizations by age

    Since mid-December, the Department of Health and Human Services has published a dataset on how the pandemic is impacting individual hospitals across the country. (You can read the CDD’s detailed description of that dataset here.) One of the most useful—and, in my opinion, most under-utilized—aspects of this facility dataset is that it provides COVID-19 hospital admissions broken out by age, allowing data users to discern which age groups are getting hardest hit by severe COVID-19 cases in different parts of the country.

    This week, the HHS made it much easier to do that analysis. The agency added hospital admissions by age to its state-level hospitalization dataset. Now, if you want to see a patient breakdown for your state, you can simply look at the state-level info already compiled by HHS data experts, rather than summing up numbers from the facility-level info yourself.

    Besides that convenience factor, there are two big advantages of the state-level info:

    • The state-level dataset is updated daily, while the facility-level dataset is updated weekly. More frequent data updates allow for more specific time series analysis.
    • Low patient numbers aren’t suppressed. In the facility-level dataset, patient numbers between 1 and 4 are suppressed with an error value (-999999) to protect patient privacy. In the age data, this happens at a lot of facilities, so it’s impossible for an outside data user to calculate accurate totals for a given city, county, or state. On the other hand, with HHS experts doing the aggregation in the state-level dataset, no values need to be obscured—basically, these state-level figures are much more accurate.
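    To see why suppression gets in the way of accurate totals, consider this minimal sketch. It uses the -999999 marker and the 1-to-4 range described above; the facility values themselves are made up for illustration. The most an outside user can recover is a range, not an exact sum.

```python
SUPPRESSED = -999999  # marker used for suppressed facility-level values


def total_bounds(values):
    """Return the (lowest, highest) possible true total for a list of values.

    A suppressed entry stands for some count from 1 to 4, so it adds
    1 to the lower bound and 4 to the upper bound.
    """
    low = high = 0
    for value in values:
        if value == SUPPRESSED:
            low += 1
            high += 4
        else:
            low += value
            high += value
    return low, high


# Made-up weekly admissions at five facilities in one county.
county_values = [10, SUPPRESSED, 0, SUPPRESSED, 7]

print(sum(county_values))           # a naive sum is badly wrong
print(total_bounds(county_values))  # the true total is somewhere in this range
```

    The more facilities that report small numbers, the wider this range gets, which is why the pre-aggregated state-level figures are more useful for totals.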

    The age groups in the state-level dataset match those available in the facility-level dataset: pediatric COVID-19 patients, patients age 18-19, patients in ten-year age ranges from 20 to 79, and patients age 80 or older. HHS also splits the patient counts into those who have confirmed COVID-19 cases (meaning their diagnosis is verified by a PCR test) and those who have suspected cases (meaning the patients have COVID-19 symptoms or a positive result on a non-PCR test).

    You can find these new data in two places:

    Also, Conor Kelly, COVID Tracking Project volunteer and COVID-19 visualizer extraordinaire, has added these new data to his COVID-19 Tableau dashboard. (See “Hosp. Admissions Over Time,” then “Admissions by Age.”) Highly recommend checking out that dashboard and exploring the trends for your state.

    (Finally, it is possible I’m a little annoyed that the HHS made this lovely update immediately after I turned in an assignment in which I did this analysis the long way, with the facility-level dataset. Look out for that story early next week.)

    Related posts

    • Featured sources, April 11

      • Lost on the frontline: This database from Kaiser Health News and The Guardian honors American healthcare workers who died during the COVID-19 pandemic. We featured this source when it was launched back in August 2020. At that time, 167 workers were included in the interactive database; now, you can read the profiles of hundreds more. 3,607 deaths have been recorded in total. KHN and The Guardian closed their joint investigation this week, and public health leaders and policymakers are now calling on the federal government to continue this crucial work.
      • State COVID-19 Hospitalization Data Annotations, by the COVID Tracking Project: This week, the COVID Tracking Project released a snapshot of extensive research into how U.S. states are reporting their currently hospitalized COVID-19 patients. The research has informed comparisons between Project data and federal data which demonstrated the quality of the HHS hospitalization dataset. You can access these annotations, along with information on cases, tests, and deaths, at the Project’s Data Annotations page.
      • Food Insecurity in the United States: Nonprofit hunger relief organization Feeding America has compiled and mapped data showing how the pandemic has impacted food insecurity in the U.S. You can search for a specific state, county, or local food bank to see food-insecure population estimates, average meal costs, food stamp program eligibility, and more.

    • Featured sources, Jan. 10

      This week’s featured sources are all about hospitalizations and treatments. See the full CDD source list here.

      • Hospital facilities visualization by the COVID Tracking Project: Last month, the Department of Health and Human Services (HHS) released an extensive dataset showing how COVID-19 patients are impacting hospitals at the individual facility level. (See my Dec. 13 post for more information on this dataset.) The COVID Tracking Project has produced an interactive visualization from this dataset, allowing users to zoom in to individual facilities or search for hospitals in a particular city or ZIP code. I contributed some copy to this page.
      • Therapeutics distribution (from HHS): The HHS is posting a list of locations that have received monoclonal antibody therapies, for the purpose of treating COVID-19. Bamlanivimab, one such therapy, received EUA from the FDA in early November. The HHS page notes that this is not a complete list: “Although monoclonal antibody therapeutic treatments have been shipped nationwide, shipment locations are displayed for those States that have opted to have their locations displayed on this public website.”
      • Hospital discharge summaries (from the Healthcare Cost and Utilization Project): This project, under the HHS umbrella, posts time series data on U.S. hospital patients. The site recently posted summaries on patients from April to June 2020, including datasets specific to COVID-19, flu, and other viral respiratory infections. As epidemiologist Jason Salemi explains in a summary Twitter thread, the data doesn’t provide new information but may be useful for a researcher looking to dig into spring and summer hospitalization trends.
    • Facility-level hospitalization data updated on schedule

      In the interest of giving credit to the HHS where credit is due: the agency updated its new facility-level hospitalization dataset right on schedule this past Monday.

      This dataset allows Americans to see exactly how COVID-19 is impacting individual hospitals across the country. In last week’s issue, I explained why I was excited about this dataset and what researchers and reporters could do with it. (The highlights: hyperlocal data that can be aggregated to different geographies, a time series back to August, demographic information on COVID-19 patients, and HHS transparency.)

      Last week, I used this hospitalization dataset—along with the HHS’s state-level hospitalization data—to build several visualizations showing how COVID-19 has hit hospitals at the individual, county, and state levels.

      I also wrote a brief article on COVID-19 hospitalizations for Stacker, hosting visualizations and highlighting some major insights. The article was sent out to local journalists across the country via a News Direct press release. (If your outlet wants to repurpose Stacker’s article, get in touch with my coworker Mel at melanie@thestacker.com!)

      A few national statistics:

      • Nearly 700 hospitals are at over 90% inpatient capacity, as of the most recent HHS data. 750 hospitals are at over 90% capacity in their ICUs.
      • The states with the highest rates of occupied beds are Maryland (79.8% of all beds occupied), Washington D.C. (80.0%), and Rhode Island (85.2%).
      • States with the highest shares of their populations hospitalized with COVID-19 are Arizona (53 patients per 100,000 population), Pennsylvania (55 per 100K), and Nevada (67 per 100K).
      • 19% of hospitals in the nation are facing critical staffing shortages, while 24% anticipate such a shortage within the next week.
      • Staffing shortages are highest in Arkansas (33.6% of hospitals in the state), Wisconsin (35.6%), and North Dakota (42.0%).

      Meanwhile, The Accountability Project has developed a datasette version of this hospitalization dataset. With a bit of code, you can query the data to access metrics for a specific hospital, city, county, or state. The Project has provided example queries to help you get started.
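      As a rough idea of what such a query can look like: Datasette serves SQLite databases, so the sketch below runs a SQL query against a tiny in-memory SQLite table. The table name, column names, and rows are all made up for illustration; they are not The Accountability Project's schema or real hospital figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and columns, standing in for the facility-level dataset.
conn.execute(
    "CREATE TABLE hospitals ("
    "hospital_name TEXT, state TEXT, inpatient_beds_used REAL, inpatient_beds REAL)"
)
conn.executemany(
    "INSERT INTO hospitals VALUES (?, ?, ?, ?)",
    [
        ("Hospital A", "RI", 85, 100),  # made-up rows
        ("Hospital B", "RI", 90, 100),
        ("Hospital C", "MD", 70, 100),
    ],
)

# Share of inpatient beds occupied in each state, highest first.
query = """
    SELECT state,
           ROUND(100.0 * SUM(inpatient_beds_used) / SUM(inpatient_beds), 1) AS pct_occupied
    FROM hospitals
    GROUP BY state
    ORDER BY pct_occupied DESC
"""
results = conn.execute(query).fetchall()
print(results)
```

      Adding a WHERE clause on a hospital name, city, or county column would narrow the same query down to a single facility or area.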

    • COVID-19 data for your local hospital

      When the Department of Health and Human Services (HHS) started reporting hospitalization data at the state level back in July, I wistfully told a friend that I wished the agency would report facility-level numbers. Another federal agency had recently started reporting this type of data for nursing homes, and I appreciated the flexibility and granularity with which I was able to analyze how the pandemic was impacting nursing home patients and staff. I wanted to see the pandemic’s impact on hospitals in the same way.

      At the time, I considered this a pipe dream. The HHS was already facing major challenges: implementing a new data pipeline across the country, navigating bureaucratic issues with state public health departments, and working with individual hospitals to help them report more accurately and more often. Plus, transparency issues and political scandals plagued the agency. Making more data public seemed to be the least of its priorities.

      But I’m happy to say that this week, my pipe dream came true. On Monday, the HHS published a new hospitalization dataset including capacity, new admissions, and other COVID-19-related numbers—for over 4,000 individual facilities across America.

      This is, as I put it in a COVID Tracking Project blog post analyzing the dataset, a big deal. Project lead Alexis Madrigal called it “probably the single most important data release that we’ve seen from the Federal government.” I, in somewhat less professional terms, texted my girlfriend:

      Please appreciate the level of self-control it took for me to not actually title this issue “HHS queen shit.”

      Let me explain why this new dataset is so exciting—not just for a nerd like me, but for any American following the pandemic. I’m drawing on a COVID Tracking Project blog post unpacking the dataset, to which I contributed some explanatory copy.

      • Hyperlocal data: At a time when hospitals are overwhelmed across the nation, it is incredibly useful to see precisely which hospitals are the worst off and how COVID-19 is impacting them. Data scientists can pinpoint specific patterns and connections between regions. National aid groups can determine where to send PPE and other supplies. Journalists can see which hospitals should be the focus of local stories. The stories that can be told with this dataset are endless.
      • Aggregating to different geographies: The individual facility is the most detailed possible level of reporting for COVID-19 hospitalizations. But this HHS dataset also includes the state, county, and ZIP code for each hospital, along with unique codes that identify hospitals in the Medicare and Medicaid system. The data for specific facilities can thus be combined to make comparisons on a variety of geographic levels. I tried out a county-level visualization, for example; some counties are not represented, but you can still see a much more granular picture of hospital capacity than you would in a state-level map.
      • Time series back to August: HHS didn’t just provide data on how hospitals are coping with COVID-19 right now. They provided a full time series going back to the first week of August, with data starting shortly after the HHS began collecting information from hospitals. These historical data allow researchers to make more detailed comparisons between the nation’s last major COVID-19 peak and our current outbreak. There are some reporting errors from hospitals in the early weeks of the dataset; COVID Tracking Project analysis has shown that these errors become less significant in the week of August 28.
      • Includes coverage details: The dataset includes fields that can help researchers check the quality of an individual hospital’s reporting. These fields, called “coverage” numbers, show the number of days in a given week on which data were reported. A value of six for total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_coverage, for example, indicates that this hospital reported how many adult COVID-19 patients it was treating on six of seven days in the past week. Many hospitals are now reporting all major metrics on six or seven days a week—HHS has really stepped up to encourage this level of reporting in recent months. For more information on hospital reporting coverage, see HHS Protect.
      • Admissions broken out by age: The HHS began reporting hospital COVID-19 admissions, or new COVID-19 patients entering the hospital, at the state level in November. The new dataset includes this information, at the facility level, for every week going back to August, and breaks out those new patients by age group. You can see exactly who is coming to the hospital with COVID-19 in age brackets of 18-19, ten-year ranges from 20 to 79, and 80+. Several other metrics in the dataset are also broken out by adult and pediatric patients.
      • New fields: This dataset reports counts of emergency department visits, including both total visits for any reason and visits specifically related to COVID-19. (The HHS data dictionary defines this as “meets suspected or confirmed definition or presents for COVID diagnostic testing.”) These figures allow researchers to calculate the share of emergency department visits at a given hospital that are COVID-related, a new metric that wasn’t available from previous HHS reporting.
      • Signifies major effort from the HHS: When it comes to reporting hospitalization data, this agency has come a long way from the errors and transparency questions of the summer. Last week, the COVID Tracking Project published an analysis finding that HHS counts of COVID-19 patients are now in close proximity to similar counts reported by state public health departments—signifying that the federal data may be a useful, reliable complement to state data. (I discussed this analysis in last week’s issue.) The new facility-level dataset indicates that HHS data scientists understand the needs of COVID-19 researchers and communicators, and are working to make important data public. I will continue to carefully watch this agency, as will many of my fellow reporters. But I can’t deny that this data release was a major step for transparency and trust.
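      Two of the points above, the coverage fields and the emergency department share, can be sketched in a few lines. This is an illustration under assumptions: the coverage field name comes from the dataset as quoted above, but the sample rows and the way visit counts are passed in are made up, and the -999999 check reflects the suppression marker HHS uses for small facility-level values.

```python
SUPPRESSED = -999999  # marker HHS uses for small, suppressed facility-level values

COVERAGE_FIELD = (
    "total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_coverage"
)


def reported_most_days(row, min_days=6):
    """True if the hospital reported this metric on at least min_days of 7 days."""
    return row.get(COVERAGE_FIELD, 0) >= min_days


def covid_ed_share(covid_visits, total_visits):
    """Share of emergency department visits that were COVID-related.

    Returns None when either count is suppressed or the total is missing or zero.
    """
    if SUPPRESSED in (covid_visits, total_visits) or not total_visits:
        return None
    return covid_visits / total_visits


# Made-up rows for two hospitals, for illustration only.
rows = [
    {"hospital": "A", COVERAGE_FIELD: 7},
    {"hospital": "B", COVERAGE_FIELD: 3},
]
well_covered = [row["hospital"] for row in rows if reported_most_days(row)]

print(well_covered)
print(covid_ed_share(50, 200))
```

      Filtering on coverage first is one way to avoid comparing a hospital that reported all week against one that reported only a few days.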

      To get started with this dataset, you can zoom in to look at your community on this Tableau dashboard I made, visualizing the most recent week of data. (That most recent week of data reflects November 27 through December 3. As the dataset was first published last Monday, December 7, I’m anticipating an update tomorrow.)

      Or, if you’d like to see more technical details on how to use the dataset, check out this community FAQ page created by data journalists and researchers at Careset Systems, the University of Minnesota, COVID Exit Strategy, and others.

      Finally, for more exploration of the research possibilities I outlined above, you can read the COVID Tracking Project’s analysis. The post includes some pretty striking comparisons from summer outbreaks to now.

    • HHS’s hospitalization data are good, actually

      HHS’s hospitalization data are good, actually

      In July, the Department of Health and Human Services (HHS) took over collecting and reporting data on how COVID-19 is impacting America’s hospital systems. This takeover from the CDC, which had reported hospitalization data since the start of the pandemic, sparked a great deal of political and public health concern. Some healthcare experts worried that a technology switch would put undue burden on already-tired hospital workers, while others worried that the White House might influence the HHS’s data.

      Since that data responsibility switch, I’ve spent a lot of time with that HHS dataset. In August, I wrote a blog post for the COVID Tracking Project which compared HHS’s counts of hospitalized COVID-19 patients to the Project’s counts (compiled from states). At the time, my co-author Rebecca Glassman and I observed discrepancies between the datasets, which we attributed in part to differences in definitions and reporting pipelines. For example: some states only report those hospital patients whose cases of COVID-19 have been confirmed with PCR tests, while HHS reports all patients (including those with confirmed and suspected cases).

      I’ve covered the HHS hospitalization dataset several times in this newsletter since, including its investigation by journalists at ProPublica and Science Magazine and its expansion to include new metrics. The dataset has gone from a basic report of hospital capacity in every state to a comprehensive picture of how the pandemic is hitting hospitals. It includes breakdowns of patients with confirmed and suspected cases of COVID-19, patients in the intensive care unit (ICU), and adult and pediatric patients. As of November, it also includes newly admitted patients and staffing shortages. At the same time, HHS officials have worked to resolve technical issues and get more hospitals reporting accurately in the system.

      A new analysis, published this past Friday by the COVID Tracking Project, highlights how reliable the HHS dataset has become. The analysis compares HHS’s counts of hospitalized COVID-19 patients to the Project’s counts, compiled from states. Unlike the analysis I worked on in August, however, this recent work benefits from HHS’s expanded metrics and more thorough documentation from both the federal agency and states. If a state reports only confirmed cases, for example, this number can now be compared directly to the corresponding count of confirmed cases from the HHS.

      Here’s how the two datasets line up, as of November 29:

      Line chart showing hospitalization data from state (CTP) and from HHS. When the correct definitions are used, and the HHS data offset by a single day, the two lines match almost exactly.
      The COVID Tracking Project and HHS counts of hospitalized patients closely match in September, October, and November.

      Since November 8, in fact, the two datasets have been within two percent of each other when adjusting for definitional differences.
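
      To make that comparison concrete, here is a minimal Python sketch of one way it could be done. This is not the COVID Tracking Project's actual method. It assumes daily counts keyed by date, picks the HHS series that matches the state's definition (confirmed only, or confirmed plus suspected), and lines up each state count with the HHS count one day later. The direction of that one-day offset, the function names, and all of the numbers are my assumptions for illustration.

```python
from datetime import date, timedelta


def pick_hhs_series(hhs, state_includes_suspected):
    """Choose the HHS series whose definition matches what the state reports."""
    key = "confirmed_and_suspected" if state_includes_suspected else "confirmed"
    return hhs[key]


def percent_gap(state_counts, hhs_counts, offset_days=1):
    """Percent difference of HHS vs. state counts, keyed by the state's date.

    Each state count on day d is compared with the HHS count on
    d + offset_days (an assumed direction for the one-day offset).
    """
    gaps = {}
    for day, state_value in state_counts.items():
        hhs_value = hhs_counts.get(day + timedelta(days=offset_days))
        if hhs_value is None or state_value == 0:
            continue  # skip days that cannot be compared
        gaps[day] = 100 * (hhs_value - state_value) / state_value
    return gaps


# Made-up numbers for a state that reports confirmed patients only.
state = {date(2020, 11, 8): 1000, date(2020, 11, 9): 1050}
hhs = {
    "confirmed": {date(2020, 11, 9): 1010, date(2020, 11, 10): 1040},
    "confirmed_and_suspected": {date(2020, 11, 9): 1200, date(2020, 11, 10): 1260},
}

gaps = percent_gap(state, pick_hhs_series(hhs, state_includes_suspected=False))
within_two_percent = all(abs(gap) <= 2 for gap in gaps.values())
print(gaps, within_two_percent)
```

      Comparing against the confirmed-plus-suspected series instead would produce a gap of about 20 percent in this toy example, which is the kind of definitional mismatch described above.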

      The blog post also discusses how patient counts match in specific states. In 41 of 52 jurisdictions (including the District of Columbia and Puerto Rico), the two datasets are in close alignment. And even in the states where hospitalization numbers match less precisely, the two datasets generally follow the same trends. In other words: there may be differences in how the HHS and individual states are collecting and reporting their numbers, but both datasets tell the same story about how COVID-19 is impacting American hospitals.

      I recommend giving the full blog post a read, if you’d like all the nerdy details. Alexis Madrigal also wrote a great summary thread on Twitter.

      This new COVID Tracking Project analysis comes several days after an investigation in Science Magazine called the HHS dataset into question. The investigation is based on a CDC comparison of these same two datasets, which doesn’t account for the reporting differences I’ve discussed.

      Charles Piller, the author of this story, raises important questions about HHS’s transparency and the burden that its system places on hospitals. It’s true that HHS’s new data reporting system was rolled out quickly, faced technical challenges, and caused a great deal of confusion for national reporters and local hospital administrators alike. The HHS dataset deserves the careful scrutiny it has received.

      But now that this careful scrutiny has been conducted, and the two datasets appear to tell the same story, I personally feel comfortable using the HHS dataset in my reporting. In fact, I produced a Stacker story based on these data just last week: States with the highest COVID-19 hospitalization rates.