Tag: HHS hospitalization data

  • The case for mask mandates in healthcare settings

    A lot of healthcare organizations have ended mask mandates in recent months, many of them citing state or local guidance that no longer requires this level of precaution. Some of this stems from a CDC policy change last fall, when the agency recommended universal masking in healthcare settings only when COVID-19 spread is high.

    Now, this is likely another case of the CDC (and potentially quite a few other health agencies) making recommendations that are, in fact, very dangerous. There’s plenty of evidence that mask mandates should continue in healthcare settings, to protect vulnerable patients from COVID-19 and many other illnesses.

    Let’s go over some key points:

    • Hospital-acquired COVID-19 infections: Since the start of the pandemic, people who go to the hospital for issues other than COVID-19 have contracted the virus while there. The HHS tracks these cases, and its data show that this is a continuing problem: even as new COVID-19 admissions in hospitals have declined in 2023, hospital-acquired infections have persisted, with hundreds of these cases reported each day in recent months. Universal masking reduces these infections.
    • Wastewater surveillance in hospitals: Another way to track COVID-19 in healthcare settings is through targeted wastewater surveillance, taking samples from a particular facility’s sewage. A few hospital systems are doing this, such as NYC’s public system (Health + Hospitals). While there are limited public data from these programs, researchers who run them have said that the results show consistent COVID-19 spread; masks help mitigate this transmission.
    • Healthcare facility outbreaks: After lifting a mask mandate, hospitals and other healthcare facilities may have COVID-19 outbreaks among patients and staff—both putting vulnerable patients at risk and exacerbating staffing shortages. One hospital in the Bay Area recently reinstated a mask mandate after such an outbreak, according to local paper the San Francisco Chronicle.
    • Patients hesitant to visit: Many patients at higher risk for severe COVID-19 may become wary of routine doctors’ visits or procedures if their clinics stop requiring masks. This is a sentiment I’ve seen frequently on social media over the last few months, as higher-risk people push for healthcare organizations to keep their mask mandates.
    • Harming long-term outcomes: Any already-vulnerable person who gets COVID-19 at a healthcare facility is likely to face long-term symptoms from the virus, potentially complicating their existing chronic conditions. This fact contributes to individual patients’ wariness, and it can also lead to complications for potential treatments or research studies. For example, a Stanford study testing Paxlovid for Long COVID has recently stopped requiring its staff to mask, according to patient reports; participants have pointed out that this could harm the study’s results.

    If you’re interested in getting involved with advocacy in this area, I recommend checking out Mandate Masks US and connected organizations. These groups are pushing for masks to remain in healthcare through social media campaigns, petitions, contacting politicians, and even some in-person protests.

  • Potential data fragmentation when the federal COVID-19 public health emergency ends

    About half of U.S. states have D or F grades on their breakthrough case reporting, according to the Pandemic Prevention Institute and Pandemic Tracking Collective. Other metrics could be heading in this direction next year.

    COVID-19 is still a public health emergency. At the moment, this is true according to both the general definition of this term and official declarations by the federal government. But the latter could change in the coming months, likely leading to more fragmentation in U.S. COVID-19 data.

    Using our new anonymous Google form, a reader recently asked me about the federal government’s ability to compile and report COVID-19 data. They asked: “Will the CDC at some point stop reporting COVID data even though it may still be circulating, or is it a required, reportable disease?”

    It’s difficult to predict what the CDC will do, as we’ve seen in the agency’s many twists and turns throughout the pandemic. That said, my best guess here is that the CDC will always provide COVID-19 data in some form, but the agency could be severely limited in data collection and reporting depending on the disease’s federal status.

    The CDC’s authority

    One crucial thing to understand here is that the CDC does not actually have much power over state and local public health departments. It can issue guidance, request data, distribute funding, and so forth, but it isn’t able to require data collection in many circumstances.

    Here’s Marc Lipsitch, an epidemiologist at Harvard’s public health school and interim director of science at the CDC’s Center for Forecasting and Outbreak Analytics, explaining this dynamic. This quote is from an interview that I conducted back in May for my FiveThirtyEight story on the new center:

    Outside of a public health emergency, CDC has no authority to require states to share data. And even in an emergency, for example, if you look on the COVID Data Tracker, there are systems that have half the states or some of the states. That’s because those were the ones that were willing to share. And that is a very big handicap of doing good modeling and good tracking… Everything you’re trying to measure, for any decision, is better if you measure it in all the states.

    Consider breakthrough cases as one example. According to the Pandemic Prevention Institute’s scorecard for breakthrough data reporting, about half of U.S. states have D or F grades, meaning that they are reporting zero or very limited data on post-vaccination COVID-19 cases. The number of states with failing grades has increased in recent months, as states reduce their COVID-19 data resources. As a result, federal agencies have an incomplete picture of vaccine effectiveness.

    Wastewater data are another example. While the CDC is able to compile data from all state and local public health departments with their own wastewater surveillance systems (and can pay Biobot to expand the surveillance network), the agency has no ability to actually require states to track COVID-19 through sewage. This lack of authority contributes to the CDC’s wastewater map still showing many empty spaces in states like Alabama and North Dakota.

    The COVID-19 public health emergency

    According to the Department of Health and Human Services (HHS), a federal public health emergency gives the HHS and CDC new funding for health measures and the authority to coordinate between states, among other expanded powers.

    During the COVID-19 pandemic, the federal emergency was specifically used to require data collection from state health departments and individual hospitals, POLITICO reported in May. According to POLITICO, the required data include sources that have become key to our country’s ability to track the pandemic, such as:

    • PCR test results from state and local health departments;
    • Hospital capacity information from individual healthcare facilities;
    • COVID-19 patients admitted to hospitals;
    • COVID-19 cases, deaths, and vaccination status in nursing homes.

    The federal COVID-19 public health emergency is formally controlled by HHS Secretary Xavier Becerra. Becerra most recently renewed the emergency in July, with an expiration date in October. Health experts anticipate that it will be renewed again in October, because HHS has promised to give states a 60-day warning before the emergency expires and there’s been no warning for this fall. That leaves us with a new potential expiration date in January 2023.

    CDC officials are seeking to permanently expand the agency’s authority to include this data collection—with a particular priority on hospitalization data. But that hasn’t happened yet, to the best of my knowledge. So, what might happen to our data when the federal emergency ends?

    Most likely, metrics that the CDC currently requires from states will become voluntary. As we see right now with breakthrough cases and wastewater data, some states will probably continue reporting while others will not. Our federal data will become much more piecemeal, a patchwork of reporting for important sources such as hospitalizations and lab test results.

    It’s important to note here that many states have already ended their own public health emergencies, following a trend that I covered back in February. Many of these states are now devoting fewer resources to free tests, contact tracing, case investigations, public data dashboards, and other data-related efforts than they were in prior phases of the pandemic. New York was the latest state to make such a declaration, with Governor Kathy Hochul letting her emergency powers expire last week.

    How the flu gets tracked

    COVID-minimizing officials and pundits love to compare “endemic” COVID-19 to the flu. This isn’t a great comparison for many reasons, but I do think it’s helpful to look at how flu is currently tracked in the U.S. in order to get a sense of how COVID-19 may be tracked in the future.

    The U.S. does not count every flu case; that kind of precise tracking on a large scale was actually an innovation for COVID-19. Instead, the CDC relies on surveillance networks that estimate national flu cases based on targeted tracking.

    There are about 400 labs nationwide (including public health labs in all 50 states) participating in flu surveillance via the World Health Organization’s global program, processing flu tests and sequencing cases to track viral variants. Meanwhile, about 3,000 outpatient healthcare providers in the U.S. Outpatient Influenza-like Illness Surveillance Network provide the CDC with flu-related electronic health records. You can read more about both surveillance programs here.

    Sample CDC flu reporting from spring 2020. The agency provides estimates of flu activity rather than precise case numbers.

    The CDC reports data from these surveillance programs on a dashboard called FluView. As you can see, the CDC provides estimates about flu activity by state and by different demographic groups, but the data may not be very granular (e.g., no estimates by county or metro area) and are provided with significant time delays.

    Other diseases are tracked similarly. For example, the CDC will track new outbreaks of foodborne illnesses like E. coli when they arise but does not attempt to log every infection. When researchers seek to understand the burden of different diseases, they often use hospital or insurance records rather than government data.

    One metric that I’d expect to remain unchanged when the COVID-19 emergency ends is deaths: the CDC’s National Center for Health Statistics (NCHS) comprehensively tracks all deaths through its death certificate system. But even provisional data from NCHS are reported with a delay of several weeks, with complete data unavailable for at least a year.

    Epidemiologists I’ve interviewed say that we should be inspired by COVID-19 to improve surveillance for other diseases, rather than allowing COVID-19 to fall into the flu model. Wastewater data could help with this; a lot of wastewater researchers (including those at Biobot) are already working on tracking flu and other diseases. But to truly improve surveillance, we need more sustained investment in public health at all levels—and more data collection authority for the CDC and HHS.

  • Sources and updates, May 15

    • COVID-19 deaths that could’ve been prevented with vaccines: A new analysis from the Brown University School of Public Health suggests that almost 319,000 U.S. COVID-19 deaths could have been avoided if all adults had gotten vaccinated against the disease. This number differs significantly by state; there were 29,000 preventable COVID-19 deaths in Florida, compared to under 300 in Vermont. For more context on the analysis, see this article from NPR.
    • CDC dashboard in Spanish: The CDC has translated its COVID-19 Data Tracker into Español. At a glance, the Spanish version appears to include all the major aspects of the tracker: cases, deaths, vaccinations, community transmission, variant prevalence, wastewater, etc. Of course, it would have been great if the agency could’ve devoted resources to this translation effort well before spring 2022, when the number of people looking to the agency for COVID-19 guidance is pretty low.
    • CDC may lose access to COVID-19 data: According to reporting from POLITICO, the CDC and other national health agencies may no longer have the authority to require COVID-19 data reporting from states and individual health institutions if the Biden administration allows the country’s federal pandemic health emergency to end this summer. Such a change in authority could lead to the CDC (and numerous other researchers across the country) losing standardized datasets for COVID-19 hospitalizations, transmission in nursing homes, PCR testing, and other key metrics. Given that hospitalizations are considered the most reliable metric right now, this could be a major blow.
    • COVID-19 testing declines globally: Speaking of losing reliable data, this report from the Associated Press caught my eye. The story, by Laura Ungar, explains that the U.S. is not the only country to see a major decrease in reported COVID-19 tests (meaning lab-based PCR tests, not at-home rapid tests) in recent months. “Experts say testing has dropped by 70 to 90% worldwide from the first to the second quarter of this year,” Ungar writes, “the opposite of what they say should be happening with new omicron variants on the rise in places such as the United States and South Africa.”
    • More promising data on Moderna kids’ vaccine: While Pfizer’s vaccine for children under five remains in development, Moderna continues to release data suggesting that this company is further ahead in providing protection for the youngest age group. This week, Moderna announced a half-dose of its vaccine provides a “strong immune response” in children ages six to 11; the announcement was backed up by a scientific study published in the New England Journal of Medicine (so, more rigorous than your typical press release). The FDA is currently evaluating a version of Moderna’s vaccine for children between ages six months and six years.

  • All the U.S.’s COVID-19 metrics are flawed

    This week, I had a big retrospective story published at FiveThirtyEight: I looked back at the major metrics that the U.S. has used to track COVID-19 over the past two years—and how our country’s fractured public health system hindered our use of each one.

    The story is split into seven sections, which I will briefly summarize here:

    • Case counts, January to March 2020: Early on in the pandemic, the U.S. had a very limited picture of COVID-19 cases due to our very limited testing: after rejecting a test made by the WHO, the CDC made its own test—which turned out to have contamination issues, further slowing down U.S. testing. In early March 2020, for example, the majority of cases in NYC were identified in hospitals, suggesting that official counts greatly underestimated the actual numbers of people infected.
    • Tests administered, March to September 2020: Test availability improved after the first wave of cases, with organizations like the COVID Tracking Project keeping a close eye on the numbers. But there were a lot of challenges with the testing data (like different units across different states) and access issues for Americans with lower socioeconomic status.
    • Hospitalizations, October to December 2020: By late 2020, many researchers and journalists were considering hospitalizations to be a more reliable COVID-19 metric than cases. But it took a long time for hospitalization data to become reliable on a national scale, as the HHS launched a new tracking system in the summer and then took months to work out kinks in this system.
    • Vaccinations, January to June 2021: When the vaccination campaign started in late 2020, it was “tempting to forget about all other COVID-19 metrics,” I wrote in the story. But the U.S.’s fractured system for tracking vaccinations made it difficult to analyze how close different parts of the country were to prospective “herd immunity,” and distracted from other public health interventions that we still needed even as people got vaccinated.
    • Breakthrough cases, July to November 2021: The Delta surge caused widespread infections in people who had been vaccinated, but the CDC—along with many state public health agencies—was not properly equipped to track these breakthrough cases. This challenge contributed to a lack of good U.S. data on vaccine effectiveness, which in turn contributed to confusion around the need for booster shots.
    • Hospitalizations (again), December 2021 to January 2022: The Omicron surge introduced a need for more nuance in hospitalization data, as many experts asked whether COVID-19 patients admitted with Omicron were actually hospitalized for their COVID-19 symptoms or for other reasons. Nuanced data can be useful in analyzing a variant’s severity, but all COVID-related hospitalizations cause strain on the healthcare system regardless of their cause.
    • New kinds of data going forward: In our post-Omicron world, a lot of public health agencies are shifting their data strategies to treat COVID-19 more like the flu: less tracking of individual cases, and more reliance on hospitalization data, along with newer sources like wastewater. At this point in the pandemic, we should be fortifying data systems “for future preparedness,” I wrote, rather than letting the systems we built up during the pandemic fall to the wayside.

    I did a lot of reporting for this piece, including interviews with some of the U.S.’s foremost COVID-19 data experts and communicators. As long as the piece is, there were a lot of metrics (and issues with these metrics) that came up in these interviews that I wasn’t able to include in the final story—so I wanted to share some bonus material from my reporting here.

    Long COVID:

    As I’ve discussed in previous issues, the U.S. has done a terrible job of collecting data on Long COVID. The NIH estimates that this condition follows a significant share of coronavirus infections (between 10% and 30%), but we have limited information on its true prevalence, risk factors, and strategies for recovery.

    Here’s Dr. Eric Topol, the prolific COVID-19 commentator and director of the Scripps Research Translational Institute, discussing this data problem:

    [Long COVID has] been given very low priority, very little awareness and recognition. And we have very little data to show for it, because it hasn’t been taken seriously. But it’s a very serious matter.

    We should have, early on, gotten at least a registry of people —a large sample, hundreds of thousands of people prospectively assessed, like is being done elsewhere [in the U.K. and other countries]. So that we could learn from them: how long the symptoms lasted, what are the symptoms, what are the triggers, what can be done to avoid it, the role of vaccines, the role of boosters, all this stuff. But we have nothing like that.

    The NIH’s RECOVER initiative may answer some of these questions, but it will take months—if not years—for the U.S. to actually collect the comprehensive data on Long COVID that we should have started gathering when the condition first began gaining attention in 2020.

    Demographic data:

    In the testing section of the story, I mention that the U.S. doesn’t provide much demographic data describing who’s getting tested for COVID-19. There is actually a little-known provision in the CARES Act that requires COVID-19 testing providers to collect certain demographic data from all people who seek tests. But the provision is not enforced, and any data that are collected on this subject aren’t making it to most state COVID-19 dashboards, much less to the CDC’s public data dashboard.

    Here’s Dr. Ellie Murray, an epidemiologist at the Boston University School of Public Health, discussing why this is an issue:

    We don’t collect reason for seeking a test. We don’t collect age, race, ethnicity, occupation of people who seek a test. Those kinds of things could provide us with some really valuable information about who is getting tested, when, and why—that could help us figure out, what are the essential occupations where people are having a lot of exposures and therefore needing to get a lot of tests? Or are there occupations where we’re seeing a lot of people end up in hospital, who have those occupations, but they’re not getting tests, because actually, the test sites are nowhere near where they need to work, or they don’t have the time to get there before they close.

    And so we don’t really know who is getting tested, and that, I think, is a bigger problem, than whether the numbers that are being tested tell us anything about the trajectory of COVID. Because we have case data, and hospitalization data, and death data to tell us about the trajectory. And the testing could really tell us more about exposure, and concern, and access—if we collected some more of this data around who is getting tested and why.

    Test positivity:

    Speaking of testing: another metric that I didn’t get into much in the story was test positivity. Test positivity, or the share of COVID-19 tests that return a positive result, has been used by everyone from the CDC to local school districts as a key metric for determining safety levels. (For more on this metric, check out my FAQ post from this past January.)

    But even when it’s calculated correctly, test positivity faces the same challenges as case data: namely, bias in who’s getting tested. Here’s Lauren Ancel Meyers, director of the University of Texas at Austin’s COVID-19 Modeling Consortium, explaining this:

    Test positivity is just as fraught [as cases]. It’s just as difficult, because you need to know the numerator and the denominator—what’s influencing the numerator and the denominator? Who is going to get tested, who has access to tests? … It used to be, at the very beginning [of the pandemic], nobody could get a test who wanted a test. And now, today, everybody has a test in their medicine cabinet, and they don’t get reported when they test. It’s different issues that have ebbed and flowed throughout this period.

    Often, if you’re a good data analyst or a modeler, and you have all the information, you can handle those kinds of biases. But the problem is, we don’t know the biases from day to day. And so even though there are statistical tools to deal with incomplete bias, without knowing what those biases are, it’s very hard to do reliable inference, and really hard to understand what’s actually going on.

    Genetic surveillance:

    Also related to testing: genetic surveillance for coronavirus variants of concern. Genetic surveillance is important because it can help identify new variants that may be more transmissible or more likely to evade protection from vaccines. It can additionally help track the qualities of concerning variants once they are identified (if variant data are linked to hospitalization data, vaccination data, and other metrics, which is not really happening in the U.S. right now).

    Our current genetic surveillance systems have a lot of gaps. Here’s Leo Wolansky, from the Rockefeller Foundation’s Pandemic Prevention Institute (PPI), discussing how his organization seeks to address these challenges:

    [We’re trying to understand] where our blind spots are, and the bias that we might experience with a lot of health system reporting. One of the things that PPI has been doing is identifying centers of excellence in different parts of the world that can improve the sequencing of new cases in underrepresented countries. And so for example, we’ve provided quite a bit of support to the folks in South Africa that ultimately rang the alarm on Omicron.

    We’re also doing this by actually trying to systematically assess countries’ capacity for this type of genomic surveillance. So thinking about, how many tests have been recorded? What’s that test positivity rate? Do we have confidence in the basic surveillance system of the country? And then, do we also see enough sequences, as well as sequencing facility data, to demonstrate that this country can sequence and just isn’t doing enough—or cannot sequence because it needs foundational investment in things like laboratories and devices. We’ve been mapping this capacity just to make sure that we understand where we should be investing as a global community.

    The Pandemic Prevention Institute is taking a global perspective in thinking about data gaps. But these gaps also exist within the U.S., as is clear when one looks at the differences in published coronavirus sequences from state to state. Some states, like Wyoming, Vermont, and Colorado, have sequenced more than 10% of their cumulative cases, according to the CDC. Others, like Oklahoma, Iowa, and South Dakota, have sequenced fewer than 3%. These states need additional investment in order to thoroughly monitor coronavirus transmission among their residents.
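    The state-by-state comparison above comes down to a simple ratio: published sequences divided by cumulative cases. Here is a minimal Python sketch of that comparison. The state names and counts are placeholders invented for this example (not CDC figures), and the 3% cutoff simply mirrors the threshold mentioned in the paragraph above.

```python
def sequenced_share(sequences: int, cumulative_cases: int) -> float:
    """Return the share of cumulative cases that have a published sequence."""
    if cumulative_cases <= 0:
        raise ValueError("cumulative_cases must be greater than zero")
    return sequences / cumulative_cases


# Hypothetical placeholder figures: (published sequences, cumulative cases).
states = {
    "State A": (12_000, 100_000),
    "State B": (2_000, 100_000),
}

LOW_COVERAGE = 0.03  # assumed cutoff, echoing the 3% figure in the text

# Flag states whose sequencing share falls below the cutoff.
low_coverage = [
    name
    for name, (sequences, cases) in states.items()
    if sequenced_share(sequences, cases) < LOW_COVERAGE
]
print(low_coverage)  # ['State B']
```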

    Cohort studies:

    In a cohort study, researchers follow a group of patients over time in order to collect long-term data on specific health conditions and/or the outside factors that influence them. The U.S. has set up a few cohort studies for COVID-19, but they haven’t been designed or utilized in a way that has actually provided much useful data—unlike cohort studies in some other countries. (The U.K., for example, has several ongoing cohort studies collecting information on COVID-19 symptoms, infections in schools, seroprevalence, and more.)

    Here’s Dr. Ellie Murray explaining the lost potential of these studies in the U.S.:

    There are a number of existing cohort studies that have been asked or who asked to pivot to collecting COVID information and therefore collecting long-term COVID information on their cohorts. But there doesn’t seem to be any kind of system to [determine], what are the questions we need answered about COVID from these kinds of studies? And how do we link up people who can answer those questions with the data that we’re collecting here, and making sure we’re collecting the right data? And if this study is going to answer these questions, and this one is going to answer those questions—or, here’s how we standardize those two cohorts so that we can pull them together into one big COVID cohort.

    And so, we end up in this situation where, we don’t know what percent of people get Long COVID, even though we’ve been doing this for over two years. We don’t even really know, what are all the different symptoms that you can get from COVID? … There are all these questions that we could be sort-of systematically working our way through, getting answers and using them to inform our planning and our response. [In addition to having] standardized questions, you also need a centralized question, instead of just whatever question occurs to someone who happens to have the funding to do it.

    Excess deaths:

    Excess deaths measure the deaths that occur in a certain region, over a certain period of time, above the number of deaths that researchers expect to see in that region and time period based on modeling from past years’ data. Excess deaths are the COVID-19 metric with the longest lag time: it takes weeks from initial infection for someone to die of the disease, and it can take further weeks for a death certificate to be incorporated into the public health system.

    Once that death information is available, however, it can be used to show the true toll of the pandemic—analyzing not just direct COVID-19 deaths, but also those related to isolation, financial burden, and other indirect issues—as well as who has been hit the hardest.
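    To make the definition concrete, here is a minimal Python sketch of an excess death calculation. It assumes the simplest possible baseline, an average of the same period in prior years; researchers typically use more sophisticated models (accounting for trends and seasonality, for example), so treat this as an illustration of the idea rather than anyone’s actual method. All numbers are invented.

```python
from statistics import mean


def expected_deaths(baseline_counts: list[int]) -> float:
    """Estimate expected deaths as the average of prior years' counts.

    Assumption: a plain average stands in for the statistical
    modeling that researchers actually use to build a baseline.
    """
    if not baseline_counts:
        raise ValueError("baseline_counts must not be empty")
    return mean(baseline_counts)


def excess_deaths(observed: int, baseline_counts: list[int]) -> float:
    """Return observed deaths minus the expected baseline."""
    return observed - expected_deaths(baseline_counts)


# Hypothetical deaths in one region during the same month of five prior years.
baseline = [1000, 1020, 980, 1010, 990]
observed = 1300  # hypothetical count for the same month during the pandemic

print(excess_deaths(observed, baseline))
```

    With these made-up numbers, the baseline averages 1,000 deaths, so the excess is 300. Comparing that figure against officially reported COVID-19 deaths for the same region and period is what reveals undercounting.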

    Here’s Cecile Viboud, a staff scientist at the NIH who studies infectious disease mortality, discussing this metric:

    We’ve been using the excess death approach for a long time. It comes from flu research, basically starting in 1875 in the U.K. And it was used quite a lot during the 1918 pandemic. It can be especially good in examining historical records where you don’t have lab confirmation—there was no testing ability back in those days…

    So, I think it’s kind of natural to use it for a pandemic like COVID-19. Very early on, you could see how useful this method was, because there was so little testing done. In March and April 2020, you see substantial excess, even when you don’t see lab-confirmed deaths. There’s a disconnect there between the official stats, and then the excess mortality… [We can also study] the direct effect of COVID-19 versus the indirect effect of the pandemic, like how much interventions affected suicide, opioids, death, accidents, etc. The excess approach is also a good method to look at that.

    Viboud also noted that excess deaths can be useful to compare different parts of the U.S. based on their COVID-19 safety measures. For example, one can analyze excess deaths in counties with low vaccination rates compared to those with high vaccination rates. This approach can identify the pandemic’s impact even when official death counts are low, an issue that the Documenting COVID-19 project has covered in depth.

    Again, you can read the full FiveThirtyEight story here!

  • Sources and updates, March 20

    Data sources and data-related updates for this week:

    • APM Research Lab relaunches Color of Coronavirus tracker: From April 2020 to March 2021, the American Public Media (APM) Research Lab compiled state-level data on COVID-19 deaths by race and ethnicity, in order to present a picture of which U.S. populations were most hard-hit by the pandemic. The project relaunched this week, now utilizing CDC mortality statistics instead of compiling data from states. One major finding from the updated data: “Indigenous Americans have the highest crude COVID-19 mortality rates nationwide—about 2.8 times as high as the rate for Asians, who have the lowest crude rates.”
    • CDC might take back hospital data reporting responsibilities from HHS: As longtime readers may remember, back in summer 2020, the Department of Health and Human Services (HHS) developed a new data system for hospitals to report COVID-19 patient numbers and other related metrics. At the time, the HHS was taking over responsibility for these data from the CDC; this inspired some political posturing and concerns about data quality, though the eventual HHS dataset turned out to be very comprehensive and useful. (This original data switch was the subject of my very first CDD issue, and I followed the HHS data system closely throughout 2020.) Now, Bloomberg reports, the CDC wants to take back hospital data reporting from the HHS. More political posturing and data quality concerns are, it seems, inevitable—this time tied to the CDC’s challenges in modernizing its data systems.
    • Hospitalizations among young children, by race/ethnicity during Omicron surge: Two MMWR studies caught my attention this week. The first examined hospitalization rates among young children, ages 0 to 4, between March 2020 and February 2022. This study found that COVID-19 hospitalization rates among children in this age range were five times higher at the peak of the Omicron surge compared to the Delta surge. The second report examined hospitalizations by race and ethnicity, finding that, during Omicron’s peak, hospitalization rates among Black adults were nearly four times higher than rates among white adults. Both reports clearly demonstrate who is still vulnerable to COVID-19 as the U.S. abandons safety measures.
    • Pfizer and Moderna both seeking EUAs for additional booster shots: POLITICO reported this week that first Pfizer, then Moderna have requested Emergency Use Authorization for fourth doses of their COVID-19 vaccines. Pfizer’s request is specifically for people age 65 and over, while Moderna’s is for all adults. Notably, Pfizer’s request is based on data from Israel suggesting that immunity from an initial booster wanes after several months—just as Pfizer’s initial case for boosters in the fall was also based on Israeli data.
    • Global COVID-related deaths may be three times higher than official records: Throughout the pandemic, researchers have used excess mortality (i.e. the deaths occurring in a given region and time period above what’s expected) to determine the true toll of COVID-19. A new study, published this week in The Lancet, took this approach for 191 countries and territories from January 2020 to December 2021. The researchers estimate that about 18 million people died worldwide due to the pandemic—including not just direct COVID-19 deaths but also others caused by COVID-related disruptions. That’s three times higher than the 6 million COVID-19 deaths that have been officially reported in this time period.
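    The excess mortality arithmetic in the last item can be sketched in a few lines. This is only an illustration of the definition given above, not the Lancet study's method: the regional numbers are hypothetical, and only the 18 million and 6 million figures come from the text.

```python
def excess_deaths(observed, expected):
    """Excess mortality: deaths observed minus deaths expected for the same place and period."""
    return observed - expected


# Hypothetical region; these two inputs are illustrative, not from the study.
observed_deaths = 61_000
expected_deaths = 52_000
print(excess_deaths(observed_deaths, expected_deaths))

# Figures quoted above: about 18 million estimated vs. 6 million officially reported.
estimated_total = 18_000_000
reported_total = 6_000_000
ratio = estimated_total / reported_total
print(ratio)
```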

  • Sources and updates, February 13

    • Biden administration is reportedly shifting hospital reporting on COVID-19 patients: During the Omicron surge, there’s been a push among some COVID-19 experts (and in the media) to separately report patients who are admitted to hospitals because of their COVID-19 symptoms from patients who are admitted to hospitals for some other reason, but then test positive later. This push, also called the “with” versus “for” issue, has reached the White House, according to a recent report from POLITICO. The Biden administration now wants all hospitals to separate out their COVID-19 numbers in this way, to get a better picture of severe disease caused by the virus. Such a shift may be tricky for hospitals to follow, however, in part because a lot of people who appear to be incidental, “with COVID-19” patients actually had rare symptoms or chronic conditions exacerbated by the virus. “You need a panel of experts to review the cases” and judge this issue, expert Eric Topol told POLITICO.
    • Long-term cardiovascular outcomes of COVID-19: A new paper from researchers at the Department of Veterans Affairs (VA), published this week in Nature Medicine, sheds light on potential long-term COVID-19 impacts for the heart. The researchers used national health records databases from the VA to study over 150,000 COVID-19 patients—a much larger study size than most Long COVID research in the U.S. The paper found that, after their first month of infection, COVID-19 patients are at increased risk for a variety of cardiovascular issues, including heart inflammation and heart failure. Outside scientists commenting on the paper in Science magazine said that the findings clearly demonstrate that COVID-19 has grave long-term risks for heart health.

  • COVID source callout: COVID-19 deaths in U.S. hospitals

    Readers active on COVID-19 Data Twitter may have seen this alarmist Tweet going around earlier this weekend. In this post, a writer (notably, one with no science, health, or data background) posted a screenshot showing that the Department of Health and Human Services (HHS) is no longer requiring hospitals to include COVID-19 deaths that occur at their facilities in their daily reports to the agency.

    This is not the end of U.S. COVID-19 death reporting, as the Tweet’s author insinuated, primarily because hospitals are not the primary source of COVID-19 death numbers. These statistics come from death certificates, which are processed by local health departments, coroners, and medical examiners; death certificate statistics are sent to state health departments, which in turn send the numbers to the CDC. The CDC is still reporting COVID-19 deaths with no disruptions, and, in fact, released a highly detailed new dataset on these deaths last month.

    For more explanation, see this thread by Erin Kissane (COVID Tracking Project co-founder) and this one from epidemiologist Justin Feldman. It’s particularly important to note here that, as Feldman points out, plenty of COVID-19 deaths don’t occur in hospitals! About one-third of COVID-19 deaths occurred outside these facilities in 2020.

    (Note: The Documenting COVID-19 project has written, in great detail, about how COVID-19 deaths are reported in our Uncounted series. See: this article at USA Today and this reporting recipe.)

    It is certainly worth asking why the HHS took in-hospital COVID-19 deaths off the list of required metrics for hospitals. This data field had some utility for researchers looking to identify COVID-19 mortality rates within these facilities—though, from what I could tell, nobody was looking at it very much before this weekend.

    But, again, this is not the end of COVID-19 death reporting! This is the HHS making one small change to a massive hospitalization dataset—which was primarily used for looking at other metrics—while the CDC’s death reporting continues as usual.

  • Hospitalization data lag behind the actual crisis


    A record number of COVID-19 patients are now receiving care in U.S. hospitals, according to data from the Department of Health and Human Services (HHS). As of January 16, the agency reports that about 157,000 COVID-19 patients are currently hospitalized nationwide, and one in every five hospitalized Americans has been diagnosed with this disease.

    The HHS also reports that about 78% of staffed hospital beds and 82% of ICU beds are currently occupied. These numbers, like the total COVID-19 patient figure, are higher than they have been at any other point during the pandemic.

    Even so, reports from the doctors and other staff working in these hospitals—conveyed in the news and on social media—suggest that the HHS data don’t capture the current crisis. The federal data may be reported with delays and fail to capture the impact of staffing shortages, obscuring the fact that many regions and individual hospitals are currently operating at 100% capacity.

    Dr. Jeremy Faust, an emergency physician at Brigham and Women’s Hospital and professor at Harvard Medical School, recently made this argument in Inside Medicine, his Bulletin newsletter. Last week, I shared Faust and colleagues’ circuit breaker dashboard, which extrapolates from both federal hospitalization figures and current case data to model hospital capacity in close-to-real-time. This week, Faust used that dashboard to show that the crisis inside hospitals is more dire than HHS numbers suggest.

    He writes:

    There seems to be a disconnect between the official data made available to the public and what’s happening on the ground. The reason for this is unacceptable delays in reporting. HHS and other agencies have always acknowledged that public reports on hospital capacity—for Covid-19 and all other conditions—actually reflect data that are 1-2 weeks old. But until now, such lags rarely mattered because most hospitals haven’t had to operate near or above 100% capacity routinely, even during the pandemic. Under normal circumstances, whether a hospital was 65% or 75% full does not make much of a difference, though as the numbers creep up, care can be compromised. And even in past moments when capacity was closer to 100%, a wave of Omicron-driven Covid-19 was not headed towards hospitals.

    For example: on Monday, Faust wrote, his team’s circuit breaker dashboard showed that “every single county in Maryland appears to be over 100% capacity,” even though the HHS said that 87% of hospital beds were occupied in the state. Healthcare workers in Maryland backed up the claim that all counties were over 100% capacity, with personal accounts of higher-than-ever cases and hospitals going into crisis standards.

    On Thursday, Faust shared an update: the circuit breaker dashboard, at that point, projected that hospitals in Arizona, California, Washington, and Wisconsin were approaching 100% capacity, if they weren’t at that point already. As of Saturday, California and Arizona are still projected to be “at capacity,” according to the dashboard, while 14 other states ranging from Montana to South Carolina are “forecasted to exceed capacity” in coming days.


    From Faust’s descriptions and the accounts of healthcare workers he quotes, it’s also evident that distinguishing between hospitalizations “with” COVID-19 and hospitalizations “from” COVID-19 is not a useful way to spend time and resources right now. Even if some of the COVID-19 patients currently in U.S. hospitals “happened to test positive” while seeking treatment for some other condition, these patients are still contributing to the intense pressure our healthcare system is under right now.

    Plus, as Ed Yong explains in a recent article in The Atlantic describing this false patient divide, COVID-19 can worsen other conditions that at first seem unrelated:

    The problem with splitting people into these two rough categories is that a lot of patients, including those with chronic illnesses, don’t fit neatly into either. COVID isn’t just a respiratory disease; it also affects other organ systems. It can make a weak heart beat erratically, turn a manageable case of diabetes into a severe one, or weaken a frail person to the point where they fall and break something. “If you’re on the margin of coming into the hospital, COVID tips you over,” Vineet Arora, a hospitalist at the University of Chicago Medicine, told me. In such cases, COVID might not be listed as a reason for admission, but the patient wouldn’t have been admitted were it not for COVID.

    In short: Omicron might be a milder variant at the individual level—thanks to a combination of the variant’s inherent biology and protection from vaccines and prior infections—but at a systemic level, it’s devastating. And rather than asking hospitals to split their patients into “with” versus “from” numbers, we should be giving them the staff, supplies, and other support they need to get through this crisis.

  • Sources and updates, October 17

    • COVID-19 cases, deaths, hospitalizations by vaccination status: The latest addition to the CDC’s COVID-19 dashboard, this week, is a set of two pages that break out case, death, and hospitalization rates by vaccination status. The page with case and death rates draws on CDC monitoring programs, and may not be entirely representative of data for the entire U.S. The page with hospitalization rates draws on COVID-NET, a network of over 250 hospitals in 14 states.
    • Hospitalization data will shift back to the CDC: Bloomberg reported this week that the Biden administration will bring the HHS Protect system, which tracks hospitalization data, under the auspices of the CDC. Hospitalization data moved from CDC responsibility to HHS responsibility in summer 2020—a move covered extensively by the COVID-19 Data Dispatch. At the time, this change drew criticism, though the HHS Protect system developed into a highly reliable data source. It is unclear how a move back to the CDC may impact hospitalization tracking.
    • Mask Diplomacy in Latin America During the COVID-19 Pandemic: This dataset, compiled by political scientists Diego Telias and Francisco Urdinez, includes over 500 donations of COVID-19 supplies—face masks, respirators, tests, and more. The data underlie a preprint posted online in August 2020 discussing China’s diplomacy in Latin America and the Caribbean. (h/t Data Is Plural.)

  • One data researcher’s journey through South Carolina’s COVID-19 reporting


    By Philip Nelson

    COVID-19 hospitalizations in South Carolina, as of August 26. Posted on Twitter by Philip Nelson.

    If you post in the COVID-19 data Twitter-sphere, you’re likely familiar with Philip Nelson, a computer science student at Winthrop University—and an expert in navigating and sharing data from the state of South Carolina. Philip posts regular South Carolina updates including the state’s case counts, hospitalizations, test positivity, and other major figures, and contributes to discussions about data analysis and accessibility.

    I invited Philip to contribute a post this week after reading his Tweets about his ongoing challenges in accessing his state’s hospitalization data. Basically, after Philip publicized a backend data service that enabled users to see daily COVID-19 patient numbers by individual South Carolina hospital, the state restricted this service’s use—essentially making the data impossible for outside researchers to analyze.

    To me, his story speaks to broader issues with state COVID-19 data, such as: agencies adding or removing data without explanation, a lack of clear data documentation, failure to advertise data sources to the public, and mismatches between state and federal data sources. These issues are, of course, tied to the systematic underfunding of state and local public health departments across the country, making them unequipped to respond to the pandemic.

    South Carolina seems to be particularly arduous to deal with, however, as Philip describes below.


    I’ve been collecting and visualizing South Carolina-related COVID-19 data since April 2020. I’m a computer science major at Winthrop University, so naturally I like to automate things, but collecting and aggregating data from constantly-changing data sources proved to be far more difficult than I anticipated.

    At the beginning of the pandemic, I had barely opened Excel and had never used the Python library pandas, but I knew how to program and I was interested in tracking COVID-19 data. So, in early March 2020, I watched very closely as the South Carolina Department of Health and Environmental Control (DHEC) reported new cases.

    During the early days of the pandemic, DHEC provided a single chart on their website with their numbers of negative and positive tests; I created a small spreadsheet tracking these cases. After a few days, DHEC transitioned to a dashboard that shared county-level data.

    On March 23, I noticed an issue with the new dashboard. Apparently, someone had misconfigured authentication on something in the backend. (When data sources are put behind authentication, anyone outside of the organization providing that source loses access.) The issue was quickly fixed and I carried on with my manual entry, but this was not the last time I’d have to think about authentication.

    Initially, I manually entered the number of cases and deaths that DHEC reported. I thought I might be able to use the New York Times’ COVID-19 dataset, but after comparing it to DHEC’s data, I decided that I’d have to continue my own manual entry.

    South Carolina’s REST API

    In August 2020, I encountered some other programmers on Twitter who had discovered a REST API on DHEC’s website. REST is a common architectural style for APIs that makes it easier for developers to use services on the web. In this case, I was able to make simple requests to the server and receive data as a response. After starting a database fundamentals course during the fall 2020 semester, I figured out how to query the service: I could use the data in the API to get cases and deaths for each county by day.
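    To make “simple requests” concrete: ArcGIS feature services generally accept a `query` operation with parameters such as `where`, `outFields`, `returnGeometry`, and `f`. The sketch below only builds such a URL and makes no network call. The host, service name, and field names are hypothetical placeholders, not DHEC’s actual endpoints, and this is not necessarily how the author’s own scripts worked.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; not DHEC's real service URL.
BASE_URL = "https://example.com/arcgis/rest/services/covid_county_cases/FeatureServer/0"


def build_query_url(base_url, county, out_fields=("*",)):
    """Build an ArcGIS-style feature-service query URL for one county."""
    params = {
        "where": f"county = '{county}'",    # SQL-like filter; the field name is assumed
        "outFields": ",".join(out_fields),  # attribute columns to return
        "returnGeometry": "false",          # attributes only, no map shapes
        "f": "json",                        # response format
    }
    return f"{base_url}/query?{urlencode(params)}"


url = build_query_url(BASE_URL, "York", out_fields=("report_date", "cases", "deaths"))
print(url)
```

    Fetching that URL from a real, unauthenticated service would return JSON with one record per matching row, which is what makes this kind of endpoint easy to automate against.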

    This API gave me the ability to automate all of my update processes. By further exploring the ArcGIS REST API website, I realized that DHEC had other data services available. In addition to county-level data, the agency also provided an API for cases by ZIP code. I used these data to create custom ZIP code-level graphs upon request, and another person I encountered built a ZIP code map of cases.

    During August 2020, the CDC stopped reporting hospitalization data and the federal government shifted to using data collected by the Department of Health and Human Services (HHS) and Teletracking. DHEC provided a geoservice for hospitalizations, based on data provided to DHEC by Teletracking on behalf of the HHS. I did some exploration of the hospitalization REST API and found that the data in this API was facility-level (individual hospitals), updated daily. I aggregated the numbers in the API based on the report date in order to provide data for my hospitalization graph. At the time, I didn’t know that the federal government does not provide daily facility-level data to the public.
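    One way to picture that aggregation step is the minimal sketch below, which sums facility-level counts by report date to get a statewide daily total. It is not the author’s actual script: the hospital names, field names, and counts are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical facility-level rows; hospital names, field names, and counts are made up.
facility_rows = [
    {"report_date": "2021-08-24", "hospital": "Hospital A", "covid_patients": 40},
    {"report_date": "2021-08-24", "hospital": "Hospital B", "covid_patients": 25},
    {"report_date": "2021-08-25", "hospital": "Hospital A", "covid_patients": 42},
    {"report_date": "2021-08-25", "hospital": "Hospital B", "covid_patients": 27},
]


def statewide_totals(rows):
    """Sum COVID-19 patient counts across facilities for each report date."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["report_date"]] += row["covid_patients"]
    # Sort by date string (ISO format sorts chronologically).
    return dict(sorted(totals.items()))


daily_totals = statewide_totals(facility_rows)
for report_date, total in daily_totals.items():
    print(report_date, total)
```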

    In October 2020, DHEC put their ZIP code-level API behind authentication. I voiced my displeasure publicly. In late December 2020, DHEC put the API that contained county-level cases and deaths behind authentication. At this point, I began to get frustrated with DHEC for putting things behind authentication without warning, but I kind of gave up on getting the deaths data out of an API. Thankfully, DHEC still provided an API for confirmed cases, so I switched my scripts to scrape death data from PDFs provided by DHEC each day. I didn’t like using the PDFs because they did not capture deaths that were retroactively moved from one date to another, unlike the API.
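    The difference matters because a source that returns the full time series on every pull lets you spot retroactive changes by comparing one day’s pull with the next, while a once-a-day PDF only gives that day’s figure. A minimal sketch of that comparison, using hypothetical snapshots rather than real DHEC numbers:

```python
def revised_dates(previous, current):
    """Return dates whose death counts differ between two full-series snapshots."""
    changes = {}
    for date in sorted(set(previous) | set(current)):
        before = previous.get(date, 0)  # a date missing from a snapshot counts as 0
        after = current.get(date, 0)
        if before != after:
            changes[date] = (before, after)
    return changes


# Hypothetical snapshots of deaths by date of death, pulled on consecutive days.
snapshot_monday = {"2020-12-01": 12, "2020-12-02": 9, "2020-12-03": 4}
snapshot_tuesday = {"2020-12-01": 11, "2020-12-02": 10, "2020-12-03": 4, "2020-12-04": 6}

changes = revised_dates(snapshot_monday, snapshot_tuesday)
for date, (before, after) in changes.items():
    print(f"{date}: {before} -> {after}")
```

    In this made-up example, one death moves from December 1 to December 2, a revision that would be invisible to anyone recording only each day’s newly published number.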

    I ran my daily updates until early June 2021, when DHEC changed their reporting format to a weekday-only schedule. I assumed that we’d seen the last wave of the pandemic and that, thanks to readily available vaccines, we had relegated the virus to a containable state. Unfortunately, that was not the case, and by mid-July I had resumed my daily updates.

    Hospitalization data issues

    In August 2021, people in my Twitter circle became interested in pediatric data. I decided to return to exploring the hospitalization API because I knew it had pediatric-related attributes. It was during that exploration that I realized I had access to daily facility-level data that the federal government was not providing to the public; the federal government provides weekly facility-level data. My first reaction was to build a Tableau dashboard that let people look at the numbers of adults and pediatric patients with COVID-19 at the facility level in South Carolina over time.

    After posting that dashboard on Twitter, I kept hearing that people wanted a replacement for DHEC’s hospitalization dashboard which, at the time, only updated on Tuesdays. So, I made a similar dashboard that provided more information and allowed users to filter down to specific days and individual hospitals, then I tweeted it at DHEC. Admittedly, this probably wasn’t the smartest move.

    I kept exploring the hospitalization data and found that it contained COVID-19-related emergency department visits by day, another data point provided weekly by HHS. After plotting out the total number of visits each day and reading the criteria for this data point, I decided I needed to make another dashboard for this. A day after I posted the dashboard to Twitter, DHEC put the API I was using behind authentication. Again, I tweeted my frustration.

    A little while later, DHEC messaged me on Twitter and told me that they were doing repairs to the API. I was later informed that the API was no longer accessible, and that I would have to use DHEC’s dashboard or HHS data. The agency’s dashboard does not allow data downloads, making it difficult for programmers to use it as a source for original analysis and visualization.

    I asked for information on why the API was no longer operational; DHEC responded that they had overhauled their hospitalization dashboard, resulting in changes to how they ingest data from the federal government. This response did not make it clear why DHEC needed to put authentication on the daily facility-level hospitalization data.

    Meanwhile, DHEC’s hospital utilization dashboard has started updating daily again. But after examining several days’ worth of data, I cannot figure out how the numbers on DHEC’s dashboard correspond to HHS data. I’ve tried matching columns from a range of dates to the data displayed, but haven’t been able to find a date where the numbers are equal. DHEC says the data is sourced from HHS’ TeleTracking system on their dashboard, but it’s not immediately clear to me why the numbers do not match. I’ve asked DHEC for an explanation, but haven’t received a response.
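    That kind of date-matching check can be sketched as follows: for each candidate reporting lag, count how many dashboard values equal the federal value from that many days earlier. This is an assumption about how such a comparison might be set up, not the author’s code, and none of the numbers below come from DHEC or HHS.

```python
from datetime import date, timedelta


def match_counts_by_lag(dashboard, federal, max_lag_days=7):
    """For each lag, count dashboard dates whose value equals the federal value `lag` days earlier."""
    results = {}
    for lag in range(max_lag_days + 1):
        matches = 0
        for day, value in dashboard.items():
            shifted = day - timedelta(days=lag)
            # .get() returns None for dates missing from the federal series, which never matches.
            if federal.get(shifted) == value:
                matches += 1
        results[lag] = matches
    return results


# Hypothetical daily totals; none of these numbers come from DHEC or HHS.
federal = {date(2021, 9, 1) + timedelta(days=i): n for i, n in enumerate([900, 915, 930, 948, 960])}
dashboard = {date(2021, 9, 3): 915, date(2021, 9, 4): 930, date(2021, 9, 5): 948}

lag_matches = match_counts_by_lag(dashboard, federal, max_lag_days=3)
print(lag_matches)
```

    In this made-up case every dashboard value lines up at a one-day lag. A result with zero matches at every lag, which is the situation described above, would suggest the two sources differ by more than a simple reporting delay.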

    Lack of transparency from DHEC

    I’ve recently started to get familiar with the process of using FOIA requests. In the past week, I got answers on requests that I submitted to DHEC for probable cases by county per day. This data is publicly accessible (but not downloadable) via a Tableau dashboard, but there are over 500 days’ worth of data for 46 counties. The data DHEC gave to me through the FOIA process are heavily suppressed and, in my opinion, not usable.

    This has been quite a journey for me, especially in learning how to communicate and collect data. It’s also been a lesson in how government agencies don’t always do what we want them to with data. I’ve learned that government agencies don’t always explain (or publicize) the data they provide, and so the job of finding and understanding the data is left to the people who know how to pull the data from these sources.

    It’s also been eye-opening to understand that sometimes, I’m not going to be able to get answers on why a state-level agency is publishing data that doesn’t match a federal agency’s data. Most of all, it’s been a reminder that we always need to press government-operated public health agencies to be as transparent as possible with public health data.