Tag: race & ethnicity data

  • Fenceline communities left behind by data gaps: A dispatch from SEJ in Houston

    This week, I’m sharing a short dispatch from the Society of Environmental Journalists (SEJ) conference in Houston, Texas. Unlike other journalism conferences I’ve attended, SEJ meetings don’t just sequester you in your hotel all day: the organizers plan field trips that are designed to give reporters on-the-ground information about environmental issues at the place they’re visiting.

    I went on one of these trips, to the Houston Ship Channel and surrounding communities impacted by industrial pollution. For me, this experience was a lesson in the cascading health issues caused by environmental racism—including, of course, COVID-19—as well as the ways that data gaps can make it harder for hard-hit communities to get needed public health assistance.

    The Houston Ship Channel, I learned this week, is a passage for ships going between Houston’s port and the Gulf of Mexico. According to the Port Houston website, the port is the largest container port on the Gulf Coast, handling about two-thirds of all shipping containers that travel through the region. (Shipping containers carry all the consumer products that we order online.)

    It is also the single largest U.S. port for petroleum exports. Every month, thousands of tons of oil and plastics (which are made from oil) pass through the Houston Ship Channel; much of this cargo is processed right on the banks of the channel, in massive refineries that define the landscape around Houston.

    With SEJ, I went on a boat tour through the Houston Ship Channel. We passed refineries and industrial plants from Valero, Chevron, ExxonMobil, and other major companies, getting a close look at just how much space these facilities take up and how they decimate the surrounding land.

    After the boat, my group went to Manchester, a neighborhood close to the channel in southeast Houston. Community activists from the local environmental advocacy group TEJAS explained that this neighborhood’s population is overwhelmingly Latino; many residents are low-income workers with no college degrees who speak Spanish as their first language.

    Manchester residents have faced intense pollution from industrial plants that border their homes, schools, and community spaces. We walked through a park that is surrounded on multiple sides by these plants; we could see smoke from chemicals burning, and smell the results of that burning in the air. Valero, which owns one of the nearby plants, had recently sponsored a playground in this park as a small gesture, barely acknowledging the harm it’s caused to this neighborhood.

    Of course, my immediate question was: what are the COVID-19 statistics for this neighborhood? To me, it seemed obvious that Manchester residents living with this intense pollution would face higher rates of respiratory conditions, cancers, and other diseases that would make them more vulnerable to severe COVID-19 symptoms. (Poor air quality has been linked with more severe COVID-19 outcomes since the early days of the pandemic.)

    Here’s the problem: nobody could actually answer my question. I spoke to Leticia Ablaza, government relations director at Air Alliance Houston and another speaker on the tour, who explained that the link between pollution and COVID-19 in Manchester and other similar Houston neighborhoods has yet to be studied. Anecdotally, she said, she knows community members with respiratory conditions who have faced heightened vulnerability to COVID-19. But there’s no formal data.

    The reason for this lack of formal studies became clear to me later, when I attended a conference session on the links between COVID-19 and environmental health. Annie Xu, a Rice University student who has studied health disparities in Texas, said at this session that the state of Texas does not publish any COVID-19 data below the county level.

    Xu’s research group did identify links between Texas counties’ racial demographics and their COVID-19 burden, published in Nature Scientific Reports in January. But when the group looked for links between air pollution and COVID-19, the analysis didn’t lead to significant results.

    This finding is likely because pollution can vary widely within Texas counties, Xu said. For example, there’s a huge gap between air quality in Manchester and on Rice’s campus, both of which are included in Harris County. To truly find a connection between pollution and COVID-19, a research group like hers would require more granular data, such as at the ZIP code or census tract level.
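
    To make this aggregation problem concrete, here is a minimal Python sketch. All of the numbers are made up for illustration (they are not real Texas, Harris County, or Manchester data), and the county names, the pollution index, and the COVID-19 rate are hypothetical. It shows how a strong relationship at the ZIP code level can vanish once the same values are averaged up to the county level.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: each county has one low-pollution and one
# high-pollution ZIP code, stored as (pollution index, COVID-19 rate).
zip_data = {
    "County A": [(10, 12), (90, 88)],
    "County B": [(20, 21), (80, 83)],
    "County C": [(30, 33), (74, 69)],
}

# ZIP-level view: every ZIP code is its own data point.
zip_pollution = [p for rows in zip_data.values() for p, _ in rows]
zip_covid = [c for rows in zip_data.values() for _, c in rows]
zip_r = pearson(zip_pollution, zip_covid)

# County-level view: only the county averages are visible.
county_pollution = [mean(p for p, _ in rows) for rows in zip_data.values()]
county_covid = [mean(c for _, c in rows) for rows in zip_data.values()]
county_r = pearson(county_pollution, county_covid)

print(f"ZIP-level correlation:    {zip_r:.2f}")     # close to 1
print(f"County-level correlation: {county_r:.2f}")  # approximately 0
```

    In this toy example, each county contains both a clean ZIP code and a polluted one, so the county averages look nearly identical and the pollution signal disappears. Real analyses would be more complicated, but the sketch suggests why county-level data alone may not be enough.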

    But the Texas public health department only publishes COVID-19 data at the county level—with the exception of vaccinations, one metric that is available by ZIP code. The federal government doesn’t report COVID-19 data below the county level either.

    Without this granular information, it’s difficult to demonstrate the impacts of petrochemical pollution on COVID-19 in neighborhoods like Manchester. The community isn’t able to get priority status for public health interventions like vaccines or testing—meaning that its vulnerabilities are unlikely to change.

    As longtime readers know, I have spent a lot of time grappling with COVID-19’s demographic disparities. I was a leading volunteer for the COVID Tracking Project’s COVID Racial Data Tracker, and have sought to call attention to the terrible state of this type of COVID-19 data in the U.S. whenever I can. Still, it was a new experience to actually see a community left behind by the data gaps that I cover.

    What kind of investment would be required to truly study how COVID-19 has impacted a place like Manchester, in Houston? And what other environment-related health conditions do we need to be investigating in these areas? I hope that future stories will enable me to answer these questions.

    For now, if you have any questions, comments, or data source recommendations in this area, please reach out!

  • All the U.S.’s COVID-19 metrics are flawed

    This week, I had a big retrospective story published at FiveThirtyEight: I looked back at the major metrics that the U.S. has used to track COVID-19 over the past two years—and how our country’s fractured public health system hindered our use of each one.

    The story is split into seven sections, which I will briefly summarize here:

    • Case counts, January to March 2020: Early on in the pandemic, the U.S. had a very limited picture of COVID-19 cases due to our very limited testing: after rejecting a test made by the WHO, the CDC made its own test—which turned out to have contamination issues, further slowing down U.S. testing. In early March 2020, for example, the majority of cases in NYC were identified in hospitals, suggesting that official counts greatly underestimated the actual numbers of people infected.
    • Tests administered, March to September 2020: Test availability improved after the first wave of cases, with organizations like the COVID Tracking Project keeping a close eye on the numbers. But there were a lot of challenges with the testing data (like different units across different states) and access issues for Americans with lower socioeconomic status.
    • Hospitalizations, October to December 2020: By late 2020, many researchers and journalists were considering hospitalizations to be a more reliable COVID-19 metric than cases. But it took a long time for hospitalization data to become reliable on a national scale, as the HHS launched a new tracking system in the summer and then took months to work out kinks in this system.
    • Vaccinations, January to June 2021: When the vaccination campaign started in late 2020, it was “tempting to forget about all other COVID-19 metrics,” I wrote in the story. But the U.S.’s fractured system for tracking vaccinations made it difficult to analyze how close different parts of the country were to prospective “herd immunity,” and distracted from other public health interventions that we still needed even as people got vaccinated.
    • Breakthrough cases, July to November 2021: The Delta surge caused widespread infections in people who had been vaccinated, but the CDC—along with many state public health agencies—was not properly equipped to track these breakthrough cases. This challenge contributed to a lack of good U.S. data on vaccine effectiveness, which in turn contributed to confusion around the need for booster shots.
    • Hospitalizations (again), December 2021 to January 2022: The Omicron surge introduced a need for more nuance in hospitalization data, as many experts asked whether COVID-19 patients admitted with Omicron were actually hospitalized for their COVID-19 symptoms or for other reasons. Nuanced data can be useful in analyzing a variant’s severity, but all COVID-related hospitalizations cause strain on the healthcare system regardless of their cause.
    • New kinds of data going forward: In our post-Omicron world, a lot of public health agencies are shifting their data strategies to treat COVID-19 more like the flu: less tracking of individual cases, and more reliance on hospitalization data, along with newer sources like wastewater. At this point in the pandemic, we should be fortifying data systems “for future preparedness,” I wrote, rather than letting the systems we built up during the pandemic fall to the wayside.

    I did a lot of reporting for this piece, including interviews with some of the U.S.’s foremost COVID-19 data experts and communicators. As long as the piece is, there were a lot of metrics (and issues with these metrics) that came up in these interviews that I wasn’t able to include in the final story—so I wanted to share some bonus material from my reporting here.

    Long COVID:

    As I’ve discussed in previous issues, the U.S. has done a terrible job of collecting data on Long COVID. The NIH estimates that this condition follows a significant share of coronavirus infections (between 10% and 30%), but we have limited information on its true prevalence, risk factors, and strategies for recovery.

    Here’s Dr. Eric Topol, the prolific COVID-19 commentator and director of the Scripps Research Translational Institute, discussing this data problem:

    [Long COVID has] been given very low priority, very little awareness and recognition. And we have very little data to show for it, because it hasn’t been taken seriously. But it’s a very serious matter.

    We should have, early on, gotten at least a registry of people, a large sample, hundreds of thousands of people prospectively assessed, like is being done elsewhere [in the U.K. and other countries]. So that we could learn from them: how long the symptoms lasted, what are the symptoms, what are the triggers, what can be done to avoid it, the role of vaccines, the role of boosters, all this stuff. But we have nothing like that.

    The NIH’s RECOVER initiative may answer some of these questions, but it will take months—if not years—for the U.S. to actually collect the comprehensive data on Long COVID that we should have started gathering when the condition first began gaining attention in 2020.

    Demographic data:

    In the testing section of the story, I mention that the U.S. doesn’t provide much demographic data describing who’s getting tested for COVID-19. There is actually a little-known provision in the CARES Act that requires COVID-19 testing providers to collect certain demographic data from all people who seek tests. But the provision is not enforced, and any data that are collected on this subject aren’t making it to most state COVID-19 dashboards, much less to the CDC’s public data dashboard.

    Here’s Dr. Ellie Murray, an epidemiologist at the Boston University School of Public Health, discussing why this is an issue:

    We don’t collect reason for seeking a test. We don’t collect age, race, ethnicity, occupation of people who seek a test. Those kinds of things could provide us with some really valuable information about who is getting tested, when, and why—that could help us figure out, what are the essential occupations where people are having a lot of exposures and therefore needing to get a lot of tests? Or are there occupations where we’re seeing a lot of people end up in hospital, who have those occupations, but they’re not getting tests, because actually, the test sites are nowhere near where they need to work, or they don’t have the time to get there before they close.

    And so we don’t really know who is getting tested, and that, I think, is a bigger problem, than whether the numbers that are being tested tell us anything about the trajectory of COVID. Because we have case data, and hospitalization data, and death data to tell us about the trajectory. And the testing could really tell us more about exposure, and concern, and access—if we collected some more of this data around who is getting tested and why.

    Test positivity:

    Speaking of testing: another metric that I didn’t get into much in the story was test positivity. Test positivity (the share of COVID-19 tests that return a positive result) has been used by everyone from the CDC to local school districts as a key metric to determine safety levels. (For more on this metric, check out my FAQ post from this past January.)

    But even when it’s calculated correctly, test positivity faces the same challenges as case data: namely, bias in who’s getting tested. Here’s Lauren Ancel Meyers, director of the University of Texas at Austin’s COVID-19 Modeling Consortium, explaining this:

    Test positivity is just as fraught [as cases]. It’s just as difficult, because you need to know the numerator and the denominator—what’s influencing the numerator and the denominator? Who is going to get tested, who has access to tests? … It used to be, at the very beginning [of the pandemic], nobody could get a test who wanted a test. And now, today, everybody has a test in their medicine cabinet, and they don’t get reported when they test. It’s different issues that have ebbed and flowed throughout this period.

    Often, if you’re a good data analyst or a modeler, and you have all the information, you can handle those kinds of biases. But the problem is, we don’t know the biases from day to day. And so even though there are statistical tools to deal with incomplete bias, without knowing what those biases are, it’s very hard to do reliable inference, and really hard to understand what’s actually going on.

    Genetic surveillance:

    Also related to testing: genetic surveillance for coronavirus variants of concern. Genetic surveillance is important because it can help identify new variants that may be more transmissible or more likely to evade protection from vaccines. It can also help track the qualities of concerning variants once they are identified, if variant data is linked to hospitalization data, vaccination data, and other metrics (which is not really happening in the U.S. right now).

    Our current genetic surveillance systems have a lot of gaps. Here’s Leo Wolansky, from the Rockefeller Foundation’s Pandemic Prevention Institute (PPI), discussing how his organization seeks to address these challenges:

    [We’re trying to understand] where our blind spots are, and the bias that we might experience with a lot of health system reporting. One of the things that PPI has been doing is identifying centers of excellence in different parts of the world that can improve the sequencing of new cases in underrepresented countries. And so for example, we’ve provided quite a bit of support to the folks in South Africa that ultimately rang the alarm on Omicron.

    We’re also doing this by actually trying to systematically assess countries’ capacity for this type of genomic surveillance. So thinking about, how many tests have been recorded? What’s that test positivity rate? Do we have confidence in the basic surveillance system of the country? And then, do we also see enough sequences, as well as sequencing facility data, to demonstrate that this country can sequence and just isn’t doing enough—or cannot sequence because it needs foundational investment in things like laboratories and devices. We’ve been mapping this capacity just to make sure that we understand where we should be investing as a global community.

    The Pandemic Prevention Institute is taking a global perspective in thinking about data gaps. But these gaps also exist within the U.S., as is clear when one looks at the differences in published coronavirus sequences from state to state. Some states, like Wyoming, Vermont, and Colorado, have sequenced more than 10% of their cumulative cases, according to the CDC. Others, like Oklahoma, Iowa, and South Dakota, have sequenced fewer than 3%. These states need additional investment in order to thoroughly monitor coronavirus transmission among their residents.

    Cohort studies:

    In a cohort study, researchers follow a group of patients over time in order to collect long-term data on specific health conditions and/or the outside factors that influence them. The U.S. has set up a few cohort studies for COVID-19, but they haven’t been designed or utilized in a way that has actually provided much useful data—unlike cohort studies in some other countries. (The U.K., for example, has several ongoing cohort studies collecting information on COVID-19 symptoms, infections in schools, seroprevalence, and more.)

    Here’s Dr. Ellie Murray explaining the lost potential of these studies in the U.S.:

    There are a number of existing cohort studies that have been asked or who asked to pivot to collecting COVID information and therefore collecting long-term COVID information on their cohorts. But there doesn’t seem to be any kind of system to [determine], what are the questions we need answered about COVID from these kinds of studies? And how do we link up people who can answer those questions with the data that we’re collecting here, and making sure we’re collecting the right data? And if this study is going to answer these questions, and this one is going to answer those questions—or, here’s how we standardize those two cohorts so that we can pull them together into one big COVID cohort.

    And so, we end up in this situation where, we don’t know what percent of people get Long COVID, even though we’ve been doing this for over two years. We don’t even really know, what are all the different symptoms that you can get from COVID? … There are all these questions that we could be sort-of systematically working our way through, getting answers and using them to inform our planning and our response. [In addition to having] standardized questions, you also need a centralized question, instead of just whatever question occurs to someone who happens to have the funding to do it.

    Excess deaths:

    Excess deaths measure the deaths that occur in a certain region, over a certain period of time, above the number of deaths that researchers expect to see in that region and time period based on modeling from past years’ data. Excess deaths are the COVID-19 metric with the longest lag time: it takes weeks from initial infection for someone to die of the disease, and can take weeks further for a death certificate to be incorporated into the public health system.

    Once that death information is available, however, it can be used to show the true toll of the pandemic—analyzing not just direct COVID-19 deaths, but also those related to isolation, financial burden, and other indirect issues—as well as who has been hit the hardest.
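
    As a rough sketch of the arithmetic behind this metric, the Python below subtracts an expected death count from an observed one. It rests on loud assumptions: the weekly counts are invented, and a simple average of prior years stands in for the statistical models that researchers actually use to set the baseline.

```python
from statistics import mean

def expected_deaths(past_years_counts):
    """Baseline: average deaths for the same period in past years.

    A simplification; real baselines typically come from statistical
    models that account for trends and seasonality.
    """
    return mean(past_years_counts)

def excess_deaths(observed, past_years_counts):
    """Observed deaths minus the expected baseline."""
    return observed - expected_deaths(past_years_counts)

# Hypothetical region: deaths in the same week of three pre-pandemic years.
past_counts = [980, 1000, 1020]
observed_this_year = 1300
reported_covid_deaths = 200  # hypothetical official COVID-19 count

excess = excess_deaths(observed_this_year, past_counts)
not_in_official_count = excess - reported_covid_deaths

print(f"Excess deaths: {excess}")                         # 300
print(f"Not in official count: {not_in_official_count}")  # 100
```

    In this made-up example, 100 of the 300 excess deaths do not show up in the official COVID-19 count. Those could be undiagnosed COVID-19 deaths or deaths from indirect causes; the excess figure alone cannot say which.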

    Here’s Cecile Viboud, a staff scientist at the NIH who studies infectious disease mortality, discussing this metric:

    We’ve been using the excess death approach for a long time. It comes from flu research, basically starting in 1875 in the U.K. And it was used quite a lot during the 1918 pandemic. It can be especially good in examining historical records where you don’t have lab confirmation—there was no testing ability back in those days…

    So, I think it’s kind of natural to use it for a pandemic like COVID-19. Very early on, you could see how useful this method was, because there was so little testing done. In March and April 2020, you see substantial excess, even when you don’t see lab-confirmed deaths. There’s a disconnect there between the official stats, and then the excess mortality… [We can also study] the direct effect of COVID-19 versus the indirect effect of the pandemic, like how much interventions affected suicide, opioids, death, accidents, etc. The excess approach is also a good method to look at that.

    Viboud also noted that excess deaths can be useful to compare different parts of the U.S. based on their COVID-19 safety measures. For example, one can analyze excess deaths in counties with low vaccination rates compared to those with high vaccination rates. This approach can identify the pandemic’s impact even when official death counts are low, an issue that the Documenting COVID-19 project has covered in depth.

    Again, you can read the full FiveThirtyEight story here!

  • COVID source callout: Still no state-by-state data on vaccinations by race/ethnicity

    This week, the CDC added a new feature to the vaccination section of its COVID-19 dashboard: you can now look at demographic vaccination trends at the state level, not just nationally and regionally.

    But there’s a catch: the state-by-state demographic trends only include age and sex data. Vaccination trends by race and ethnicity are still only available at the national level; in fact, when you click on “Race/Ethnicity” on the booster shots section of this dashboard, the CDC directs you to “please visit the relevant health department website” for more local data.

    For state-level race and ethnicity data, the CDC directs users to state public health agencies. Screenshot taken on March 20.

    We are now over a year into the U.S.’s vaccine rollout, and the CDC is still failing to publicly share data on vaccinations by state and race/ethnicity. I actually wrote a callout post about this in March 2021, and nothing has changed since then!

    This is a major issue because such data are needed to examine equity in the vaccine rollout. While it’s possible to compile data from the states that report vaccinations by race and ethnicity themselves, major inconsistencies in state reporting practices make these data hard to standardize. Why isn’t the CDC doing this? Or, if the CDC is doing this, why aren’t the data public?

  • Sources and updates, March 20

    Data sources and data-related updates for this week:

    • APM Research Lab relaunches Color of Coronavirus tracker: From April 2020 to March 2021, the American Public Media (APM) Research Lab compiled state-level data on COVID-19 deaths by race and ethnicity, in order to present a picture of which U.S. populations were most hard-hit by the pandemic. The project relaunched this week, now utilizing CDC mortality statistics instead of compiling data from states. One major finding from the updated data: “Indigenous Americans have the highest crude COVID-19 mortality rates nationwide—about 2.8 times as high as the rate for Asians, who have the lowest crude rates.”
    • CDC might take back hospital data reporting responsibilities from HHS: As longtime readers may remember, back in summer 2020, the Department of Health and Human Services (HHS) developed a new data system for hospitals to report COVID-19 patient numbers and other related metrics. At the time, the HHS was taking over responsibility for these data from the CDC; this inspired some political posturing and concerns about data quality, though the eventual HHS dataset turned out to be very comprehensive and useful. (This original data switch was the subject of my very first CDD issue, and I followed the HHS data system closely throughout 2020.) Now, Bloomberg reports, the CDC wants to take back hospital data reporting from the HHS. More political posturing and data quality concerns are, it seems, inevitable—this time tied to the CDC’s challenges in modernizing its data systems.
    • Hospitalizations among young children, by race/ethnicity during Omicron surge: Two MMWR studies that caught my attention this week: one examined hospitalization rates among young children, ages 0 to 4, between March 2020 and February 2022. This study found that COVID-19 hospitalization rates among children in this age range were five times higher at the peak of the Omicron surge compared to the Delta surge. The second report examined hospitalizations by race and ethnicity, finding that, during Omicron’s peak, hospitalization rates among Black adults were nearly four times higher than rates among white adults. Both reports clearly demonstrate who is still vulnerable to COVID-19 as the U.S. abandons safety measures.
    • Pfizer and Moderna both seeking EUAs for additional booster shots: POLITICO reported this week that first Pfizer, then Moderna have requested Emergency Use Authorization for fourth doses of their COVID-19 vaccines. Pfizer’s request is specifically for people age 65 and over, while Moderna’s is for all adults. Notably, Pfizer’s request is based on data from Israel suggesting that immunity from an initial booster wanes after several months—just as Pfizer’s initial case for boosters in the fall was also based on Israeli data.
    • Global COVID-related deaths may be three times as high as official records: Throughout the pandemic, researchers have used excess mortality (i.e., the deaths occurring in a given region and time period above what’s expected) to determine the true toll of COVID-19. A new study, published this week in The Lancet, took this approach for 191 countries and territories from January 2020 to December 2021. The researchers estimate that about 18 million people died worldwide due to the pandemic, including not just direct COVID-19 deaths but also others caused by COVID-related disruptions. That’s three times the 6 million COVID-19 deaths that have been officially reported in this time period.

  • Pandemic preparedness: Improving our data surveillance and communication

    Screenshot of the new Biden COVID-19 plan.

    As COVID-19 safety measures are lifted and agencies move to an endemic view of the virus, I’m thinking about my shifting role as a COVID-19 reporter. To me, this beat is becoming less about reporting on specific hotspots or control measures and more about preparedness: what the U.S. learned from the last two years, and what lessons we can take forward—not just for the future COVID-19 surges that are almost certainly coming, but also for future infectious disease outbreaks.

    To that end, I was glad to see the Biden administration release a new COVID-19 plan focused on exactly this topic: preparedness for new surges, new variants, and new infectious diseases beyond this current pandemic.

    From the plan’s executive summary:

    Make no mistake, President Biden will not accept just “living with COVID” any more than we accept “living with” cancer, Alzheimer’s, or AIDS. We will continue our work to stop the spread of the virus, blunt its impact on those who get infected, and deploy new treatments to dramatically reduce the occurrence of severe COVID-19 disease and deaths.

    The Biden plan was released last week, in time with the president’s State of the Union address. I read through it this morning, looking for goals and actions connected to data collection and reporting.

    Here are a few items that stuck out to me, covering both things that the Biden administration is already doing and things that it should be doing:

    • Improving surveillance to identify new variants: The U.S. significantly improved its variant sequencing capacity in 2021, increasing the number of cases sequenced more than tenfold from the beginning to the end of the year. The new Biden plan promises to take these improvements further by adding more capacity for sequencing at state and local levels and, crucially, by “strengthening data infrastructure and interoperability so that more jurisdictions can link case surveillance and hospital data to vaccine data.” In plain language, that means making it easier to track breakthrough cases (which I have argued is a key data problem in the U.S.).
    • Expanding wastewater surveillance: As I’ve written before, in the current national wastewater surveillance network, some states are very well-represented with over 50 collection sites, while other states are not included in the data at all. The Biden administration is committed to bringing more local health agencies and research institutions into the surveillance network, thus expanding our national capacity to get early warnings about surges.
    • Standardizing state and local data systems: I’ve written numerous times that the U.S. suffers from a lack of standardization among its 50 different states and hundreds of local health agencies. According to the new plan, the Biden administration plans to facilitate sharing, aggregating, and analyzing data across state and local agencies, including wastewater monitoring and other potential methods of surveillance that would provide early warnings of new surges. This would be huge if it actually happens.
    • Modernizing the public health data infrastructure: One thing that could help health agencies better coordinate and share data: modernizing their data systems. That means phasing out fax machines and mail-in reports (which, yes, some health departments still use) and investing in new electronic health record technologies, while hiring public health workers who can manage such systems.
    • Using a new variant playbook to evaluate new virus strains: Also in the realm of variant preparedness, the Biden administration has developed a new “COVID-19 Variant Playbook” that may be used to quickly determine how a new variant impacts disease severity, transmissibility, vaccine effectiveness, and other factors. The playbook may also be used to quickly update vaccines, tests, and treatments if needed, by working in partnership with health systems and research institutions.
    • Collecting demographic data on vaccinations and treatments: The Biden plan boasts that “Hispanic, Black, and Asian adults are now vaccinated at the same rates as White adults.” However, CDC data shows that this trend does not hold true for booster shots: eligible white Americans are more likely to be boosted than those in other racial and ethnic groups. The administration will need to continue collecting demographic data to identify and address gaps in vaccinations and treatments; indeed, the Biden plan discusses continued efforts to improve health equity data.
    • Tracking health outcomes for people in high-risk settings: Along with its health equity focus, the Biden plan discusses a need to better track and report on health outcomes in nursing homes, other long-term care facilities, and other congregate settings like correctional facilities and homeless shelters. Congregate facilities continue to be major COVID-19 hotspots whenever there’s a new outbreak, so improving health standards in these settings should be a major priority.
    • Studying and combatting vaccine misinformation, vaccine safety: The new plan acknowledges the impact of misinformation on vaccine uptake in the U.S., and commits the Biden administration to addressing this trend. This includes a Request for Information that will be issued by the Surgeon General’s office, asking researchers to share their work on misinformation. Meanwhile, the administration will also continue monitoring vaccine safety and reporting these data to the public.
    • Test to Treat: One widely publicized aspect of the Biden plan is an initiative called “Test to Treat,” which would allow people to get tested for COVID-19 at pharmacies, health clinics, long-term care facilities, and other locations—then, if they test positive, immediately receive treatment in the form of antiviral pills. If this initiative is widely funded and adopted, the Biden administration should require all participating health providers to share testing and treatment data. This would allow researchers to evaluate whether this testing and treatment rollout has been equitable across different parts of the country and minority groups.
    • Website for community risk levels and public health guidance: The Biden plan includes the launch of a government website “that allows Americans to easily find public health guidance based on the COVID-19 risk in their local area and access tools to protect themselves.” The CDC COVID-19 dashboard was recently redesigned to highlight the agency’s new Community Level guidance, which is likely connected to this goal. Still, the CDC dashboard leaves much to be desired when it comes to comprehensive information and accessibility, compared to other trackers.
    • A new logistics and operational hub at HHS: In the last two years, the Department of Health and Human Services (HHS) built up an office for coordinating the development, production, and delivery of COVID-19 vaccines and treatments. The new Biden plan announced that this office will become a permanent part of the agency, and may be used for future disease outbreaks. At the same time, the Biden administration has added at-home tests, antiviral pills, and masks to America’s national stockpile for future surges; and it is supporting investments in laboratory capacity for PCR testing.
    • Tracking Long COVID: Biden’s plan also highlights Long COVID, promoting the need for government efforts to “detect, prevent, and treat” this prolonged condition. The plan mentions NIH’s RECOVER initiative to study Long COVID, discusses funding new care centers for patients, and proposes a new National Research Action Plan on Long COVID that will bring together the HHS, VA, Department of Defense, and other agencies. Still, the plan doesn’t discuss actual, financial support for patients who have been out of work for up to two years.
    • Supporting health and well-being among healthcare workers: The new Biden plan acknowledges major burnout among healthcare workers, and proposes a new grant program to fund mental health resources, support groups, and other systems of combatting this issue. Surveying healthcare workers and developing systematic solutions to the challenges they face could be a major aspect of preparing for future disease outbreaks. The Biden plan also mentions investing in recruitment and pipeline programs to support diversity, equity, and inclusion among health workers.
    • More international collaboration: The new Biden plan also focuses on international aid—delivering vaccine donations to low-income nations—and collaboration—improving communication with the WHO and other global organizations that conduct disease surveillance. This improved communication may be especially key for identifying and studying new variants in a global pandemic surveillance system.

    This week, a group of experts—including some who have advised the Biden administration—followed up on the Biden plan with their own plan, called “A Roadmap for Living with COVID.” The Roadmap plan also emphasizes data collection and reporting, with a whole section on health data infrastructure; here, the authors emphasize establishing centralized public health data platforms, linking disparate data types, designing data infrastructure with a focus on health equity, and improving public access to data.

    Both the Biden administration’s plan and the Roadmap plan give me hope that U.S. experts and leaders are thinking seriously about preparedness. However, simply releasing a plan is only the first step to making meaningful changes in the U.S. healthcare system. Many aspects of the Biden plan involve funding from Congress… and Congress is pretty unwilling to invest in COVID-19 preparedness right now. Just this week, a $15 billion funding plan collapsed in the legislature after the Biden administration had already made major concessions.

    Readers, I recommend calling your Congressional representatives and urging them to support COVID-19 preparedness funding. You can also look into similar measures in your state, city, or other locality. We need to improve our data in order to be prepared for future disease outbreaks, COVID-19 and beyond.

  • As COVID-19 precautions are lifted, who remains vulnerable?

    Hispanic, Black, and Native Americans are less likely to have received their booster shots than white Americans, according to CDC data.

    As more states and other institutions lift COVID-19 safety measures, the shift has sparked a conversation about who remains most vulnerable to COVID-19 during this period. I wanted to highlight a few of these vulnerable groups:

    • Seniors who remain unvaccinated or unboosted: “No other basic fact of life matters as dramatically as age for COVID,” writes Sarah Zhang in The Atlantic this week. Zhang’s story argues that the U.S. has not actually pushed to vaccinate elderly Americans with the same focus that other wealthy nations have. More than 10% of Americans over age 65 are not fully vaccinated and about one-third of those seniors who are fully vaccinated have not received their booster shots, according to CDC data. These seniors face higher COVID-19 risk than younger adults who are entirely unvaccinated, Zhang writes.
    • People of color who remain unvaccinated or unboosted: Zhang’s article inspired me to also look at recent vaccination trends by race and ethnicity. Black, Hispanic, and Native Americans have been at higher risk for COVID-19 throughout the pandemic, as their minority identities often coincide with lower socioeconomic status. According to CDC data, booster shot trends are similar to the vaccination trends we saw in early 2021: while 55% of eligible white Americans have received their booster shots, that number is below 50% for Black, Hispanic, and Native Americans. It’s lowest for Hispanic or Latino Americans: only 39% of those eligible have received a booster shot, as of February 19.
    • Immunocompromised people: If you haven’t yet read Ed Yong’s latest feature, about how America’s pandemic response has left immunocompromised people behind, drop everything and read it today. About 3% of U.S. adults take immunosuppressive drugs, while others live with diseases like AIDS that impact their immune systems. “In the past, immunocompromised people lived with their higher risk of infection, but COVID represents a new threat that, for many, has further jeopardized their ability to be part of the world,” Yong writes. Several other articles this week have also highlighted the challenges immunocompromised Americans face at this point in the pandemic.
    • Pregnant people: According to CDC data, about 68% of pregnant people ages 18 to 49 are fully vaccinated, as of February 12. That leaves almost one-third of pregnant Americans who are not fully vaccinated. Studies have found that pregnant people infected with the coronavirus are at higher risk for complications during their pregnancies and other severe outcomes. Plus, a new CDC study released this week found that a parent’s vaccination while pregnant greatly reduces an infant’s risk of being hospitalized for COVID-19, as antibodies produced by vaccination may be transferred from parent to child.
    • Children under age five: Of course, I have to mention the one group of Americans that is still not yet eligible for vaccination: children under age five. As parents of these kids have dealt with a confusing back-and-forth from Pfizer and the FDA on when vaccines might be available, many are facing high stress levels and remaining cautious even while schools and other institutions reduce safety measures.

  • New CDC mortality data: “Real-time public health surveillance at a highly granular level”

    The CDC’s new data release allows researchers to search through mortality data from 2020 and 2021 in great detail. Screenshot of the CDC’s search tool retrieved December 12.

    This past Monday, the CDC put out a major data release: mortality data for 2020 and 2021, encompassing the pandemic’s impact on deaths from all causes in the U.S.

    The new data allow researchers and reporters to investigate excess deaths, a measure of the pandemic’s true toll—comparing the number of deaths that occurred in a particular region, during a particular year, to deaths that would’ve been expected had COVID-19 not occurred. At the same time, the new data allow for investigations into COVID-19 disparities and increased deaths of non-COVID causes during the pandemic.

    To give you a sense of the scale here: As of Saturday, the U.S. has reported almost 800,000 COVID-19 deaths. But experts say the true COVID-19 death toll may be 20% higher, meaning that close to one million Americans may have died from the virus. And that’s not counting deaths tied to isolation, drug overdoses, missed healthcare, and other pandemic-related causes.
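
    For readers who want to see the arithmetic, here is a minimal Python sketch of the excess deaths idea and the scale check above. The function name and the baseline method (a plain average of prior years) are assumptions made for illustration; researchers generally rely on more careful statistical models to estimate expected deaths.

```python
from statistics import mean


def excess_deaths(observed, prior_year_deaths):
    """Observed deaths minus a simple expected baseline.

    The baseline is just the average of prior years, an assumption made
    for illustration; real estimates adjust for trends and seasonality.
    """
    expected = mean(prior_year_deaths)
    return observed - expected


# Scale check from the text: about 800,000 reported deaths, and a true
# toll that may be 20% higher.
reported_deaths = 800_000
possible_true_toll = reported_deaths * 120 // 100  # 960,000, close to one million
```

    With hypothetical inputs, `excess_deaths(1200, [1000, 1000, 1000])` would return 200: the region saw 200 more deaths than the simple baseline predicts.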

    The CDC’s new data release is unique because, in a typical year, the CDC reports mortality data with a huge lag. Deaths from 2019 were reported in early 2021, for example. But now, the CDC has adapted its reporting system to provide the same level of detail that we’d typically get with that huge lag—now with a lag of just a few weeks. The CDC has also improved its WONDER query system, allowing researchers to search the data with more detail than before.

    “I would describe this new release as more real-time surveillance at more specific detail than any journalists, or epidemiologists, or any other kind of researcher even knows what to do with,” said Dillon Bergin, an investigative reporter and my colleague at the Documenting COVID-19 project, at the Brown Institute for Media Innovation and MuckRock.

    Along with Dillon and other Documenting COVID-19 reporters, I worked on a story explaining why these CDC data are such a big deal—along with what we’re seeing in the numbers so far. The story was published this week at USA Today and at MuckRock. Our team also compiled a data repository with state-level information from the new CDC release, combined with death data from 2019 and excess deaths.

    If you’re a reporter who’d like to learn more about the new CDC data, you can sign up for a webinar with the Documenting COVID-19 team—taking place next Wednesday, December 15, at 12 PM Eastern time. It’s free and will go for about an hour, with lots of time for questions. Sign up here!

    Editor’s note, December 27: This webinar was recorded; you can watch the recording here.

    Also, as our initial story is part of a larger investigation (in collaboration with USA Today), the team has put together a callout form for people to share their stories around COVID-19 deaths in their communities. If you have a story to share, you can fill out the form here.

    To provide some more information on why this new CDC release is so exciting—and what you can do with the data—I asked Dillon a few questions about it. As the lead reporter on our team’s excess deaths investigation, he’s spent more time with these data than anyone else. This interview has been lightly edited and condensed for clarity.


    Betsy Ladyzhets: How would you summarize this new release? What is it?

    Dillon Bergin: I would describe this new release as more real-time surveillance at more specific detail than any journalists, or epidemiologists, or any other kind of researcher even knows what to do with. It’s unfathomably detailed, and the fact that we’re going to be able to see updates in almost real time is really critical at this stage of the pandemic, or at any stage in a public health crisis. I think it’s a huge, huge step forward.

    BL: Specifically in the realm of COVID deaths, but also, all deaths during the pandemic.

    DB: Exactly, yes. In the realm of COVID deaths, we do know that there is a large gap between the total amount of excess deaths and the excess deaths that COVID accounts for. So it’s interesting from that angle, understanding what COVID [deaths] might have been misclassified. But the data can also be used for a broad range of other types of deaths that have happened during the pandemic or possibly increased during the pandemic.

    BL: So why are researchers excited about this data release?

    DB: Previously, for something to go up on the WONDER website, or to become WONDER data, it has to be finalized in the year after. So, data from 2020 would just be finalized now. Typically, we might not see that data until, probably, early in the new year [2022].

    But with the new tool, we’re getting that 2020 and 2021 WONDER data now. And the CDC does a great job of providing a lot of granular details about causes of death, and racial demographics… Those are things that general CDC [mortality] data gives you, but the WONDER data is even more detailed. So, the fact that researchers don’t have to wait anymore for that data to be finalized, that the CDC is providing provisional data at such a detailed level—that’s what researchers are excited about.

    BL: It’s the provisional data that’s being released, like, a year earlier than you would normally expect it to be published, right?

    DB: Yeah, a year earlier than you would expect it to be published. Which means it’s almost real-time, because it has, I think, a three- or four-week lag. This data is real-time public health surveillance at a highly granular level—which is what people have been asking for. It’s what epidemiologists have been asking for, researchers, advocates of all kinds, journalists, lots of people have been saying, “We need this type of surveillance.”

    BL: When you say a three- or four-week lag—the CDC is going to update it every couple of weeks, right?

    DB: Yes, that’s correct.

    BL: Do you have a sense of what the update schedule is going to be, or is the CDC not sure yet?

    DB: I’m not sure. I know it was a big haul for them to just get this out, I’m not sure what the next update will be…

    BL: Yeah, well, I’m sure we [Documenting COVID-19] will keep an eye on it. And we’ll tell everybody when it updates. (Editor’s note: As of December 12, it has already been updated! Data now go through November 20, 2021.) So, what are some of the things that you’ve seen in the data from the preliminary analysis that you’ve done so far?

    DB: One of the specific things that I’ve seen, that’s been really important for the work that I’m doing right now, is increases of different types of deaths at home. When people die, they don’t always die in a hospital—they could die in an outpatient clinic, or in an ER, or they could come to the hospital dead on arrival, they could die in hospice, or a nursing home, or at home.

    And one of the awesome things about the CDC data is that you can see, actually, where people have died, and what specific causes of death that those people had when they died. Or, to be precise, you can’t see specific people—but you can see, say, 50 people died of heart attacks in a specific county at home. You would be able to see [in the data] that those people not only died of a heart attack, but they died at home. 

    The takeaway for me has been that respiratory and cardiovascular deaths have increased at home in specific states and counties. Louisiana is one example: it looks like Louisiana has the highest increase of deaths at home from [the CDC designation] “other forms of heart disease,” of any state, at like a 60% increase from previous years. So then we have to ask ourselves, what could lead to that increase? Are people really dying more of heart disease at home, by that much higher of a rate? Or is something else going on here?

    BL: If you were talking to local reporters about this, what would you recommend that they do with the data?

    DB: I would recommend that they take a look at the most recent data, the data from 2020 and 2021, for their area. And also pull some previous years, probably five years [of data], and start looking at causes of death, ages of the people who died, racial and demographic makeup, and place of death. I think different combinations of those data will start to provide some interesting avenues that can lead you to do actual human reporting—asking, what was happening? And why was that happening at this scale?

    The new WONDER data, you can kind-of stretch it and bend it in so many different ways, it can be a little bit intimidating at first. So maybe, it would also be useful to start with a more specific question. If you’re wondering about, let’s say, certain types of deaths in a very specific county. Say you’re wondering if that’s from unintentional drug overdoses, or deaths from respiratory diseases in your county. Then you can start looking at the more granular level of details within those types of deaths—whether it’s racial and demographic makeup, or whether or not the body was autopsied. You can even see the day of the week [that people died]. There’s a lot of different places you can zoom in.

    My overall advice would be: Start with a general question and then explore, then reform that question and explore, then reform that question. The data is both so extensive and so granular that you can get lost in it very quickly.

    BL: You mentioned that it’s very intimidating, which I would second. The first time I looked at the WONDER data, I was like, “What is going on here?” So, what would be your recommendations for working with that data tool? Or any major caveats that you think people should know before they dive into this?

    DB: That’s a great question, because with WONDER, you have to use their querying tool through their website. You can’t really easily and quickly export things or work with an API, though you can export data once you do a query.

    My first caveat would be, keep in mind the suppression of any values under 10. So, that means you can zoom in on certain things, but then you may also have to zoom out. For example, if you wanted to know the leading causes of death for someone, when a body is dead on arrival—if you do that search at a state level, you’ll probably be able to see the first five or so causes before you reach causes that have only happened between one and 10 times, and then that value is suppressed and you can’t see the information. But if you were to do the same search on a national level, you would have a lot more causes for those types of deaths.

    So, I would keep in mind the suppression, when zooming in and out. And also keep in mind, if, say, you’re looking at “dead on arrival” deaths for every county in a specific state, so many causes of death for those [county-level searches] will be suppressed, that your totals from the counties would not match the actual totals [at the state level]. Because you may not be aware that the CDC is not showing you the values that were suppressed if you didn’t click a specific button—or if you’re quickly adding things.
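
    Editor’s note: To make the suppression caveat concrete, here is a minimal Python sketch of how county-level totals can fall short of the state total. The county counts are hypothetical, and the rule coded here (hide any value under 10) simply follows the description above; it is not taken from the CDC’s own documentation, and how zeros are handled is not covered.

```python
SUPPRESSION_THRESHOLD = 10  # per the interview: values under 10 are suppressed


def suppress(count):
    """Return None for counts that would be hidden, mimicking suppression."""
    return count if count >= SUPPRESSION_THRESHOLD else None


def visible_total(counts):
    """Sum only the values that survive suppression."""
    return sum(c for c in map(suppress, counts) if c is not None)


# Hypothetical county-level counts for one cause of death in one state.
county_counts = [42, 7, 3, 15, 9]

true_state_total = sum(county_counts)                 # 76
summed_from_counties = visible_total(county_counts)   # 57
hidden_cells = sum(1 for c in county_counts if suppress(c) is None)  # 3
```

    In this made-up example, adding up the visible county values gives 57 deaths, while the actual state total is 76. Counting the hidden cells before summing is one way to notice that a total is incomplete.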

    BL: Another thing that [our team ran into] is occurrence versus residence—that’s something people need to know about. “Residence” means sorting by where people lived, “occurrence” means sorting by where they died. Those don’t always match up.

    DB: Yes, I would say residence versus occurrence is very important to keep in mind, especially because, when you’re redoing a search and scrolling very fast, you can accidentally fill out a state for occurrence instead of residence. Which actually did happen to me, and then I was confused by my own numbers. Then I noticed that there were a bunch of states coming up that I hadn’t meant to search for, because I, like, filtered by residence and then searched by occurrence.

    So yeah, keeping in mind the difference between residence and occurrence is definitely important. Though if you go back in the historical data [before 2018], it’s just residence—just a single state for each death.

    Also, just clear some extra time to get used to working with the WONDER interface. Because, unlike the CDC data updates that are just on the data.cdc.gov website, that you can just quickly download and open up in your technical tool of choice—for WONDER, you do have to use the WONDER query site, and it can be difficult to get used to searching and importing.

    BL: I will say one more thing, while we’re on this topic, that I’ve been doing and that might be helpful for other people: make sure that, if you export data from WONDER, that you always save that notes section it gives you at the bottom [of the exported file]. Because that will tell you exactly what you searched for. So, if you want to replicate something later, you can just go back and look at the notes. I feel like my instinct, often, when I’m looking at a dataset, is to delete all the notes and anything I don’t need—so I have to remind myself, like, “No, you should keep this.”

    DB: That’s actually a really good tip, because I do that… I import the data [to my computer] and then I delete all the notes. That’s a great point.

    BL: Also, what recommendations do you have if people are looking for, like, experts to interview about these data? Say a local reporter wants to search for experts in their area, what should they do?

    DB: I can speak about that, because that’s been really useful for me in my reporting. Once you have this data, or once you’ve researched excess deaths in your area, you should talk with an epidemiologist or a social epidemiologist—someone who would know your state, or maybe even your more local area—about the broader mortality trends in your community. That will really give you a deep understanding of, what were the reasons that people were dying before the pandemic? And what has this expert thought about during the pandemic? And what have they heard, or read, or researched about why deaths are increasing? For example, I talked to two epidemiologists in Mississippi while working on our investigation, and they really helped me understand what I was looking at and looking for.

    BL: Awesome. And then, my last, kind-of big picture question is, why does this matter for people who aren’t epidemiologists or COVID reporters?

    DB: That is also a good question. I think the thing that I have been thinking about over and over again—and it’s something that an epidemiologist told me—is that, if we understand how people die, then we might know what’s making them sick. And if we know what’s making them sick, then we have a shot at stopping that from happening.

    This data is a very important step in that process, which is learning, in real-time, why people are dying. If we know that, we know what’s making them sick, whether it’s unintentional drug overdoses, or an increase of deaths because of lung cancer or heart disease. Any of those things are important to know, especially in a public health crisis like the one we’re in right now.

    BL: I know we’ve talked before about this sort-of cycle of, what happens when COVID deaths are maybe undercounted in a certain community, and then that contributes to people maybe being less aware of COVID in their community. And then [that lack of awareness] contributes back to the same process.

    DB: Yeah, exactly. I think that’s an important thing as well. Throughout this process—reporting on this topic, and working with this data, and thinking more about death certificates and the information on them—I’ve been increasingly… Not evangelized, exactly, but I’ve seen the light on the importance of that final piece of information of people’s lives. And what it means not only to their families and to the local area and communities, but also what it means when we start pulling that data up to larger and larger groups, and trying to understand: what does this person’s death mean at the level of the county, or the state, or in their racial demographic, or in their age demographic, or by gender?

    All of this is critically important. And it sounds kind-of corny, but in a way, [the death certificate] is like, one really last piece of information that you leave behind for humans after you.


  • Public health data in the US is “incredibly fragmented”: Zoe McLaren on booster shots and more

    This week, I had a new story published at the data journalism site FiveThirtyEight. The story explores the U.S.’s failure to comprehensively track breakthrough cases, and how that failure has led officials to look towards data from other countries with better tracking systems (e.g., Israel and the U.K.) as they make decisions about booster shots.

    In the piece, I argue that a lack of data on which Americans are most at risk of breakthrough cases—and therefore most in need of booster shots—has contributed to the confusion surrounding these additional doses. Frequent COVID-19 Data Dispatch readers might recognize that argument from this CDD post, published at the end of September.

    Of course, an article for FiveThirtyEight is able to go further than a blog post. For this article, I expanded upon my own understanding of the U.S.’s public health data disadvantages by talking to experts from different parts of the COVID-19 data ecosystem.

    At the CDD today, I’d like to share one of those interviews. I spoke to Zoe McLaren, a health economist at the University of Maryland Baltimore County, about how the U.S. public health data system compares to other countries, as well as how data (or the lack of data) contribute to health policies. If you have been confused about your booster shot eligibility, I highly recommend giving the whole interview a read. The interview has been lightly edited and condensed for clarity.


    Betsy Ladyzhets: I’m writing about this question of vaccine effectiveness data and breakthrough case data in the U.S., and how our data systems and sort-of by extension public health systems compare to other countries. So, I wanted to start by asking you, what is your view of the state of this data topic in the U.S.? Do you think we can answer key questions? Or what information might we be missing?

    Zoe McLaren: It’s the age-old problem of data sources. A lot of cases are not going to be reported at all. And then even the ones that are reported may not be connected to demographic data, for example, or even whether the people are vaccinated or not. Whereas in other countries like Israel and the U.K., your positive COVID test goes into your electronic health record that also has all the other information.

    And Medicare patients, they have that whole [records] system. There will be information [in the system] about whether they got vaccinated, as well as whether they have a positive test. So that data will be in there. But for other people, it may or may not be in an electronic health record. And then of course, there’s multiple different electronic health record systems that can’t be integrated easily. So you don’t get the full picture.

    But it’s all about sample selection. Not everyone [who actually has COVID] is ending up in the data, which messes up both your numerator and denominator when you’re looking at rates.

    BL: Could you say more about how our system in the U.S. is different from places like Israel and the U.K., where they have that kind of national health record system?

    ZM: When the government is providing health insurance, then all of your records and the [medical] payments that happen, there’s a record of them… And then, because it’s a national system, it’s already harmonized, and everyone’s in the same system. So it’s really easy to pull a dataset out of that and analyze it.

    Whereas in the US, everything is incredibly fragmented. The data, and the systems and everything is very fragmented. The electronic health systems don’t merge together easily at all. And so you get a very fragmented view of what’s going on in the country.

    BL: Right, that makes sense. Yesterday, I was talking to a researcher at the New York State Health Department who did a study where they matched up the New York State vaccination records with testing records and hospitalization records, and were able to do an analysis of vaccine effectiveness. And he said, basically, the more specific you try to go with an analysis, the harder it is to match up the records correctly, and that kind of thing.

    ZM: Exactly. It’s easy to match on things like age, sex, race, since everybody has them. But then, the different data fields are gonna have different formats and be much harder to merge together.

    BL: So what can we do to improve this? I know Medicare for All is one option— 

    ZM: Medicare for All, end of story, end of article. It would solve so many problems.

    It’s tricky, though, because there isn’t a simple fix. All of these health systems have their own electronic health records, and integrating them is really costly and hard to do, and who is going to pay for that? There’s also additional privacy concerns about integrating things, in terms of protecting privacy and confidentiality. So, that’s really tricky.

    The way that we get around that, in general, is to have reporting requirements. Like with COVID tests, [providers are] required to report to the CDC or the HHS… Still, that’s also costly and time consuming. But that is kind-of the best thing that we can do right now, is have the different [public health] entities produce reports on a regular basis and send that to a centralized location. And the reports are supposed to be produced in a way that they are harmonized, they’re easy to put together from all the different systems.

    The problem with the different systems not integrating is, it requires everyone to basically fill out the equivalent of a form and send it in—listing individual patient information, or at the state level, individual county information. An example of that is the COVID data. All of the COVID data gets reported up to the national level [by state and county health departments]… 

    But the reporting often gives you the numerators, when you need to figure out the denominators. Because you would want to know, for example, we want to know what proportion of breakthrough cases end up hospitalized. But if only the hospitalized people end up in the data, and a lot of breakthrough cases go either undetected or never tested, or they do an at-home test and there’s no record of that positive case in the system, then your denominator is—there’s a problem with your denominator. That’s a problem with sample selection, you get people that are self-selecting into the numerator [by testing positive], but also self-selecting into the denominator [by getting a test to begin with].
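
    Editor’s note: The denominator problem can be shown with a small Python sketch. All of the numbers below are hypothetical and chosen only to illustrate the effect; they are not estimates of actual breakthrough case or hospitalization counts.

```python
from fractions import Fraction


def hospitalization_rate(hospitalized, cases):
    """Share of breakthrough cases that end up hospitalized."""
    return Fraction(hospitalized, cases)


# Hypothetical numbers, for illustration only.
hospitalized = 50
true_breakthrough_cases = 5_000      # includes undetected and at-home positives
recorded_breakthrough_cases = 1_000  # only cases that reached the data system

true_rate = hospitalization_rate(hospitalized, true_breakthrough_cases)          # 1/100
apparent_rate = hospitalization_rate(hospitalized, recorded_breakthrough_cases)  # 1/20
```

    In this made-up scenario, the same 50 hospitalizations look like a 5% rate when only recorded cases are counted, compared to 1% when every breakthrough case is in the denominator.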

    BL: Yeah, that makes sense. I know you said it would be pretty complicated to basically force different public health departments—to standardize them so that they’re all reporting in the same way. Is there more that researchers in the US could be doing in the short-term to either improve data collection or use what we have to answer questions like, what occupations might confer higher risk of a breakthrough case? 

    ZM: This is a coordination problem. Because in general, we all have an incentive to contribute to having a better understanding of breakthrough cases. But the trick is that, unless the national government or the CDC takes the role of saying what the [data] format’s gonna look like…

    Part of the problem is that there’s an effort involved [in collecting these data] and people don’t want to put in the effort. But if they do want to put in the effort, then you still have a coordination problem, because who’s gonna be deciding what format we’re using?

    BL: Or like, what the data definitions are.

    ZM: Exactly. Like, do you report the month and the day of the vaccination dose, or just the month of the dose? Things like that where it doesn’t seem like a big deal, but it does matter for research purposes. If you look, for example, at the Census, or any of the national surveys, like the Current Population Survey or the National Labor Force Survey where we get unemployment numbers, there are big committees that figure out which questions we’re asking and how we ask them. So, if the CDC just says, like, “This is the dataset we’re building,” then everyone [local agencies] will be like, “Okay, we’re gonna send our reports in that way.” 
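
    As a rough illustration of why a detail like date granularity matters, the Python sketch below combines dose dates from two hypothetical agencies, one reporting the full date and one reporting only the month. The agency data and the helper name `to_month` are assumptions made up for this example. Without a shared standard, the combined dataset can only be as precise as its coarsest source.

```python
from collections import Counter
from datetime import datetime


def to_month(value: str) -> str:
    """Reduce a dose date to YYYY-MM, accepting YYYY-MM-DD or YYYY-MM."""
    for fmt in ("%Y-%m-%d", "%Y-%m"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


# Hypothetical reports: agency A includes the day, agency B only the month.
agency_a = ["2021-10-14", "2021-11-02"]
agency_b = ["2021-10", "2021-10"]

# Harmonize everything down to the month, the only level both sources share.
doses_by_month = Counter(to_month(d) for d in agency_a + agency_b)
```

    Here the day-level detail from agency A is thrown away, which is exactly the kind of loss that an agreed-upon format would avoid.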

    Part of [the challenge] is that it takes effort to produce the data, and part of it is somebody needs to coordinate. And usually that would be something the CDC would do, saying, “This is the data that needs to be reported to us,” and everybody reports to them. But they could be doing more, they could be asking for more detailed information—for example, data based on vaccination status, because that information will be important for understanding the progression of the pandemic.

    BL: Yeah. I volunteered for the COVID Tracking Project for a while, and one of the most tedious things that we had to do there was figuring out different definitions for like, what states were considering a case or a test, or whatever else. So that definitely makes sense to me.

    ZM: Exactly. And the COVID Tracking Project filled a gap. Nobody was doing that [collecting data from the states], so the COVID Tracking Project did that… But it’s tricky, because a lot of the stuff that seems like splitting hairs [on definitions] really does make a difference when you’re doing your analysis.

    BL: I also wanted to ask you about what the implications are of this lack of standardized data in the U.S., and the lack of information that we have—largely around vaccinations, but I think there are other areas as well where we’re missing information. So I’m trying to figure out, for this story, how data gaps might contribute to the confusion that people feel when they watch health agencies make decisions. Like watching all the back and forth on booster shots, or thinking about Long COVID, other things like that.

    ZM: Well, we talk about evidence-based medicine, and we also care about evidence-based policy. And so it means that when the quality of data is poor, the quality of our policy is going to be worse. So it really is in everybody’s best interest to have high-quality data, because that is the bedrock of producing high quality policy.

    BL: Right. So if we don’t know, for example, if people who live and work in certain situations are more likely to have a breakthrough case, then we can’t necessarily tell them—we can’t necessarily say, “These specific occupations should go get booster shots.” And then we just say, “Everyone can go get a booster shot.”

    ZM: It means that we’re flying blind. And the problem of flying blind is twofold. One is that you can end up making poor decisions, the wrong decisions, because you don’t have the data. And then the other problem is that you end up making decisions that, in economics, we call it “inefficient.” I think about [these decisions] as, you end up with “one size fits all.” 

    If we have really high quality data, then we’re able to create different policies for different types of people, and that helps minimize any of the downsides. But the less data we have, the more we have to rely on “one size fits all.” And of course, if “one size fits all,” it’s going to be too much for some people and too little for others. Data would help improve that.

    BL: How do you think that this kind of “one size fits all” contributes to how individual people might be confused or might not be sure how to kind of interpret the policies for their own situations?

    ZM: I think in a “one size fits all,” people get very frustrated because they see in their own lives, both the uncertainty and how that can be stressful—and also the waste. The situations where they fall under one policy, but they have enough information to know that that policy doesn’t necessarily apply to them. It does undermine confidence in policymaking. People get frustrated with “one size fits all,” because it seems wasteful.

    Though sometimes the “one size fits all” is still optimal, it’s better than the alternative. For example, the recommendation of “one size fits all” wearing masks tends to trump the “one size fits all” of not wearing masks. But there’s waste. There are situations where we end up wearing masks where they wouldn’t necessarily be needed. And vice versa.

    BL: Yeah. That makes me think of friends I have who are eligible to get booster shots because of medical conditions, but they’re sort of thinking, “I wish the shots could go to another country where they need vaccinations more.” And that’s not something individuals have any control over, but it’s frustrating.

    ZM: Part of it, with the booster shots, is the guidelines that say people who have higher occupational exposure to risk [are eligible] without specifying exactly who that is. That is one way that we allow some leeway. So it’s not a “one size fits all” where nobody gets it, because there’s actually people who qualify under higher occupational exposure. But we also don’t want to have a “one size fits all” where we tell everyone they need it, because we do want to be sending doses abroad as well.

    So that’s a situation where we know that a “one size fits all” is not perfect. And so we create a, like, “use your judgement, talk to your doctor” kind of thing that tries to help people self-select into the right groups… There are likely a lot of people who do have higher exposure and should be getting it, but don’t think the benefit applies to them.

    Editor’s note: According to one analysis, about 89% of U.S. adults will qualify for a booster shot after enough time has passed from their primary vaccine series. And, according to the October COVID-19 Vaccine Monitor report, four in ten vaccinated adults were unsure whether they qualified.

    BL: I also wanted to ask, you mentioned rapid tests—those don’t necessarily get reported. Are there other things that you think pose data gaps in the U.S. public health system?

    ZM: With rapid tests, the actual tests are not getting reported. But the important thing is, people are getting tested. I mean, the reason we want good data quality is to reduce cases, and we wouldn’t want to limit access to rapid tests in order to collect data, because it’s much easier to prevent the cases by allowing people to get tested in their homes.

    But yeah, just the fact that there’s no centralized database for analysis [is a gap]. I mean, if you look at the U.K., and Israel, they have these great studies, because they’re able to just download, like, the entire population into a dataset. And it has all the information they need, like demographic factors. The fact that the U.S. has made so much of its national policy based on Israeli data, this shows how far behind we are with having our own data to answer these questions.

    BL: Yeah. I know, it’s something like half or a third of cases in the U.S., the CDC doesn’t have race and ethnicity information for [editor’s note: it’s 35%], and other stuff like that. It’s wild.

    ZM: Yeah… And one of the things about reporting is that every additional piece of data you want is very costly. And so you have to be very judicious about [collecting new values].

    BL: Well, those were all my questions. Is there anything I didn’t ask you that you think would be important for me to know for this story?

    ZM: Just that data is helpful for planning now, and helpful for the future. If we can improve our data systems now—it’s part of being prepared for the next pandemic.

  • Booster shot data slowly makes it onto state dashboards, but demographic information is lacking

    Booster shot data slowly makes it onto state dashboards, but demographic information is lacking

    Ohio is one of just eight states reporting demographic data for booster shots administered in the state. Screenshot taken on November 7.

    It’s now been over a month since the FDA and the CDC authorized third doses of Pfizer’s COVID-19 vaccine for a large swath of the U.S. population, and a couple of weeks since the agencies did the same thing for additional doses of Moderna and Johnson & Johnson’s vaccines. In that time, over 20 million Americans have received their boosters.

    This weekend, I set out to see what data are now available on these booster shots. I updated my resource page on vaccination data in the U.S., which includes detailed annotations on every state’s vaccine reporting along with several national and international sources.

    The majority of states (and national dashboards) are now including booster shots in their vaccine reporting, I found. But in most cases, the reporting stops at just one statistic: the total number of residents who have received an additional dose. A few states are reporting time series information—i.e. booster shots administered by day—and a few are reporting demographics—i.e. booster shot recipients by age, gender, race, and ethnicity—but these metrics are lacking across most dashboards.

    Demographic information, particularly race and ethnicity, should be a priority for booster shot data, as it should be for numerous other COVID-19 metrics. At the beginning of the U.S.’s vaccine rollout, Black and Hispanic/Latino Americans lagged behind white Americans in getting their shots, but limited data hindered the public health system’s ability to respond to this trend. (Now, the trends have evened out somewhat, though Black vaccination rates still lag white rates in some states.)

    Will we see the same pattern with booster shots? Considering the immense confusion that has surrounded America’s booster shot rollout in the last couple of months, it would not be surprising if disadvantaged communities are less likely to know about their potential need for a booster, or where and how to get those shots.

    But so far, we don’t have enough data to tell us whether this pattern is playing out. The CDC has yet to report booster shot data by race or ethnicity, though the agency is now reporting some figures by age and by state. Note: the CDC still has yet to report detailed vaccination data by race and ethnicity, period; the agency just reports national figures, nothing by state or other smaller geographies.

    At the state level, just eight states are reporting booster shots by race and ethnicity. Thirteen states are reporting some kind of time series (boosters administered by day or week), and three are reporting doses administered by vaccine manufacturer.

    Here are all the states that I found reporting booster shot data, with links to their dashboards:

    • Arkansas: Total boosters only.
    • California: Total boosters only.
    • Colorado: Reporting demographics (age, race/ethnicity, and sex).
    • DC: Total boosters for DC and non-DC residents.
    • Delaware: Reporting demographics (age, race/ethnicity, and sex).
    • Florida: Total boosters only.
    • Indiana: Total boosters and doses administered by day.
    • Kansas: Total boosters and doses administered by day.
    • Louisiana: Total boosters only.
    • Massachusetts: Total boosters and doses administered by day.
    • Maryland: Reporting demographics (age, race/ethnicity, and sex).
    • Michigan: Reporting demographics (age, race/ethnicity, and sex) as well as doses administered by week and by manufacturer.
    • Minnesota: Total boosters only.
    • Missouri: Total boosters and doses administered by day.
    • Mississippi: Reporting demographics (age and race/ethnicity) as well as doses administered by facility type (total and for the prior week).
    • North Dakota: Total boosters and doses administered by day.
    • New Jersey: Reporting demographics (age, race/ethnicity, and sex) as well as doses administered by day and by manufacturer.
    • New Mexico: Total boosters only.
    • Ohio: Reporting demographics (age, race/ethnicity, and sex) as well as doses administered by day and by county.
    • Oklahoma: Total boosters only.
    • Oregon: Total boosters, doses administered by day and by county.
    • Pennsylvania: Total boosters and doses administered by day.
    • Rhode Island: Boosters administered by day only.
    • South Carolina: Boosters administered by day only.
    • South Dakota: Total boosters, doses administered by week and by county.
    • Texas: Total boosters only.
    • Virginia: Reporting demographics (age, race/ethnicity, and sex).
    • Vermont: Total boosters only.
    • Wyoming: Total boosters and doses administered by manufacturer.

    Local reporters: If your state is reporting demographic data, I recommend taking a look at those numbers. How does the population receiving booster shots compare to the overall population of your state, or to the population that’s received one or two doses? And if your state is not reporting demographic data (or any booster data at all), ask your public health department for these numbers!
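
    For reporters who want to run that comparison, here is a minimal Python sketch. The group names and counts are hypothetical placeholders, and the helper names `shares` and `representation_ratio` are made up for this example. Real dashboard data will also need a decision about how to handle recipients with unknown race or ethnicity.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts per group into each group's share of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {group: n / total for group, n in counts.items()}


def representation_ratio(
    booster_counts: Dict[str, int], population_counts: Dict[str, int]
) -> Dict[str, float]:
    """Booster share divided by population share for each group.

    A ratio below 1 means the group makes up a smaller share of booster
    recipients than of the overall population.
    """
    booster = shares(booster_counts)
    population = shares(population_counts)
    return {g: booster[g] / population[g] for g in population if g in booster}


# Hypothetical counts, for illustration only.
population = {"Group A": 6000, "Group B": 3000, "Group C": 1000}
boosters = {"Group A": 700, "Group B": 250, "Group C": 50}

ratios = representation_ratio(boosters, population)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

    The same function works for comparing booster recipients to the population that has received one or two doses: pass those counts as the second argument instead of the state population.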

    You can see my vaccine annotations page for more information on all of these state dashboards. And if there are any states or metrics I missed, please let me know! Comment here or email me at betsy@coviddatadispatch.com.

  • Featured sources, October 24

    • More booster shot data from the CDC: The CDC has added more data on additional vaccine doses to its COVID-19 dashboard. Specifically, we can now analyze booster shots by state: raw numbers, share of the fully vaccinated population with a booster, and limited age data (18+, 50+, 65+). If anyone from the CDC is reading this: I would love to see some race/ethnicity data next!
    • Racial and ethnic disparities in COVID-19 hospitalization: A new CDC study published this week in JAMA Network Open presents analysis of data from COVID-NET, the national agency’s surveillance system for COVID-19 hospitalizations. The study, like other research on this topic, found that non-white Americans were far more likely to be hospitalized with COVID-19 or die from the disease in the first year of the pandemic than their white neighbors. Supplemental tables for the study include breakdowns of COVID-19 hospitalizations by different demographic groups, by underlying medical conditions, and over time.
    • The COVID States Project: In this polling project, researchers surveyed people in all 50 U.S. states to ask whether they approve of the president and of their governors. The survey is jointly run by researchers at Harvard, Northeastern, Northwestern, and Rutgers Universities. This latest report, released in October, includes executive approval data stratified by political party and vaccination status.
    • COVID-19, compared to other leading causes of death: COVID-19 was the number two cause of death in the U.S. in September 2021—after heart disease—according to this report from the Peterson Center on Healthcare and the Kaiser Family Foundation. The report compares COVID-19 to other top causes of death in the country, including data over time and by age group.