Author: Betsy Ladyzhets

  • National numbers, April 10

    Coronavirus levels in wastewater are now rising in all regions of the country, according to Biobot. Screenshot taken on April 9.

    In the past week (April 2 through 8), the U.S. reported about 190,000 new COVID-19 cases, according to the CDC. This amounts to:

    • An average of 27,000 new cases each day
    • 57 total new cases for every 100,000 Americans
    • 5% more new cases than last week (March 26-April 1)

    In the past week, the U.S. also reported about 10,000 new COVID-19 patients admitted to hospitals. This amounts to:

    • An average of 1,400 new admissions each day
    • 3.0 total admissions for every 100,000 Americans
    • 10% fewer new admissions than last week

    Additionally, the U.S. reported:

    • 3,500 new COVID-19 deaths (1.1 for every 100,000 people)
    • 100% of new cases are Omicron-caused; 72% BA.2-caused (as of April 2)
    • An average of 100,000 vaccinations per day (per Bloomberg)
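
    The derived figures above (daily average, rate per 100,000, week-over-week change) are simple arithmetic on the weekly totals. Here is a minimal sketch of that arithmetic, not the CDC's actual method. The function name and the population figure of about 332 million are assumptions for illustration, and the inputs are the rounded totals from this post and the previous week's post.

```python
# Illustrative only: rounded weekly totals from the posts, assumed population.
US_POPULATION = 332_000_000  # assumption: approximate U.S. resident population


def weekly_summary(weekly_total, previous_weekly_total, population=US_POPULATION):
    """Turn a weekly total into the three derived figures used in the update."""
    daily_average = weekly_total / 7
    per_100k = weekly_total / population * 100_000
    # Positive means an increase over the previous week, negative a decrease.
    pct_change = (weekly_total - previous_weekly_total) / previous_weekly_total * 100
    return {
        "daily_average": daily_average,
        "per_100k": per_100k,
        "pct_change": pct_change,
    }


# About 190,000 cases this week versus about 180,000 the week before.
cases = weekly_summary(190_000, 180_000)
print(f"{cases['daily_average']:,.0f} cases per day")
print(f"{cases['per_100k']:.0f} cases per 100,000 people")
print(f"{cases['pct_change']:+.1f}% versus the previous week")
```

    With these rounded inputs, the change comes out slightly above the 5% reported, most likely because the published weekly totals are themselves rounded.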

    After several weeks in a plateau, new COVID-19 cases in the U.S. are once again going up at the national level. The CDC reported an average of 27,000 new cases a day last week—less than one-tenth of what we saw during the Omicron surge, but still a notable uptick from the week prior.

    National numbers of newly hospitalized patients and COVID-19 deaths are both still trending down; this is unsurprising, as trends in hospitalizations and deaths typically follow cases by several weeks.

    Wastewater, a leading indicator, is showing pronounced increases both nationally and in all four major regions of the country, according to Biobot’s tracker. Similarly, more than half of the wastewater monitoring sites in the CDC’s network have shown increases in coronavirus levels over the last two weeks.

    That wastewater signal likely means that cases will keep going up in the next couple of weeks. BA.2 is a clear culprit for this: the more contagious Omicron sublineage is now causing about three in four new COVID-19 cases in the U.S., according to the CDC’s latest estimates. BA.2’s dominance led the FDA to pull its emergency use authorization for sotrovimab, a monoclonal antibody drug that works against Omicron BA.1 but not against BA.2.

    As we’ve seen for the last couple of weeks, the Northeast continues to be a leader in case increases. Jurisdictions with the highest cases per capita in the week ending April 6 are Alaska, Vermont, Rhode Island, Washington, D.C., New York, Massachusetts, New Jersey, and Maine. All reported more than 100 new cases for every 100,000 residents, per the latest Community Profile report.

    Under the CDC’s old community level guidance, all of these Northeast states (and Alaska) would be classified as seeing high transmission. But under the new, more lenient guidance, 99% of the country, including most counties in these states, is classified as having “low” or “medium” community levels.

    These lenient levels don’t account for warnings in our wastewater, not to mention under-testing as PCR sites close and at-home tests go unreported. As Katherine Wu wrote in The Atlantic this week, the U.S. may be facing a new surge, but it’s harder to accurately track COVID-19 now than it has been since spring 2020. Don’t let the low numbers fool you into thinking all is well.

  • COVID source callout: Florida, again

    Last summer, Florida was one of the first states to decommission its daily COVID-19 dashboard and replace it with far-less-detailed weekly reports. Many other states have followed Florida’s lead in the last few months, making their reporting less frequent and cutting down on some metrics like cases and testing.

    But that’s not enough for Florida! The state recently switched from weekly COVID-19 reports to reports every other week—making it even more difficult for reporters, researchers, and others in the state to follow their local COVID-19 trends. Florida additionally stopped reporting cases in non-state residents, which is pretty notable for one of the country’s biggest tourism hotspots.

    Of course, Florida is still reporting some COVID-19 data daily to the federal government, as all states are required to do. But this doesn’t bode well for the future of state data reporting.

  • Sources and updates, April 3

    • Feds unveil new COVID.gov website: This week, the federal government launched a new website, COVID.gov, intended to be a one-stop-shop for Americans to find COVID-19 guidance and connect to resources in their communities. It’s a fun kind of irony that this is launching over two years into the pandemic, at a time when the U.S. is about to lose funding for free vaccines, tests, and other health measures. One wonders how many people will actually use this website!
    • FDA and CDC authorize additional booster shots for seniors: This past Tuesday, the FDA authorized a fourth dose for Americans over age 50 who received their booster of Pfizer or Moderna’s vaccine at least four months ago. The CDC incorporated this additional dose into its recommendations later that day; fourth doses are also recommended for immunocompromised people, and an additional mRNA vaccine dose is recommended for people who originally received two doses of the Johnson & Johnson vaccine. Notably, the FDA and CDC decisions come before an FDA advisory committee meeting, scheduled for this coming Wednesday, about booster shots. Not a great look for either agency’s transparency.
    • New data on Johnson & Johnson vaccine effectiveness: When the CDC recommended that anyone who received two J&J doses should get a third dose of Pfizer or Moderna’s vaccine, the agency cited this study published last week in MMWR. CDC researchers and their collaborators found that, during the Omicron surge, vaccine effectiveness against a COVID-related hospitalization or emergency department visit was much higher for J&J recipients who got a booster dose of an mRNA vaccine (90% for hospitalization, 79% for ED visit) compared to those who received two J&J doses (67% and 54%).
    • Racial disparities in COVID-19 patients with cancer: Another new study, published this week in JAMA Network Open, found that Black COVID-19 patients with cancer are more likely to experience severe outcomes than white patients—even after the scientists adjusted for other demographic and clinical factors. Black cancer patients already have higher mortality rates than white patients, the scientists explain in their paper; COVID-19 worsened this existing inequality.
    • NYC mask compliance: I recently learned that the New York City Metropolitan Transportation Authority (MTA) regularly publishes data demonstrating how well passengers on MTA subways and buses are complying with the city’s mask requirement for public transportation. The data are compiled from surveys; MTA workers observe passengers at a selection of subway and bus stops, and count how many people are wearing masks (categorized by whether the masks are worn correctly or not). Compliance recently slipped to a new low, AMNY reports.
    • Database of WHO disease outbreak reports: A group of researchers led by Colin J. Carlson has compiled a database of over 2,700 outbreak reports from the World Health Organization, which include information on significant public health events (or “potential events of concern”) going back to December 1996. You can read a preprint with analysis of the database here. (H/t Data Is Plural.)

  • Send me your COVID-19 questions!

    It’s been a while since I did a formal request for reader questions. (And, gotta be honest, I am a little low on content for this week after spending the past few days at SEJ.)

    So, here is a formal request: let me know what you’re wondering about COVID-19 in the U.S. We’re in a confusing period right now, as BA.2 prevalence increases and safety measures are dropped across the country. What do you want to know? I’m most qualified to answer data-specific questions, but I can do my best with other questions as well.

    To send in a question, simply email me at betsy@coviddatadispatch.com or comment on the post below. You can also fill out this Typeform survey that I originally sent out in January, if you missed it at that time or if your perspectives have changed.

  • Fenceline communities left behind by data gaps: A dispatch from SEJ in Houston

    This week, I’m sharing a short dispatch from the Society of Environmental Journalists (SEJ) conference in Houston, Texas. Unlike other journalism conferences I’ve attended, SEJ meetings don’t just sequester you in your hotel all day: the organizers plan field trips that are designed to give reporters on-the-ground information about environmental issues at the place they’re visiting.

    I went on one of these trips, to the Houston Ship Channel and surrounding communities impacted by industrial pollution. For me, this experience was a lesson in the cascading health issues caused by environmental racism—including, of course, COVID-19—as well as the ways that data gaps can make it harder for hard-hit communities to get needed public health assistance.

    The Houston Ship Channel, I learned this week, is a passage for ships going between Houston’s port and the Gulf of Mexico. According to the Port Houston website, the port is the largest container port on the Gulf Coast, handling about two-thirds of all shipping containers that travel through the region. (Shipping containers carry the consumer products that we order online.)

    It is also the single largest U.S. port for petroleum exports. Every month, thousands of tons of oil and plastics (which are made from oil) pass through the Houston Ship Channel; much of this cargo is processed right on the banks of the channel, in massive refineries that define the landscape around Houston.

    With SEJ, I went on a boat tour through the Houston Ship Channel. We passed refineries and industrial plants from Valero, Chevron, ExxonMobil, and other major companies, getting a close look at just how much space these facilities take up and how they decimate the surrounding land.

    After the boat, my group went to Manchester, a neighborhood close to the channel in southeast Houston. Community activists from the local environmental advocacy group TEJAS explained that this neighborhood’s population is overwhelmingly Latino; many residents are low-income workers with no college degrees who speak Spanish as their first language.

    Manchester residents have faced intense pollution from industrial plants that border their homes, schools, and community spaces. We walked through a park that is surrounded on multiple sides by these plants; we could see smoke from chemicals burning, and smell the results of that burning in the air. Valero, which owns one of the nearby plants, had recently sponsored a playground in this park as a small gesture, barely acknowledging the harm it’s caused to this neighborhood.

    Of course, my immediate question was: what are the COVID-19 statistics for this neighborhood? To me, it seemed obvious that Manchester residents living with this intense pollution would face higher rates of respiratory conditions, cancers, and other diseases that would make them more vulnerable to severe COVID-19 symptoms. (Poor quality air has been linked with more severe COVID-19 outcomes since the early days of the pandemic.)

    Here’s the problem: nobody could actually answer my question. I spoke to Leticia Ablaza, government relations director at Air Alliance Houston and another speaker on the tour, who explained that the link between pollution and COVID-19 in Manchester and other similar Houston neighborhoods has yet to be studied. Anecdotally, she said, she knows community members with respiratory conditions who have faced heightened vulnerability to COVID-19. But there’s no formal data.

    The reason for this lack of formal studies became clear to me later, when I attended a conference session on the links between COVID-19 and environmental health. Annie Xu, a Rice University student who has studied health disparities in Texas, said at this session that the state of Texas does not publish any COVID-19 data below the county level.

    Xu’s research group did identify links between Texas counties’ racial demographics and their COVID-19 burden, published in Nature Scientific Reports in January. But when the group looked for links between air pollution and COVID-19, the analysis didn’t lead to significant results.

    This finding is likely because pollution can vary widely within Texas counties, Xu said. For example, there’s a huge gap between air quality in Manchester and on Rice’s campus, both of which are included in Harris County. To truly find a connection between pollution and COVID-19, a research group like hers would require more granular data, such as at the ZIP code or census tract level.

    But the Texas public health department only publishes COVID-19 data at the county level—with the exception of vaccinations, one metric that is available by ZIP code. The federal government doesn’t report COVID-19 data below the county level either.

    Without this granular information, it’s difficult to demonstrate the impacts of petrochemical pollution on COVID-19 in neighborhoods like Manchester. The community isn’t able to get priority status for public health interventions like vaccines or testing—meaning that its vulnerabilities are unlikely to change.

    As longtime readers know, I have spent a lot of time grappling with COVID-19’s demographic disparities. I was a leading volunteer for the COVID Tracking Project’s COVID Racial Data Tracker, and have sought to call attention to the terrible state of this type of COVID-19 data in the U.S. whenever I can. Still, it was a new experience to actually see a community left behind by the data gaps that I cover.

    What kind of investment would be required to truly study how COVID-19 has impacted a place like Manchester, in Houston? And what other environment-related health conditions do we need to be investigating in these areas? I hope that future stories will enable me to answer these questions.

    For now, if you have any questions, comments, or data source recommendations in this area, please reach out!

  • National numbers, April 3

    BA.2 caused more than two-thirds of new COVID-19 cases in the Northeast in the week ending March 26, according to CDC estimates. It’s no coincidence that this region is also seeing cases start to tick up.

    In the past week (March 26 through April 1), the U.S. reported about 180,000 new COVID-19 cases, according to the CDC. This amounts to:

    • An average of 26,000 new cases each day
    • 55 total new cases for every 100,000 Americans
    • 3% fewer new cases than last week (March 19-25)

    In the past week, the U.S. also reported about 11,000 new COVID-19 patients admitted to hospitals. This amounts to:

    • An average of 1,600 new admissions each day
    • 3.3 total admissions for every 100,000 Americans
    • 16% fewer new admissions than last week

    Additionally, the U.S. reported:

    • 4,400 new COVID-19 deaths (1.3 for every 100,000 people)
    • 100% of new cases are Omicron-caused; 55% BA.2-caused (as of March 26)
    • An average of 90,000 vaccinations per day (per Bloomberg)

    Nationwide, COVID-19 cases in the U.S. have reached a plateau. New cases decreased only 3% from the previous week to this week, following an 8% decrease the week before that. New hospitalizations and deaths are also declining slightly, approaching the same plateau pattern.

    Wastewater is showing a similar pattern, too. The overall, national trend of coronavirus levels in wastewater has been in a plateau for a couple of weeks now, according to the Biobot dashboard. Regionally, the Northeast saw a slight uptick followed by an even slighter downturn, and the South may be seeing a slight uptick now.

    BA.2, the Omicron sublineage that is more transmissible than the version of this variant that first reached us in the U.S., is now causing over half of new COVID-19 cases nationwide, according to CDC estimates. Two weeks ago, I wrote that 50% prevalence was a threshold for cases starting to increase in Europe; if the U.S. follows Europe (as we usually do), that means we’ll start seeing case increases here in the next week.

    According to the CDC’s estimates, BA.2 is already causing almost 75% of new cases in the New England and New York/New Jersey regions. It’s unsurprising, then, that several Northeast states have reported case increases in the last week. According to the latest Community Profile Report, states that reported increases above 25% week-over-week include: Arizona, Alabama, Ohio, Delaware, North Carolina, Hawaii, Massachusetts, and New York.

    New York City—an early hotspot for BA.2, as it was for the original Omicron strain in December—reported more than 100 cases for every 100,000 residents last week, according to both city data and the CDC’s figures.

    Under the old CDC thresholds, this would have put the city in a “high transmission” zone, indicating that all residents should mask up in public, indoor spaces. However, the new CDC guidance places New York City in a “low” level, meaning masks are not recommended—a clear example of the lenience in this new guidance.
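
    To make the old threshold concrete, here is a rough sketch of a case-rate classifier. Only the cutoff of 100 cases per 100,000 for “high transmission” comes from this post. The lower cut points (10 and 50) are my recollection of the CDC’s earlier community transmission indicator and should be treated as assumptions, as should the function name. The real indicator also considered test positivity, which this sketch ignores.

```python
def old_transmission_level(cases_per_100k_7d: float) -> str:
    """Classify a 7-day case rate under the CDC's earlier transmission tiers.

    Assumption: cut points of 10, 50, and 100 new cases per 100,000 people
    over seven days. Only the 100 threshold is stated in the post itself.
    """
    if cases_per_100k_7d < 0:
        raise ValueError("case rate cannot be negative")
    if cases_per_100k_7d >= 100:
        return "high"
    if cases_per_100k_7d >= 50:
        return "substantial"
    if cases_per_100k_7d >= 10:
        return "moderate"
    return "low"


# New York City reported more than 100 cases per 100,000 residents last week;
# 101 is a stand-in value, since the post gives no exact figure.
print(old_transmission_level(101))
# The national rate this week was about 55 per 100,000.
print(old_transmission_level(55))
```

    The newer community levels cannot be reproduced from case rates alone, since they also depend on hospital metrics, so no attempt is made here to model them.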

    It’s good news that we’re not seeing a sharp BA.2-driven increase here in the U.S. yet, either within coronavirus levels in wastewater or within the case data. A BA.2 surge here will likely be a small bump rather than a huge wave. Still, the new lenience in safety measures, combined with federal funding running out for free testing, vaccinations, and other COVID-related coverage, is making me pretty nervous.

  • COVID source shout-out: Cyrus Shahpar

    The Twitter account of White House COVID-19 Data Director Dr. Cyrus Shahpar is, as I’ve said in the CDD before, an excellent source of updates on all things federal pandemic data. Shahpar shares daily updates of new vaccinations in the U.S., usually shortly before the CDC’s tracker updates. He also shares updated variant prevalence estimates, changes and additions to the CDC COVID-19 dashboard, and other data news.

    But this past Wednesday, Shahpar’s account took on a new purpose: tech support for the CDC’s dashboard. 

    Shahpar said he would “look into” an error with the dashboard’s formatting, after journalist Alexander Tin flagged the issue to him. It’s unclear whether Shahpar’s efforts directly led to the dashboard getting fixed, but it was indeed back to its normal appearance by the next morning.

  • Sources and updates, March 27

    • New report on pandemic-related workplace violence for public health officials: A new study, published last week in the American Journal of Public Health, shares the results of a survey that included hundreds of public health officials across the U.S. During the study’s time frame (March 2020 to January 2021), the researchers identified about 1,500 instances of harassment against public health officials, and found that over 200 officials left their jobs. And public health has only become more polarized in the year since this survey period ended. See this article in STAT News for more context on the study.
    • Health insurance plans available through the federal insurance marketplace: This one isn’t directly COVID-related, but it seemed like an interesting data source to share: the Centers for Medicare & Medicaid Services (CMS) publishes a series of data files on health insurance plans available through the federal Health Insurance Exchange. The files include health benefits, coverage limits, cost-sharing potential, provider networks, anonymized insurance claims, and much more. (H/t Data Is Plural.)
    • At-home COVID-19 test use exacerbates inequities: This week, the CDC published a new MMWR study discussing rapid at-home test use. The authors used an online survey to estimate at-home test use among about 400,000 U.S. adults between August 2021 and early March 2022. Its findings provide additional evidence for the popularity of these tests during the Omicron surge, as well as for the way that these tests exacerbate health inequities in the U.S.: “at-home test use was lower among persons who self-identified as Black, were aged ≥75 years, had lower incomes, and had a high school level education or less,” the authors reported.
    • Considering another round of mRNA booster shots: Will the U.S. authorize a fourth round of shots for Americans who received the Pfizer and Moderna vaccines? At the moment, signs point to yes: countries like Israel and the U.K., which U.S. regulators watch for their vaccine efficacy data, are providing fourth doses to seniors. And the Biden administration is planning fourth doses for U.S. adults over age 50, the New York Times reported on Friday. Data so far suggest that these additional doses may be useful for older adults, but provide less of an immunity boost in younger age groups; Dr. Katelyn Jetelina’s Your Local Epidemiologist post on the subject provides a helpful overview of the evidence.
    • New data on Moderna vaccine for young children: As we consider additional boosters for seniors, the youngest Americans may soon be eligible for vaccination! Finally! After a lot of back-and-forth on the potential of Pfizer’s vaccine for kids under age five, Moderna released data this week suggesting that the company has found a dosage of its vaccine that significantly reduces the risk of severe COVID-19 symptoms for children between six months and six years old. Effectiveness against any symptomatic coronavirus infection was only about 40% in this trial—but that result is in line with vaccine efficacy for adults during the Omicron wave, when Moderna’s trial was conducted.

  • COVID-19 in schools data: still bad!

    Screenshot of Burbio’s K-12 School Opening Tracker, taken on March 27.

    In addition to the FiveThirtyEight story, I also had an article come out this week in The Grade, Alexander Russo’s column at KappanOnline. This piece takes a deep dive into Burbio, the company that has become a leading source for data on how COVID-19 impacted K-12 schools across the U.S., in the absence of comprehensive data on this topic from the federal government.

    Burbio is pretty popular among education journalists, I learned in writing this story. Dennis Roche, one of the company’s founders, writes a weekly newsletter providing updates on COVID-19 in schools, and often makes himself available to answer reporters’ questions. Burbio has also become a major data source for the CDC, to the point that the agency provided Burbio with a $600,000 grant for its tracking efforts in the 2021-22 school year.

    However, in the story, I discuss several red flags that stood out to me as a science, health, and data journalist. These include:

    • The company does not clearly disclose its dataset’s limitations, nor does it disclose its funding sources.
    • Its data are not publicly available for researchers to vet.
    • The popular data on school “disruptions” are easy to misinterpret when cited without context.

    Journalists citing Burbio should be clear about the data source’s limitations, I wrote. And they should also consider alternative sources; while Burbio filled a void left by the federal government, it’s not the only source doing this work. The story highlights several potential options: MCH Strategic Data, the American Enterprise Institute’s Return to Learn tracker, a scientific researcher’s dataset, and an HHS dashboard that compiles data from multiple sources (including Burbio).

    Notably, Burbio did not even attempt to track COVID-19 cases in schools, opting instead to focus on learning modes and safety policies. A couple of research projects did track school cases in the 2020-21 school year, but this specific metric is now primarily tracked by state health departments with no comprehensive federal source. (The COVID School Tracker, one volunteer-run site that is still actively updating, compiles data from states.)

    To see what school COVID-19 case data each state is reporting, you can check out my annotations page here; I updated the annotations of both state and national sources yesterday.

    Some states are now reducing their reporting in this area, aligning with the overall recent trend of cutting back on COVID-19 data at the state level. A few notable examples:

    • Indiana switched from reporting school-specific cases to reporting school-aged cases (i.e. all cases in children ages 5 to 18 or so). Reporting school-aged cases is often easier for a health department, since it doesn’t require contact tracing cases to classrooms.
    • Ohio stopped its reporting of COVID-19 cases in schools entirely. As of mid-March, schools in Ohio are no longer required to report most COVID-19 cases among students and staff to their local health departments, according to local news site Spectrum News 1 in Columbus. (The exception is cases identified by COVID-19 testing within schools.)
    • Vermont also stopped its reporting of COVID-19 cases in schools. A note on the state’s “PreK-12 Schools” page reads: “Due to changes in testing and contact tracing in schools, the COVID-19 Cases in Schools While Infectious report will no longer be updated after Jan. 10, 2022.”

  • All the U.S.’s COVID-19 metrics are flawed

    This week, I had a big retrospective story published at FiveThirtyEight: I looked back at the major metrics that the U.S. has used to track COVID-19 over the past two years—and how our country’s fractured public health system hindered our use of each one.

    The story is split into seven sections, which I will briefly summarize here:

    • Case counts, January to March 2020: Early on in the pandemic, the U.S. had a very limited picture of COVID-19 cases due to our very limited testing: after rejecting a test made by the WHO, the CDC made its own test—which turned out to have contamination issues, further slowing down U.S. testing. In early March 2020, for example, the majority of cases in NYC were identified in hospitals, suggesting that official counts greatly underestimated the actual numbers of people infected.
    • Tests administered, March to September 2020: Test availability improved after the first wave of cases, with organizations like the COVID Tracking Project keeping a close eye on the numbers. But there were a lot of challenges with the testing data (like different units across different states) and access issues for Americans with lower socioeconomic status.
    • Hospitalizations, October to December 2020: By late 2020, many researchers and journalists were considering hospitalizations to be a more reliable COVID-19 metric than cases. But it took a long time for hospitalization data to become reliable on a national scale, as the HHS launched a new tracking system in the summer and then took months to work out kinks in this system.
    • Vaccinations, January to June 2021: When the vaccination campaign started in late 2020, it was “tempting to forget about all other COVID-19 metrics,” I wrote in the story. But the U.S.’s fractured system for tracking vaccinations made it difficult to analyze how close different parts of the country were to prospective “herd immunity,” and distracted from other public health interventions that we still needed even as people got vaccinated.
    • Breakthrough cases, July to November 2021: The Delta surge caused widespread infections in people who had been vaccinated, but the CDC—along with many state public health agencies—was not properly equipped to track these breakthrough cases. This challenge contributed to a lack of good U.S. data on vaccine effectiveness, which in turn contributed to confusion around the need for booster shots.
    • Hospitalizations (again), December 2021 to January 2022: The Omicron surge introduced a need for more nuance in hospitalization data, as many experts asked whether COVID-19 patients admitted with Omicron were actually hospitalized for their COVID-19 symptoms or for other reasons. Nuanced data can be useful in analyzing a variant’s severity, but all COVID-related hospitalizations cause strain on the healthcare system regardless of their cause.
    • New kinds of data going forward: In our post-Omicron world, a lot of public health agencies are shifting their data strategies to treat COVID-19 more like the flu: less tracking of individual cases, and more reliance on hospitalization data, along with newer sources like wastewater. At this point in the pandemic, we should be fortifying data systems “for future preparedness,” I wrote, rather than letting the systems we built up during the pandemic fall by the wayside.

    I did a lot of reporting for this piece, including interviews with some of the U.S.’s foremost COVID-19 data experts and communicators. As long as the piece is, there were a lot of metrics (and issues with these metrics) that came up in these interviews that I wasn’t able to include in the final story—so I wanted to share some bonus material from my reporting here.

    Long COVID:

    As I’ve discussed in previous issues, the U.S. has done a terrible job of collecting data on Long COVID. The NIH estimates that this condition follows a significant share of coronavirus infections (between 10% and 30%), but we have limited information on its true prevalence, risk factors, and strategies for recovery.

    Here’s Dr. Eric Topol, the prolific COVID-19 commentator and director of the Scripps Research Translational Institute, discussing this data problem:

    [Long COVID has] been given very low priority, very little awareness and recognition. And we have very little data to show for it, because it hasn’t been taken seriously. But it’s a very serious matter.

    We should have, early on, gotten at least a registry of people, a large sample, hundreds of thousands of people prospectively assessed, like is being done elsewhere [in the U.K. and other countries]. So that we could learn from them: how long the symptoms lasted, what are the symptoms, what are the triggers, what can be done to avoid it, the role of vaccines, the role of boosters, all this stuff. But we have nothing like that.

    The NIH’s RECOVER initiative may answer some of these questions, but it will take months—if not years—for the U.S. to actually collect the comprehensive data on Long COVID that we should have started gathering when the condition first began gaining attention in 2020.

    Demographic data:

    In the testing section of the story, I mention that the U.S. doesn’t provide much demographic data describing who’s getting tested for COVID-19. There is actually a little-known provision in the CARES Act that requires COVID-19 testing providers to collect certain demographic data from all people who seek tests. But the provision is not enforced, and any data that are collected on this subject aren’t making it to most state COVID-19 dashboards, much less to the CDC’s public data dashboard.

    Here’s Dr. Ellie Murray, an epidemiologist at the Boston University School of Public Health, discussing why this is an issue:

    We don’t collect reason for seeking a test. We don’t collect age, race, ethnicity, occupation of people who seek a test. Those kinds of things could provide us with some really valuable information about who is getting tested, when, and why—that could help us figure out, what are the essential occupations where people are having a lot of exposures and therefore needing to get a lot of tests? Or are there occupations where we’re seeing a lot of people end up in hospital, who have those occupations, but they’re not getting tests, because actually, the test sites are nowhere near where they need to work, or they don’t have the time to get there before they close.

    And so we don’t really know who is getting tested, and that, I think, is a bigger problem, than whether the numbers that are being tested tell us anything about the trajectory of COVID. Because we have case data, and hospitalization data, and death data to tell us about the trajectory. And the testing could really tell us more about exposure, and concern, and access—if we collected some more of this data around who is getting tested and why.

    Test positivity:

    Speaking of testing: another metric that I didn’t get into much in the story was test positivity. Test positivity, or the share of COVID-19 tests that return a positive result, has been used by everyone from the CDC to local school districts as a key metric to determine safety levels. (For more on this metric, check out my FAQ post from this past January.)

    But even when it’s calculated correctly, test positivity faces the same challenges as case data: namely, bias in who’s getting tested. Here’s Lauren Ancel Meyers, director of the University of Texas at Austin’s COVID-19 Modeling Consortium, explaining this:

    Test positivity is just as fraught [as cases]. It’s just as difficult, because you need to know the numerator and the denominator—what’s influencing the numerator and the denominator? Who is going to get tested, who has access to tests? … It used to be, at the very beginning [of the pandemic], nobody could get a test who wanted a test. And now, today, everybody has a test in their medicine cabinet, and they don’t get reported when they test. It’s different issues that have ebbed and flowed throughout this period.

    Often, if you’re a good data analyst or a modeler, and you have all the information, you can handle those kinds of biases. But the problem is, we don’t know the biases from day to day. And so even though there are statistical tools to deal with incomplete bias, without knowing what those biases are, it’s very hard to do reliable inference, and really hard to understand what’s actually going on.

    Genetic surveillance:

    Also related to testing: genetic surveillance for coronavirus variants of concern. Genetic surveillance is important because it can help identify new variants that may be more transmissible or more likely to evade protection from vaccines. It can additionally help track the qualities of concerning variants once they are identified (if variant data are linked to hospitalization data, vaccination data, and other metrics, which is not really happening in the U.S. right now).

    Our current genetic surveillance systems have a lot of gaps. Here’s Leo Wolansky, from the Rockefeller Foundation’s Pandemic Prevention Institute (PPI), discussing how his organization seeks to address these challenges:

    [We’re trying to understand] where our blind spots are, and the bias that we might experience with a lot of health system reporting. One of the things that PPI has been doing is identifying centers of excellence in different parts of the world that can improve the sequencing of new cases in underrepresented countries. And so for example, we’ve provided quite a bit of support to the folks in South Africa that ultimately rang the alarm on Omicron.

    We’re also doing this by actually trying to systematically assess countries’ capacity for this type of genomic surveillance. So thinking about, how many tests have been recorded? What’s that test positivity rate? Do we have confidence in the basic surveillance system of the country? And then, do we also see enough sequences, as well as sequencing facility data, to demonstrate that this country can sequence and just isn’t doing enough—or cannot sequence because it needs foundational investment in things like laboratories and devices. We’ve been mapping this capacity just to make sure that we understand where we should be investing as a global community.

    The Pandemic Prevention Institute is taking a global perspective in thinking about data gaps. But these gaps also exist within the U.S., as is clear when one looks at the differences in published coronavirus sequences from state to state. Some states, like Wyoming, Vermont, and Colorado, have sequenced more than 10% of their cumulative cases, according to the CDC. Others, like Oklahoma, Iowa, and South Dakota, have sequenced fewer than 3%. These states need additional investment in order to thoroughly monitor coronavirus transmission among their residents.

    Cohort studies:

    In a cohort study, researchers follow a group of patients over time in order to collect long-term data on specific health conditions and/or the outside factors that influence them. The U.S. has set up a few cohort studies for COVID-19, but they haven’t been designed or utilized in a way that has actually provided much useful data—unlike cohort studies in some other countries. (The U.K., for example, has several ongoing cohort studies collecting information on COVID-19 symptoms, infections in schools, seroprevalence, and more.)

    Here’s Dr. Ellie Murray explaining the lost potential of these studies in the U.S.:

    There are a number of existing cohort studies that have been asked or who asked to pivot to collecting COVID information and therefore collecting long-term COVID information on their cohorts. But there doesn’t seem to be any kind of system to [determine], what are the questions we need answered about COVID from these kinds of studies? And how do we link up people who can answer those questions with the data that we’re collecting here, and making sure we’re collecting the right data? And if this study is going to answer these questions, and this one is going to answer those questions—or, here’s how we standardize those two cohorts so that we can pull them together into one big COVID cohort.

    And so, we end up in this situation where, we don’t know what percent of people get Long COVID, even though we’ve been doing this for over two years. We don’t even really know, what are all the different symptoms that you can get from COVID? … There are all these questions that we could be sort-of systematically working our way through, getting answers and using them to inform our planning and our response. [In addition to having] standardized questions, you also need a centralized question, instead of just whatever question occurs to someone who happens to have the funding to do it.

    Excess deaths:

    Excess deaths measure the deaths that occur in a certain region, over a certain period of time, above the number of deaths that researchers expect to see in that region and time period based on modeling from past years’ data. Excess deaths are the COVID-19 metric with the longest lag time: it takes weeks from initial infection for someone to die of the disease, and can take weeks further for a death certificate to be incorporated into the public health system.

    Once that death information is available, however, it can be used to show the true toll of the pandemic—analyzing not just direct COVID-19 deaths, but also those related to isolation, financial burden, and other indirect issues—as well as who has been hit the hardest.

    Here’s Cecile Viboud, a staff scientist at the NIH who studies infectious disease mortality, discussing this metric:

    We’ve been using the excess death approach for a long time. It comes from flu research, basically starting in 1875 in the U.K. And it was used quite a lot during the 1918 pandemic. It can be especially good in examining historical records where you don’t have lab confirmation—there was no testing ability back in those days…

    So, I think it’s kind of natural to use it for a pandemic like COVID-19. Very early on, you could see how useful this method was, because there was so little testing done. In March and April 2020, you see substantial excess, even when you don’t see lab-confirmed deaths. There’s a disconnect there between the official stats, and then the excess mortality… [We can also study] the direct effect of COVID-19 versus the indirect effect of the pandemic, like how much interventions affected suicide, opioids, death, accidents, etc. The excess approach is also a good method to look at that.

    Viboud also noted that excess deaths can be useful to compare different parts of the U.S. based on their COVID-19 safety measures. For example, one can analyze excess deaths in counties with low vaccination rates compared to those with high vaccination rates. This approach can identify the pandemic’s impact even when official death counts are low, an issue that the Documenting COVID-19 project has covered in depth.
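
    As a rough illustration of the arithmetic behind this metric, here is a short Python sketch. It assumes a deliberately simple baseline (the average of a few past years), whereas real analyses use statistical models that account for trends and seasonality. The county names, death counts, and populations are all hypothetical.

```python
from statistics import mean


def expected_deaths(past_years: list[int]) -> float:
    """Baseline: average deaths over past years (a simplifying assumption)."""
    return mean(past_years)


def excess_deaths(observed: int, past_years: list[int]) -> float:
    """Deaths observed above the expected baseline for the same period."""
    return observed - expected_deaths(past_years)


def excess_per_100k(observed: int, past_years: list[int], population: int) -> float:
    """Excess deaths scaled by population so regions can be compared."""
    return excess_deaths(observed, past_years) * 100_000 / population


# Hypothetical counties, for illustration only.
counties = {
    "County A (low vaccination)": {
        "observed": 1_150,
        "past_years": [1_000, 1_020, 980],
        "population": 100_000,
    },
    "County B (high vaccination)": {
        "observed": 2_100,
        "past_years": [2_000, 2_050, 1_950],
        "population": 200_000,
    },
}

rates = {
    name: excess_per_100k(c["observed"], c["past_years"], c["population"])
    for name, c in counties.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} excess deaths per 100,000 residents")
```

    Scaling by population matters here: County B has more total deaths, but County A has the higher excess rate.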

    Again, you can read the full FiveThirtyEight story here!

    More federal data