Category: Variants

  • CDC stepped up sequencing, but the data haven’t kept pace

    If the U.S. does see a fourth surge this spring, one of the main culprits will be variants. Three months after the first B.1.1.7-caused case was detected in this country, that variant now causes about one third of new COVID-19 cases nationwide. The B.1.1.7 variant, first detected in the U.K., spreads more readily and may pose a higher risk of hospitalization and death.

    Meanwhile, other variants have taken root. There’s the variant that originated in California, B.1.427/B.1.429, which now accounts for over half of cases in the state. There’s the variant that originated in New York City, B.1.526, which is quickly spreading in New York and likely in neighboring states. And there’s the variant that originated in Brazil, P.1; this variant has only been identified about 200 times in the U.S. so far, but it’s wreaking havoc in Brazil and some worry that it may be only a matter of time before we see it spread here.

    The thing about viral variants, especially the more transmissible ones, is that they’re like tribbles. They might seem innocuous at first, but if left to multiply, they’ll soon take over your starship, eat all your food, and bury you in the hallway. (If you didn’t get that reference, watch this clip and then get back to me.) The only way to stop the spread is first to identify where they are, and then to use the same tried-and-true COVID-19 prevention measures to cut off their lineages. Or, as Dr. McCoy puts it: “We quit feeding them, they stop breeding.”

    In the U.S., that first part—identify where the variants are—is tripping us up. The CDC has stepped up its sequencing efforts in a big way over the past few months, going from 3,000 a week in early January to 10,000 a week by the end of March. But data on the results of these efforts are scarce and uneven, with some states doing far more sequencing than others. New York City, for example, has numerous labs frantically “hunting down variants,” while many less-resourced states have sequenced less than half a percent of their cases. And the CDC itself publishes data with gaping holes and lags that make the numbers difficult to interpret.

    The CDC has three places you can find data on variants and genomic sequencing; each one poses its own challenges.

    First, there’s the original variant data tracker, “US COVID-19 Cases Caused by Variants.”  This page reports sheer numbers of cases caused by three variants of concern: B.1.1.7 (U.K. variant), B.1.351 (South Africa variant), and P.1 (Brazil variant). It’s updated three times a week, on Tuesdays, Thursdays, and Sundays—the most frequent schedule of any CDC variant data.

    But the sheer numbers of cases reported lack context. What does it mean to say, for example, the U.S. has about 12,500 B.1.1.7 cases, and 1,200 of them are in Michigan? It’s tricky to explain the significance of these numbers when we don’t know how much sequencing Michigan is doing compared to other states.
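    To make the context problem concrete, here is a minimal Python sketch. The two B.1.1.7 counts are the figures quoted above; the sequencing totals for "State A" and "State B" are hypothetical placeholders, not real data, and the function names are illustrative only. The point is that the same raw count reads very differently depending on how many samples a state has sequenced.

```python
def share_pct(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def detections_per_1000_sequences(variant_cases: int, sequences: int) -> float:
    """Variant detections per 1,000 sequenced samples."""
    if sequences <= 0:
        raise ValueError("sequences must be positive")
    return 1000.0 * variant_cases / sequences


# Figures quoted in the post (approximate).
US_B117_CASES = 12_500
MICHIGAN_B117_CASES = 1_200

# Michigan's share of the national reported B.1.1.7 count.
michigan_share = share_pct(MICHIGAN_B117_CASES, US_B117_CASES)

# HYPOTHETICAL sequencing totals: two imaginary states that each report
# 1,200 variant cases but sequence very different numbers of samples.
HYPOTHETICAL_SEQUENCES = {"State A": 4_000, "State B": 40_000}
rates = {
    state: detections_per_1000_sequences(1_200, sequenced)
    for state, sequenced in HYPOTHETICAL_SEQUENCES.items()
}

if __name__ == "__main__":
    print(f"Michigan share of reported U.S. B.1.1.7 cases: {michigan_share:.1f}%")
    for state, rate in rates.items():
        print(f"{state}: {rate:.0f} detections per 1,000 sequences")
```

    Without the denominator (sequences performed), there is no way to tell whether a high count reflects more variant spread or simply more looking.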

    This dataset is also missing some pretty concerning variants: both the B.1.526 (New York) and B.1.427/B.1.429 (California) variants are absent from the map and state-by-state table. According to other sources, these variants are spreading pretty rapidly in their respective parts of the country, so there should be case numbers reported to the CDC—it’s unclear why the CDC hasn’t yet made those numbers public.

    (To the CDC’s credit, the California variant was recently reclassified as a “variant of concern,” and Dr. Walensky said at a press briefing this week that the New York variant is under serious investigation to get that same reclassification bump. But that seems to be a long process: weeks after the variant emerged, it still hasn’t happened.)

    Second, there’s the variant proportions tracker, which reports what it sounds like: percentages, representing the share of COVID-19 cases that CDC researchers estimate are caused by different coronavirus variants. The page includes both national estimates and state-by-state estimates for a pretty limited number of states that have submitted enough sequences to pass the CDC’s threshold.

    I wrote about this page when it was posted two weeks ago, calling out the stale nature of these data and the lack of geographic diversity. There’s been one update since then, but only to the national variant proportions estimates; those numbers are now as of March 13 instead of February 27. The state numbers are still as of February 27, now over a month old.

    Note that Michigan—the one state everyone’s watching, the state that has reported over 1,000 B.1.1.7 cases alone—is not included in the table. How are we supposed to use these estimates when they so clearly do not reflect the current state of the pandemic?

    A third variant-adjacent data page, added to the overall CDC COVID Data Tracker this past week, provides a bit more context. This page provides data on published SARS-CoV-2 sequences provided by the CDC, state and local public health departments, and other laboratory partners. You can see the sheer number of sequenced cases grow by week and compare state efforts.

    It’s pretty clear that some states are doing more sequencing than others. States with major scientific capacity—Washington, Oregon, New York, D.C.—are near the top. Some states with smaller populations are also on top of the sequencing game: Wyoming, Hawaii, Maine. But 32 states have sequenced fewer than 1% of their cases in total, and 21 have sequenced fewer than 0.5%. That’s definitely not enough sequences for the states to be able to find pockets of new variants, isolate those transmission chains, and stop the breeding.
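    The thresholds in the paragraph above (under 1% and under 0.5% of cases sequenced) can be sketched as a simple classification. This is only an illustration: the state names and counts below are hypothetical placeholders rather than CDC figures, and the function names are my own.

```python
def sequencing_coverage_pct(sequenced: int, cases: int) -> float:
    """Percentage of a state's cumulative cases that have been sequenced."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * sequenced / cases


def coverage_tier(pct: float) -> str:
    """Bucket a coverage percentage using the 0.5% and 1% cutoffs from the text."""
    if pct < 0.5:
        return "under 0.5%"
    if pct < 1.0:
        return "under 1%"
    return "1% or more"


# HYPOTHETICAL inputs: (sequences published, cumulative cases).
HYPOTHETICAL_STATES = {
    "State A": (2_500, 100_000),
    "State B": (700, 100_000),
    "State C": (300, 100_000),
}

tiers = {
    state: coverage_tier(sequencing_coverage_pct(sequenced, cases))
    for state, (sequenced, cases) in HYPOTHETICAL_STATES.items()
}

if __name__ == "__main__":
    for state, tier in tiers.items():
        print(f"{state}: {tier}")
```

    Note that a state in the "under 0.5%" bucket is also counted among those under 1%, which is how the 32 and 21 figures above relate to each other.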

    Chart captions state that the state-by-state maps represent cases sequenced “from January 2020 to the present,” while a note at the bottom says, “Numbers will be updated every Sunday by 7 PM.” So are the charts up to date as of today, April 4, or are they up to date as of last Sunday, March 28? (Note, I put simply “March 2021” on my own chart with these data.)

    Obviously, the lack of date clarity is annoying. But it’s also problematic that these are cumulative numbers—reflecting all the cases sequenced during more than a year of the pandemic. Imagine trying to make analytical conclusions about COVID-19 spread based on cumulative case numbers! It would simply be irresponsible. But for sequencing, these data are all we have.

    So, if anyone from the CDC is reading this, here’s my wishlist for variant data:

    • One singular page, with all the relevant data. You have a COVID Data Tracker, why not simply make a “Variants” section and embed everything there?
    • Regular updates, coordinated between the different metrics. One month is way too much of a lag for state-by-state prevalence estimates.
    • Weekly numbers for states. Let us see how variants are spreading state-by-state, as well as how states are ramping up their sequencing efforts.
    • More clear, consistent labeling. Explain that the sheer case numbers are undercounts, explain where the prevalence estimates come from, and generally make these pages more readable for users who aren’t computational biologists.

    And if you’d like to see more variant case numbers, here are a few other sources I like:

    • Coronavirus Variant Tracker by Axios, providing estimated prevalence for four variants of concern and two variants of interest, along with a variants FAQ and other contextual writing.
    • CoVariants, a tracker by virologist Emma Hodcroft that shows variant spread around the world based on public sequencing data. Hodcroft posts regular updates on Twitter.
    • Nextstrain, an open-source genome data project. This repository was tracking pathogens long before COVID-19 hit, and it is a hub for sequence data and other related resources.

    The U.S. has blown past its current sequencing goal (7,000 cases per week), but is aiming to ramp up to 25,000—and has invested accordingly. I hope that, in addition to ramping up all the technology and internal communications needed for this effort, the CDC also improves its public data. The virus is multiplying; there’s no time to waste.

    Related posts

    • New CDC page on variants still leaves gaps

      This week, the CDC published a new data page about the coronavirus variants now circulating in the U.S. The page provides estimates of how many new cases in the country may be attributed to different SARS-CoV-2 lineages, including both more familiar, wild-type variants (B.1 and B.1.2) and newer variants of concern.

      This new page is a welcome addition to the CDC’s library, as their “Cases Caused by Variants” page only provides numbers of variant cases reported to the agency—which, as we have repeatedly stated at the CDD, represent huge undercounts.

      However, the page still has three big problems:

      First, the data are old. The CDC is currently reporting data for four two-week periods, the most recent of which ends February 27. That’s a full three weeks ago—a pretty significant lag when several “variants of concern” are concerning precisely because they are more infectious, meaning they can spread through the population more quickly.

      The CDC’s B.1.1.7 estimate (about 9% as of Feb. 27) particularly sticks out. CoVariants, a variant tracker run by independent researcher Emma Hodcroft, also puts B.1.1.7 prevalence in the U.S. at about 10% in late February… but estimates this variant accounts for 22% of sequences as of March 8. These estimates indicate that B.1.1.7 may have doubled its case counts in the two weeks after the CDC’s data stop.
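      As a rough check on that doubling claim, the sketch below estimates a doubling time from the two CoVariants shares quoted above. It makes two assumptions that are mine, not the post’s: “late February” is taken to be February 27 (the CDC cutoff), and the variant’s share is treated as growing exponentially over the interval, which is a simplification. Note also that it measures the share of sequences, not absolute case counts.

```python
import math
from datetime import date


def doubling_time_days(share_start: float, share_end: float, days_elapsed: int) -> float:
    """Doubling time under an assumed constant exponential growth rate."""
    if not 0 < share_start < share_end:
        raise ValueError("shares must satisfy 0 < share_start < share_end")
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    growth_rate = math.log(share_end / share_start) / days_elapsed
    return math.log(2) / growth_rate


# ASSUMPTION: "late February" is read as Feb. 27, 2021.
start = date(2021, 2, 27)
end = date(2021, 3, 8)
days = (end - start).days

# Shares quoted in the post: about 10% in late February, 22% as of March 8.
estimate = doubling_time_days(0.10, 0.22, days)

if __name__ == "__main__":
    print(f"{days} days elapsed; estimated doubling time of about {estimate:.1f} days")
```

      Under these assumptions the share doubles in a little over a week, which is consistent with the post’s point that a three-week lag is long relative to how fast B.1.1.7 is moving.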

      Second, the CDC data reveal geographic gaps in our current sequencing strategy. The CDC is providing state-by-state prevalence estimates for 19 select states—or, those states that are doing a lot of genomic sequencing. Of course, this includes big states such as California and New York, but excludes much of the Midwest and other smaller, less scientifically-endowed states.

      Michigan, that state currently facing a concerning surge, is not represented—even though the state has one of the highest raw counts of B.1.1.7 cases, as of this week. We can gather from a footnote that Michigan did not submit at least 300 sequences to the CDC between January 13 and February 13; still, this exclusion poses a challenge for researchers watching that surge.

      And finally, the data are presented in a confusing manner. When I shared this page with a couple of COVID Tracking Project friends on Friday, it took the group a lot of close-reading and back-and-forth to unpack those first two problems. And we’re all used to puzzling through confusing data portals! The CDC claims this page is an up-to-date tracker, “used to inform national and state public health actions related to variants,” but its data are weeks old and represent less than half of the country.

      The CDC needs to improve its communication of data gaps, lags, and uncertainties, especially on such an alarming topic as variants. And, of course, we need better variant data to begin with. The U.S. is aiming to sequence 25,000 samples per week, but that’s still far from the 5% of new cases we would need to sequence in order to develop an accurate picture of variant spread in the U.S.
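      The relationship between a weekly sequencing volume and a percentage target is simple arithmetic, sketched below. The 25,000 figure and the 5% target come from the paragraph above; the weekly case count used in the example is a hypothetical placeholder, not a reported number, and the function names are illustrative.

```python
def sequences_needed(weekly_cases: int, target_pct: int = 5) -> int:
    """Sequences per week required to cover target_pct percent of weekly cases."""
    if weekly_cases < 0 or target_pct <= 0:
        raise ValueError("weekly_cases must be >= 0 and target_pct must be > 0")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-weekly_cases * target_pct // 100)


def max_weekly_cases_covered(weekly_sequences: int, target_pct: int = 5) -> int:
    """Largest weekly case count for which weekly_sequences still meets the target."""
    if weekly_sequences < 0 or target_pct <= 0:
        raise ValueError("weekly_sequences must be >= 0 and target_pct must be > 0")
    return weekly_sequences * 100 // target_pct


# From the post: a goal of 25,000 sequences per week and a 5% coverage target.
break_even_cases = max_weekly_cases_covered(25_000)

# HYPOTHETICAL weekly case count, for illustration only.
HYPOTHETICAL_WEEKLY_CASES = 1_000_000
needed = sequences_needed(HYPOTHETICAL_WEEKLY_CASES)

if __name__ == "__main__":
    print(f"25,000 sequences/week covers 5% of up to {break_even_cases:,} weekly cases")
    print(f"{HYPOTHETICAL_WEEKLY_CASES:,} weekly cases would need {needed:,} sequences")
```

      In other words, whether 25,000 sequences per week reaches 5% depends entirely on how many new cases there are that week, so the target is a moving one.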

      On that note: you may notice that we now have a new category for variant posts on the CDD website. I expect that this will continue to be a major topic for us going forward.

      Related posts

      • Featured sources, March 14

        • Helix COVID-19 Surveillance Dashboard: Helix, a population genomics company, is one of the leading private partners in the CDC’s effort to ramp up SARS-CoV-2 sequencing efforts in the U.S. The company is reporting B.1.1.7 cases identified in select states, along with data on a mutation called S gene target failure (or SGTF) that scientists have found to be a major identification point in distinguishing B.1.1.7 from other strains.
        • COVID-19 related deaths by occupation, England and Wales: This is another source that I used for my Pop Sci story. The U.S. doesn’t publish any data connecting COVID-19 cases or deaths to occupations, but the U.K. data falls along similar lines to what we’d expect to see here: essential workers have been hit hardest. Men in “elementary occupations,” a class of jobs that require some physical labor, and women in service and leisure occupations have the highest death rates.
        • The Impact of the COVID-19 Pandemic on LGBT People: This brief from the Kaiser Family Foundation addresses a key data gap in the U.S.; the national public health agencies and most states do not publish any data on how the pandemic has specifically hit the LGBTQ+ community. KFF surveys found that a larger share of LGBTQ+ adults have experienced job loss and negative health impacts in the past year, compared to non-LGBTQ+ adults.

      • NYC variant looks like bad news

        In a press conference on Wednesday, NYC mayor Bill de Blasio confirmed that the recently identified NYC variant (since christened B.1.526) is outpacing the original strain in spreading speed, and his senior advisor for Public Health, Dr. Jay Varma, said that these two variants combined account for 51% of all cases in the city. This is coming from a preliminary analysis, and so far, they have not found that B.1.526 is more deadly or that it may evade vaccine efficacy. However, it’s still worrying.

        It’s probably contributing to the relatively slower pace of decline in cases in NY versus the rest of the country.

        And this comes when NYC is increasing indoor dining capacity to 50%, and when NY is going to scrap its rule on people from out of state having to quarantine on April 1. De Blasio has told New Yorkers to stay the course, but the people in charge (Andrew Cuomo) don’t seem to want to follow that advice.

      • Featured sources, March 7

        • Coronavirus variant data from USA TODAY: The CDC doesn’t publish a time series of its counts of COVID-19 cases caused by variants. So, USA TODAY journalists have set up a program to scrape these data whenever the CDC publishes an update and store the data in a CSV, including variant counts for every U.S. state. The time series goes back to early January.
        • Documenting COVID-19: This repository is one of several great resources brought to my attention during this past week’s NICAR conference. It’s a database of documents related to the pandemic, obtained through state open-records laws and the Freedom of Information Act (FOIA). 246 records are available as of February 26.
        • VaccinateCA API: California readers, this one’s for you. The community-driven project VaccinateCA, aimed at helping Californians get vaccinated, has made its underlying data available for researchers. The API includes data on vaccination sites and their availability across the state.
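        The snapshot-to-time-series idea described in the USA TODAY item above can be sketched in a few lines. This is not USA TODAY’s code: the column names, the `append_snapshot` helper, and the shape of the `counts` input are all assumptions for illustration, and the scraping step itself is left out. The sketch only shows how dated snapshots could be appended to a CSV so a history accumulates even though the source page shows just the latest numbers.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical column layout for the stored time series.
FIELDS = ["date", "state", "variant", "cases"]


def append_snapshot(
    csv_path: Path,
    snapshot_date: date,
    counts: Dict[Tuple[str, str], int],
) -> int:
    """Append one dated snapshot of (state, variant) -> cases.

    Skips the write if that date is already stored. Returns rows written.
    """
    has_data = csv_path.exists() and csv_path.stat().st_size > 0

    existing_dates = set()
    if has_data:
        with csv_path.open(newline="") as f:
            existing_dates = {row["date"] for row in csv.DictReader(f)}

    day = snapshot_date.isoformat()
    if day in existing_dates:
        return 0  # already captured this update

    with csv_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if not has_data:
            writer.writeheader()
        for (state, variant), cases in sorted(counts.items()):
            writer.writerow(
                {"date": day, "state": state, "variant": variant, "cases": cases}
            )
    return len(counts)
```

        Calling the helper once per published update, with whatever counts the scraper extracted, yields one row per state and variant per date; repeated runs on the same day are ignored rather than duplicated.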

      • Featured sources, Feb. 21

        • Bloomberg’s COVID-19 Vaccine Tracker: We’ve featured Bloomberg’s tracker in the CDD before (in fact, you can read Drew Armstrong’s walkthrough of the dashboard here), but it’s worth highlighting that the Bloomberg team made two major updates this week. First, they added a demographic vertical, which includes race and ethnicity data for the U.S. overall and for 27 states that are reporting these data. This vertical will be updated weekly. Second, the team made all of their data available on GitHub! I, for one, am quite excited to dig through the historical figures.
        • CoVariants: This new resource from virus tracker Dr. Emma Hodcroft provides an overview of SARS-CoV-2 variants and mutations. You can explore how variants have spread across different parts of the world through brightly colored charts. The resource is powered by GISAID, Nextstrain, and other sequencing data; follow Dr. Hodcroft on Twitter for regular updates.
        • The Next Phase of Vaccine Distribution: High-Risk Medical Conditions (from KFF): The latest analysis brief from the Kaiser Family Foundation looks at how individuals with high-risk medical conditions are being prioritized for vaccine distribution in each state. KFF researchers compared each state’s prioritization plans to the CDC’s list of conditions that “are at increased risk” or “may be at an increased risk” for severe illness due to COVID-19; the analysis reflects information available as of February 16.
        • First Month of COVID-19 Vaccine Safety Monitoring (CDC MMWR): This past Friday, the CDC released a Morbidity and Mortality Weekly Report with data from the first month of safety monitoring, using the agency’s Vaccine Adverse Event Reporting System (or VAERS). Out of the 13.8 million vaccine doses administered during this period, about 7,000 adverse events were reported—and only 640 were classified as serious. Check the full report for figures on common side effects and enrollment in the CDC’s new v-safe monitoring program.

      • Some optimistic vaccine news but variants still pose a major threat

        Last week, Janssen, a pharmaceutical division owned by megacorp Johnson & Johnson, released results for its phase 3 ENSEMBLE study. The Janssen vaccine uses an adenovirus vector (a modified common cold virus that delivers the DNA necessary to make the coronavirus spike protein), can be stored at normal fridge temperatures, and only requires one dose. Here’s a table of the raw numbers from Dr. Akiko Iwasaki of Yale:

        At first glance it does look like it’s “less effective” than the mRNA vaccines from Moderna and Pfizer. But, when you look at the severe disease, there’s a 100% decrease in deaths. No one who got the J&J vaccine died of coronavirus, no matter where they lived— including people who definitely were diagnosed with the South African B.1.351 variant. Here’s how that compares with the Moderna, AstraZeneca, Pfizer, and Novavax vaccines, per Dr. Ashish Jha of Brown:

        Nobody who got any of the vaccine candidates was hospitalized or died from COVID-19. That’s huge, especially as variants continue to spread across the U.S. (Here’s the updated CDC variant tracker.)

        J&J’s numbers are especially promising when it comes to variant strains. Moderna and Pfizer released their results before the B.1.1.7 (U.K.) or B.1.351 (S.A.) variants reached their current notoriety, which makes J&J’s overall efficacy numbers look worse by comparison. But the fact that no one who got the J&J vaccine was hospitalized no matter which variant they were infected with is a cause for optimism. (B.1.351 is the variant raising alarms for possibly being able to circumvent a vaccine’s protection due to a helpful mutation called E484K. A Brazilian variant, P.1, also has this mutation, though there’s not a lot of research on vaccine efficacy for this particular mutant.)

        It also means that vaccination needs to step up. While it may seem counterintuitive to step up vaccinations against variants that can supposedly circumvent them, it’s important to note that there still was a significant decrease in COVID-19 cases in vaccinated patients from South Africa. A 57% drop in cases, in a setting where B.1.351 accounted for about 95% of infections, still suggests that vaccination can prevent these cases, and thus can seriously slow the spread of the variant.

        What does all of this mean for COVID-19 rates? We can infer a few things. For starters, when vaccines are distributed to the general public around April or May, we may see hospitalization rates and death rates drop more than positive test rates. Positive test rates should obviously drop too, but they’ll probably stay at least a little higher than hospitalizations and death rates for a while.

        Second, it means that we really need to ramp up sequencing efforts in the U.S. We need more data to tell us just how well these vaccines can protect against the spreading variants, but we can’t collect that data if we don’t know which strain of SARS-CoV-2 someone gets. We here at the CDD have covered sequencing efforts – or lack thereof – before, but the rollout has still been painfully slow. CDC Director Rochelle Walensky stressed that “we should be treating every case as if it’s a variant during this pandemic right now,” during the January 29 White House coronavirus press briefing. But the 6,000 sequences per week she’s pushing for as of the February 1 briefing should have been the benchmark months ago. We’re still largely flying blind until we can get our act together.

        Some states in particular may be flying blinder than others. As Caroline Chen wrote in ProPublica yesterday, governors of New York, Michigan, Massachusetts, California, and Idaho are planning to relax more restrictions, including those on indoor dining. Such a plan is probably the perfect way to ensure these variants spread, so much that even Chen was surprised at how pessimistic the outlook was when she asked 10 scientists for the piece.

        The B.1.1.7 variant is expected to become the dominant strain in the U.S. by March, according to the CDC. And on top of that, the B.1.1.7 variant seems to have picked up that helpful E484K mutation in some cases as well. Per Angela Rasmussen of Georgetown University, if these governors don’t realize how much they’re about to screw everything up, “the worst could be yet to come.” God help us.

      • We’re not doing enough sequencing to detect B.1.1.7

        The CDC has identified 63 cases of the B.1.1.7 variant as of Jan. 8, but this is likely a significant undercount thanks to the nation’s lack of systematic sequencing.

        A new, more transmissible strain of COVID-19 (known as B.1.1.7) has caused quite a stir these past few weeks. It surfaced in the United Kingdom and has been detected in eight states: California, Colorado, Connecticut, Florida, Georgia, New York, Texas, and Pennsylvania. The fact that a mutant strain happened isn’t a surprise, as RNA viruses mutate quite often. But as vaccines roll out, the spread of a new strain is yet another reminder that we’re nowhere near out of the woods yet.  

        It’s entirely possible to differentiate between strains of SARS-CoV-2 through genetic testing. To detect the B.1.1.7 variant, COVID-19 positive samples can be sequenced to search for a telltale deletion in the virus’s RNA. And in theory, we could track the spread of this variant with good testing data. A truly robust tracking effort should include a centralized surveillance program to sequence the RNA of the SARS-CoV-2 virus in all positive cases—or at least a good sample—to detect any mutant strains and track their impact. However, this is an area where the US has consistently faltered: as of December 23rd, only 51,212 out of 18 million positive cases had been sequenced. 

        As with most of the government’s response, handling this seems to be mostly up to the states. According to releases from Colorado, Pennsylvania, Connecticut, and Texas, it looks like these states are making sequencing efforts. Georgia said, “The variant was discovered during analysis of a specimen sent by a pharmacy in Georgia to a commercial lab”, which I can only assume means they have been conducting some kind of sequencing effort. I couldn’t find references to the extent of sequencing efforts in the announcements from California, Florida, or New York.

        From these releases, it’s obvious that there is no unified cross-state effort. Pennsylvania stated that they had been sending “10-35 random samples biweekly to the CDC since November to study sequencing,” but that’s not going to be nearly enough to track this more transmissible variant. Are there any plans to ramp up sequencing? And that’s just from Pennsylvania because they deigned to tell us—are all states going to ramp up sequencing? It’s just not clear. 

        And after all that, starting to test for the variant now still won’t tell us just how widespread it is. The first case in New York was in someone with no evident travel history. Indeed, this is true for most people who have been infected, and, per Dr. Angela Rasmussen in Buzzfeed News, this suggests that the variant is already circulating in the community. To know how widespread the variant is, we would need to retroactively test samples that had already tested positive. Colorado’s press release mentioned that they would be doing some retroactive testing, but what about the other seven states? 

        Plus, those are just the states with confirmed cases so far. If the variant is already circulating in the community, other states probably have cases too, and more will be confirmed. To know just where this variant is, every positive test in the US stretching back months would have to be retroactively sequenced for the variant, which is unlikely to happen.

        Even if there were a coordinated effort to retroactively sequence all positive tests, some cases of the variant could still slip through the cracks, because most states still aren’t doing enough PCR testing as it is. As of January 8th, according to Ashish Jha’s team at the Brown University School of Public Health, 86% of states aren’t meeting their testing targets. (Meeting testing targets indicates that enough testing is happening to “identify most people reporting symptoms and at least two of their close contacts.” State targets on this dashboard were last configured on October 1, so keep that in mind.) Only two states where the variant has surfaced, Connecticut and New York, are meeting their targets—and cases are surging in both states right now. Longtime readers are going to be very familiar with this problem, but if any new people are reading, this means that in most states we don’t even know how widespread our “garden variety” COVID-19 is. So how are we supposed to know where the UK variant is if we can’t even keep track of the virus that’s been here for almost a year? 

        Beyond testing, even reporting on confirmed cases of the variant is spotty at best. The CDC is reporting how many detected cases of COVID-19 have been caused by the variant, but no state with a confirmed case caused by B.1.1.7 is displaying that data on their dashboard. (I checked the 8 states’ dashboards and left a comment on California’s because the ask box was right there.) Why is this not on their dashboards? I couldn’t tell you, but it seems like important information that should be reported.

        All of these unanswered questions show, yet again, that we desperately need a unified effort from the federal government to track and combat this virus. It should not be this hard to find how we’re tracking the spread of this variant, it should not be this hard to tell which methods work for even identifying the variant, and it should at least be possible to find this data on state health dashboards. It might look like we’re close to the finish line as vaccines continue to be distributed, but we’re tripping over the exact same problems we did at the beginning.