Author: Betsy Ladyzhets

  • No, we’re not done talking about HHS hospitalization data

    The HHS is still collecting and publishing COVID-19 hospitalization data, and I, personally, feel as though I know both more and less than I did when I wrote last week’s newsletter. This week’s issue is already rather long, so here, I will focus on outlining the main questions I have right now.

    Why are HHS’s COVID-19 hospitalization numbers higher than states’? While HHS’s most public-facing dataset is the HHS Protect hospital utilization dataset, last updated on July 23, the department also reports daily counts of the hospital beds occupied in every state. This dataset includes counts of all currently hospitalized patients with confirmed and suspected COVID-19. Local public health departments in all 50 states and D.C. also report the same datapoint; the COVID Tracking Project collects, standardizes, and reports these local counts daily.

    According to analysis by the COVID Tracking Project, over the week of July 20 to July 26, HHS reported an average of 24% more hospitalized COVID-19 patients across the U.S. than the states did. Figures for some states show even more variation. In Florida, for example, HHS’s count nearly doubled from July 26 to July 27 (from about 11,000 patients to about 21,500 patients). The state reported about 9,000 hospitalized COVID-19 patients both days.

    In Arkansas, meanwhile, the state has reported about 500 hospitalizations each day for the past week, while HHS has reported about 1,600. Overall, for 28 out of 53 states and territories, there is at least one day in the past week when HHS’s count of currently hospitalized COVID-19 patients is at least 50% higher than the state public health department’s count.
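
    To make the comparison concrete, here is a minimal sketch of the kind of check described above: how much higher HHS's count is than the state's, and whether any day in a week crosses a 50% gap. It uses the approximate Florida figures quoted above; the function names are hypothetical, and this is not the COVID Tracking Project's actual methodology (its 24% figure is a national weekly average).

```python
# Sketch: compare HHS and state counts of currently hospitalized COVID-19
# patients. The counts are the approximate Florida figures quoted above;
# the function names are hypothetical, not from any real tool.

def percent_higher(hhs_count: float, state_count: float) -> float:
    """How much higher the HHS count is than the state count, in percent."""
    return (hhs_count - state_count) / state_count * 100


def any_day_at_least(week, threshold_pct: float = 50.0) -> bool:
    """True if, on any day, HHS's count exceeds the state's by threshold_pct or more."""
    return any(percent_higher(hhs, state) >= threshold_pct for hhs, state in week)


# (HHS count, state count) for Florida, approximate values
florida = [
    (11_000, 9_000),  # July 26
    (21_500, 9_000),  # July 27
]

for day, (hhs, state) in zip(["July 26", "July 27"], florida):
    print(f"Florida, {day}: HHS is {percent_higher(hhs, state):.0f}% higher")
print("Flagged:", any_day_at_least(florida))
```

    On these numbers, the July 26 gap is about 22% and the July 27 gap about 139%, so Florida would be flagged. The same arithmetic puts Arkansas's typical day (about 1,600 vs. about 500) at roughly 220%.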

    The COVID Tracking Project suggests several potential reasons for this discrepancy. Some hospitals may report to HHS but not to their state public health departments, either because they are federally run hospitals (such as those run by the Department of Veterans Affairs) or because HHS’s tie to federal supplies such as remdesivir provides a greater incentive for complete reporting. State definitions of who counts as a COVID-19 patient differ from place to place, and may be narrower than the federal categorization, which includes all confirmed and suspected cases. And some hospitals might also be making data entry errors or double-counting their patients as they adjust to the new reporting system. As I noted in last week’s issue, we do not know how HHS is screening for and removing data entry errors in its dataset.

    How did the CDC-to-HHS switch impact local public health departments? The COVID Tracking Project’s blog post on hospitalization data also explains that several states had delays or errors in reporting current hospitalization numbers because the states previously relied on the CDC’s database for these values. Public health departments in Idaho, Missouri, South Carolina, Wyoming, Texas, and California have all documented issues with compiling hospitalization data at the state level thanks to the CDC-to-HHS system change. Similar issues may be going unreported in other states.

    As I described last week, changing database systems in the middle of a pandemic can be particularly challenging for already-overburdened hospitals. It can take multiple hours a day to enter data into both HHS and state reporting systems, and that’s on top of the technological and bureaucratic hurdles that hospitals must clear. Public health departments are scrambling to help their hospitals, as hospitals are scrambling to report the correct data—to say nothing of actually taking care of their patients.

    Why should I trust a database built by a tech company that got the job through suspicious means? According to an investigation by NPR, TeleTracking Technologies received its federal contract to build HHS’s system for collecting hospital data under some unusual circumstances. For one thing, HHS claimed that TeleTracking’s contract was won through competitive bidding, but none of the 20 competitors contacted by NPR knew about this opportunity. For another, the process HHS used to award that contract is typically used for scientific research and new technology, not database building. And finally, Michael Zamagias, TeleTracking’s CEO, is a real estate investor and long-time Republican donor with ties to the Trump Organization.

    Rep. Clyburn—you know, that chair of the congressional coronavirus subcommittee—has launched an investigation into TeleTracking and its CEO. Other Congressmembers are asking questions, too. I, for one, am excited to see what they find.

  • “Is Dr. Anthony Fauci on Cameo?”

    NIAID Director Dr. Anthony Fauci testifies before House Select Subcommittee on the Coronavirus Crisis on July 31. Screenshot retrieved from the hearing’s livestream.

    In the most recent episode of comedy podcast My Brother, My Brother and Me (approx. timestamp 23:50), youngest brother Griffin McElroy solemnly asks, “Is Dr. Anthony Fauci on Cameo?”

    McElroy’s question, asked in the context of a rather silly and unscientific discussion on contaminated basketballs, refers to a video-sharing service in which fans can pay celebrities to send personalized messages. Dr. Fauci is, of course, not on Cameo. But he did make a public appearance this past Friday: he testified before the House Subcommittee on the Coronavirus Crisis. This was Dr. Fauci’s first Congressional appearance in several weeks; Democrats have claimed that the White House blocked him from testifying earlier in the summer.

    Dr. Fauci was joined on the witness stand by Centers for Disease Control and Prevention (CDC) Director Dr. Robert Redfield and Assistant Secretary for Health Admiral Brett Giroir, who leads policy development at the Department of Health and Human Services (HHS). All three witnesses answered questions about their respective departments, covering COVID-19-related topics from test wait times to the public health implications of Black Lives Matter protests.

    For comprehensive coverage of the hearing, you can read my Tweet thread for Stacker.

    But here, I will focus on five major takeaways for the COVID-19 data world.

    First: the results of scientific studies on the pandemic are publicly shared. In his opening statement, Dr. Fauci cited four top priorities for the National Institute of Allergy and Infectious Diseases (NIAID): improving scientific knowledge of how the novel coronavirus works, developing tests that can diagnose the disease, characterizing and testing methods of treating patients, and developing and testing vaccines. The Congressmembers on the House subcommittee were particularly interested in this last priority; Dr. Fauci reassured several legislators that taking vaccine development at “warp speed” will not come at the cost of safety.

    Rep. Jackie Walorski, a Republican from Indiana, was especially concerned about Chinese interference in vaccine development. She repeatedly asked Dr. Fauci if he believed China was “hacking” American vaccine research, and if he believed this was a threat to the progress of such work. Dr. Fauci replied that all clinical results from NIAID work are shared publicly through the usual scientific process, to invite feedback from the greater medical community.

    Clinical studies in particular are listed in a National Institutes of Health (NIH) database called ClinicalTrials.gov. On this site, any user can easily search for studies relating to COVID-19; there are 2,844 listed at the time I send this newsletter. Of these, 256 are marked as “completed,” and two of those have results posted. I see no reason to doubt that, if Rep. Walorski were to visit this database in the coming months, she would find the results of vaccine trials here as well.

    Dr. Fauci also publicized the COVID-19 Prevention Network, a website on which Americans can volunteer for vaccine trials. According to Dr. Fauci, 250,000 individuals had registered by the time of the hearing.

    Second: nursing homes are getting COVID-19 antigen tests, big time. Dr. Redfield, Admiral Giroir, and several of the House representatives at the hearing highlighted a recent initiative by HHS to distribute rapid diagnostic COVID-19 tests to nursing homes in hotspot areas. In his opening remarks, Dr. Redfield stated that, by the end of this week, federal health agencies will have delivered “nearly one million point-of-care test kits to 1,019 of the highest risk nursing homes, with 664 nursing homes scheduled for next week.”

    The tests being distributed detect antigens: specific proteins from the novel coronavirus. Like polymerase chain reaction (PCR) tests, antigen tests determine if a patient is infected at the time they are tested; unlike PCR tests, they can be produced and distributed cheaply, and they return results in minutes. Antigen tests have lower sensitivity, however, meaning that they may fail to identify patients who are in fact infected.
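
    Sensitivity is the share of truly infected people whom a test correctly flags as positive, so the expected number of missed infections is the number infected times one minus the sensitivity. The sketch below shows that arithmetic; the two sensitivity values are illustrative assumptions, not measured figures for any particular test product.

```python
# Sketch: what a test's sensitivity implies for missed infections.
# Sensitivity = share of truly infected people who test positive.
# The sensitivity values below are illustrative assumptions, not measured
# figures for any particular test product.

def expected_missed(infected: int, sensitivity: float) -> float:
    """Expected number of infected people who nonetheless test negative."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return infected * (1.0 - sensitivity)


for label, sensitivity in [("higher-sensitivity test", 0.98),
                           ("lower-sensitivity test", 0.85)]:
    missed = expected_missed(100, sensitivity)
    print(f"{label}: about {missed:.0f} of 100 infected residents missed")
```

    With these hypothetical values, the lower-sensitivity test would be expected to miss about 15 of 100 infected residents, versus about 2 for the higher-sensitivity test.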

    The antigen test distribution initiative is great news for the nursing homes across the country that will be able to test and treat their residents more quickly. But from a data perspective, it poses one major question: how will the results of these tests be reported? While antigen tests may be diagnostic, their results should not be lumped in with PCR test results because they have a different accuracy level and serve a different purpose in the pandemic.

    The Nursing Home COVID-19 Public File, a national dataset run by the Centers for Medicare & Medicaid Services, reports “confirmed” and “suspected” COVID-19 cases in the nation’s nursing homes. The dataset does not specify what types of tests were used to identify these cases, or the total tests conducted in each home. Similarly, state-reported datasets on COVID-19 in nursing homes typically report only cases and deaths, not testing numbers. And, as of the most recent COVID Tracking Project analysis, the only state currently reporting antigen tests in an official capacity is Kentucky. But more states may be including antigen test numbers in their counts of “confirmed cases” or “molecular tests,” just as several states lumped PCR and serology tests together this past spring. As hundreds of nursing homes across the country begin to use the antigen tests so graciously distributed by the federal government, we must carefully watch to identify where those numbers show up.
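
    Keeping antigen results separate is mostly a matter of recording the test type with every result and tallying by type. Here is a minimal sketch, assuming a hypothetical record layout; the field names and labels are illustrative and are not the schema of the Nursing Home COVID-19 Public File or of any state dataset.

```python
# Sketch: keep antigen, PCR, and serology results in separate tallies
# instead of lumping them together. The record fields and test-type labels
# are hypothetical, not the schema of any federal or state dataset.
from collections import Counter
from dataclasses import dataclass


@dataclass
class LabResult:
    facility: str
    test_type: str  # e.g. "PCR", "antigen", "serology"
    positive: bool


def tally_by_type(results):
    """Count tests and positives separately for each test type."""
    tests = Counter(r.test_type for r in results)
    positives = Counter(r.test_type for r in results if r.positive)
    return {t: {"tests": tests[t], "positives": positives[t]} for t in tests}


results = [
    LabResult("Home A", "antigen", True),
    LabResult("Home A", "antigen", False),
    LabResult("Home A", "PCR", True),
    LabResult("Home B", "antigen", False),
]
print(tally_by_type(results))
```

    Because each type keeps its own denominator, a positivity rate computed from antigen tests never silently mixes with one computed from PCR tests.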

    Third: Admiral Giroir doesn’t know what data his agency publishes.

    If you watch just five minutes from Friday’s hearing, I highly recommend the five minutes in which Rep. Nydia Velázquez (a Democrat from New York) interrogates Admiral Giroir about COVID-19 test wait times. Here’s my transcript of a key moment in the conversation:

    Rep. Velázquez: Dr. Redfield, I’d like to turn to you. Does the CDC have comprehensive information about the wait times for test results in all 50 states?

    Dr. Redfield: I would refer that question back to the Admiral.

    Rep. Velázquez: Sir?

    Admiral Giroir: Yes, we have comprehensive information on wait times in all 50 states, from the large, commercial labs.

    Rep. Velázquez: And do you publish this data? These data?

    Admiral Giroir: Uh… we talk about it. Always. I mean, I was on… I was with 69 journalists yesterday, and we talk about that frequently.

    He went on to claim that decision-makers at the state and city level have data on test wait times from commercial labs. But where are these data? HHS has collected testing data since the beginning of the pandemic; these data were first published on a CDC dashboard in early May and are now available on HealthData.gov.

    The HealthData.gov dataset includes test results from CDC labs, commercial labs, state public health labs, and in-house hospital labs. For each test, the dataset includes geographic information, a date, and the test’s outcome. It does not include the time between the test being administered and its results being reported to the patient. In fact, that “date” can be any one of the following: (a) the date the test was completed, (b) the date the result was reported, (c) the date the specimen was collected, (d) the date the test arrived at a testing facility, or (e) the date the test was ordered. So, if there’s another, secret dataset that includes more precise dates, I personally would love to see it made public.
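
    Measuring wait times needs at least two dates per test, such as when the specimen was collected and when the result was reported; a single, loosely defined date field cannot support it. The sketch below assumes hypothetical field names and made-up records to show the calculation.

```python
# Sketch: test wait times require two dates per test. The field names
# ("collected", "reported") and the records themselves are hypothetical.
from datetime import date
from statistics import median

records = [
    {"collected": date(2020, 7, 20), "reported": date(2020, 7, 23)},
    {"collected": date(2020, 7, 20), "reported": date(2020, 7, 27)},
    {"collected": date(2020, 7, 21), "reported": date(2020, 7, 22)},
]


def turnaround_days(record) -> int:
    """Days between specimen collection and result reporting."""
    return (record["reported"] - record["collected"]).days


waits = [turnaround_days(r) for r in records]
print(f"median wait: {median(waits)} days; longest: {max(waits)} days")
```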

    Also, who are those 69 journalists, Admiral Giroir? How do I join those ranks? I have some questions about HHS hospitalization data.

    Fourth: everyone wants to reopen schools. Dr. Redfield said that opening schools is “in the best public health interest of K-12 students.” Dr. Fauci said schools should reopen so that students can access health services, so that teachers can identify instances of child abuse, and to avoid “downstream unintended consequences for families.” Rep. Steve Scalise, the subcommittee’s Ranking Member (and a Republican from Louisiana, home to one of the country’s most annoying COVID-19 dashboards), said, “Don’t deny these children the right to seek the American dream that everybody else has deserved over the history of our country.” Rep. James Clyburn, the subcommittee’s Chair (a Democrat from South Carolina), said that school reopening must not be a “one size fits all approach,” but that it should be done for the good of students and their families.

    Clearly, reopening schools is a popular political opinion. But does the country have the data we need to determine if schools can reopen safely? Reopening, as Dr. Fauci explained in response to an early question from Rep. Clyburn, is most safely done when COVID-19 is no longer circulating widely in a community. School districts can determine whether the disease is circulating widely by looking at case counts over time, but for those case counts to be accurate, the region must be doing enough testing and contact tracing to catch all cases.

    And testing data, while they are certainly collected at the county and zip code levels by local public health departments, are not standardized at all. HHS doesn’t publish county-level testing data. Nor does the COVID Tracking Project. This lack of standardization for any geographic region smaller than a state is troubling, as public health leaders and journalists alike cannot currently assess the scope of local outbreaks with any kind of broad comparison. To put it simply: I would love to do a story on how many school districts can safely reopen right now, based on their case counts and test metrics. But the data I would need to do this story do not exist.

    Fifth: all data are political; COVID-19 data are especially political. I know, I know. Data have been political since humans started collecting them. One of America’s most comprehensive data sources, the U.S. Census, started as a way to enforce the Three-Fifths Compromise.

    But watching this Friday’s hearing hammered home for me how the mountains of data produced by this pandemic, coupled with the complete lack of standards across the institutions producing them, have made it particularly easy for politicians to quote random numbers out of context in order to advance their agendas. Rep. Clyburn said, “At least 11 states… are currently performing less than 30% of the tests they need to control the virus.” (Which states? How many tests do they need to perform? Where did that benchmark come from? What other metrics should the states be following?) And, on the other side of the aisle, Rep. Scalise held up a massive stack of paper and waved it right at the camera, claiming that the high number of tests that have been conducted in this country is evidence of President Trump’s national plan. (But how many tests have we conducted per capita? What are the positivity rates? What statistics can we actually correlate to President Trump’s plan?)

    In fact, after the hearing, the White House put out a press release claiming that America has “the best COVID-19 testing system in the world.” The briefing includes such claims as, “the U.S. has already conducted more than 59 million tests,” and, “the Federal Government has distributed more than 44 million swabs and 36 million tubes of media to all 50 States.” None of the statistics in the briefing are put into terms reflecting how many people have actually been tested, compared to the country’s total population. And none of the statistics are contextualized with public health information on what targets we should be meeting to control the pandemic.
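
    Putting a raw count in per-capita terms is a one-line calculation. This sketch divides the briefing's 59 million tests by an assumed denominator, the Census Bureau's July 1, 2019 population estimate for the U.S.; the function name is hypothetical.

```python
# Sketch: express a raw test count per 100,000 residents. The population
# figure is an assumed denominator (Census Bureau estimate for July 1, 2019);
# 59 million is the total quoted in the White House briefing.

def per_100k(count: float, population: float) -> float:
    """Rate per 100,000 residents."""
    return count / population * 100_000


US_POPULATION = 328_239_523
TESTS_CONDUCTED = 59_000_000

rate = per_100k(TESTS_CONDUCTED, US_POPULATION)
print(f"about {rate:,.0f} tests per 100,000 residents")
```

    That works out to roughly 18,000 tests per 100,000 residents. Because one person can be tested more than once, the share of people who have ever been tested is at most this figure, and possibly well below it; the number also says nothing about positivity rates or testing targets.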

    The experts who might have been consulted on that brief—Dr. Fauci, Dr. Redfield, and Admiral Giroir—all sat before Congressional Representatives on Friday morning, quietly nodding when Representatives asked if their respective departments were doing everything possible to protect America. If they had answered otherwise, they may not have returned for future hearings. The whole thing felt very performative to me: the Democrats threw veiled jibes at President Trump, the Republicans bemoaned China and Black Lives Matter protests, and Dr. Fauci fact-checked such basic statements as, “Children are not immune to COVID-19.”

    And almost everyone in the room—including all three witnesses—removed their mask when they spoke.

    If Dr. Fauci were available to commission on the video service Cameo, I would pay him good money to send a personal message to every Congressmember on that subcommittee telling them, confidentially, exactly what he thinks of their questions. And then I would ask him for Admiral Giroir’s personal cell phone number.

  • Which COVID numbers you should pay attention to, actually

    My last big story for this week is to strongly recommend this ProPublica feature by Caroline Chen and Ash Ngu on how to navigate COVID-19 data. Chen is a veteran health journalist who has been reporting on COVID-19 since January (and who reported on previous disease outbreaks before that). Her story explains how to understand test positivity rates, data lags, and the inherent uncertainty that comes with any attempt to quantify this pandemic.

    You should really read the full story, but I’ll summarize the main points for you here in case you’re just going to bookmark it for later:

    • Test positivity rates indicate the share of COVID-19 tests in a region which are coming back positive. If the rate is high (above 10%), this may mean only sick people have access to tests, and testing is not occurring widely enough to fully capture the scale of an outbreak. If the rate is low (below 5%), this may mean anyone who wants a test can get one, and epidemiologists will be able to quickly identify and trace new outbreaks.
    • Daily case counts often are not a good indicator of how a region’s outbreak is progressing, because new cases may be undercounted on weekends or during testing delays. For a more accurate picture, look at the seven-day rolling average: a figure that averages a particular day’s number of new cases with the numbers from the six previous days. Also, rises in deaths tend to lag rises in cases by several weeks, reflecting the progression of the disease in COVID-19 patients.
    • It is difficult to state definitively whether a certain event—such as a restaurant opening or a protest—impacted COVID-19 spread in an area. No one event occurs in a vacuum, and any resulting data around that event were likely impacted by testing lags, testing availability, and other factors.
    • Don’t just look at one statistic; look at the whole picture. Ask whether case counts are rising in your area, yes, but also ask: are enough people getting tested? Are the hospitals filling up? How does your state or county compare to others nearby?
    • Find and follow sources you trust to help you interpret data as they are released. A good source will advise you in the areas where they have expertise and let you know when a question is out of their wheelhouse.
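
    Two of the metrics summarized above are easy to compute yourself. Here is a minimal sketch of test positivity and a seven-day rolling average; the daily numbers are made up, and the 5% and 10% cutoffs are only the rough guideposts from the list, not official thresholds.

```python
# Sketch of two metrics from the summary above: test positivity and a
# seven-day rolling average. The daily numbers are made up for illustration;
# the 5% and 10% cutoffs are the rough guideposts described in the list.

def positivity_rate(positives: int, total_tests: int) -> float:
    """Share of tests that came back positive, in percent."""
    return positives / total_tests * 100


def describe_positivity(rate_pct: float) -> str:
    if rate_pct > 10:
        return "high: testing may not be capturing the outbreak"
    if rate_pct < 5:
        return "low: testing appears to be widely available"
    return "between 5% and 10%: look at other indicators"


def rolling_average(daily_counts, window: int = 7):
    """Average each day with the (window - 1) days before it.
    Days without a full window of history are skipped."""
    return [sum(daily_counts[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(daily_counts))]


# Hypothetical daily new cases, with dips on two weekend days (60, 55).
daily_cases = [120, 130, 125, 140, 135, 60, 55, 150, 145]
print([round(x, 1) for x in rolling_average(daily_cases)])
print(describe_positivity(positivity_rate(120, 1_000)))
```

    The rolling averages stay between about 109 and 116 even though the raw counts swing from 55 to 150, which is exactly the weekend smoothing the list describes.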

  • COVID source callout: West Virginia

    I have issues with West Virginia’s race data.

    First, West Virginia insists on reporting COVID-19 cases assigned to racial categories which do not exist. Two weeks ago, this was a category labeled, “Asian; Black or African American; White.” Last week, this was a category labeled, “Native Hawaiian or Other Pacific Islander; White.” The categories are particularly curious because WV usually reports its cases according to only three race categories: White, Black, and Other.

    (These extra categories have since disappeared from WV’s COVID Dashboard.)

    Relatedly, WV’s race data for cases are listed in a rather unintuitive location on the state’s dashboard: a page labeled “County Summary.” If you did not look closely, you would think the state wasn’t reporting demographic data at all.

    And finally: WV used to report demographic information for deaths due to COVID-19 in the state, but this information has not been reported since May 20. Sure, WV’s outbreak has been relatively small (a total of 5,887 cases and 103 deaths as of July 26), but this is no excuse for failing to report the impacts of the outbreak on marginalized communities. According to figures from the COVID Racial Data Tracker (CRDT), Black West Virginians make up 4% of the state’s population but comprise 8% of its COVID-19 cases. To present a complete picture, the state should report death counts by race as well as the impacts of COVID-19 on other racial groups.
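
    One simple way to express that gap is a disparity ratio: a group's share of cases divided by its share of the population. This sketch uses the CRDT-based percentages quoted above; the function name is hypothetical.

```python
# Sketch: a simple disparity ratio compares a group's share of cases with
# its share of the population. Inputs are the percentages quoted above for
# Black West Virginians; the function name is hypothetical.

def disparity_ratio(pct_of_cases: float, pct_of_population: float) -> float:
    """Values above 1 mean the group is overrepresented among cases."""
    return pct_of_cases / pct_of_population


print(disparity_ratio(8.0, 4.0))  # 2.0
```

    On those figures, Black West Virginians appear among cases at about twice their share of the population. A ratio like this does not adjust for age or for differences in testing access, and computing it for deaths requires death counts by race, which WV has stopped reporting.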

  • Featured sources, July 26

    These sources, along with all others featured in forthcoming weeks, are included in the COVID-19 Data Dispatch resource list.

    • The COVID Racial Data Tracker, by the COVID Tracking Project: COVID-19 is killing Black Americans at 2.5 times the rate of white Americans. The COVID Racial Data Tracker (or CRDT) keeps tabs on this disparity and others by collecting case and death counts, broken down by race and ethnicity, from state COVID dashboards. Our dataset is updated twice a week. And I say “our” because I work on this dataset; I’m happy to answer questions about it (betsyladyzhets@gmail.com).
    • Excess deaths associated with COVID-19 (U.S.): One dataset which the CDC hasn’t stopped publishing is a tally of the death toll in the U.S., including deaths which may be directly or indirectly related to the pandemic but have not been reported due to insufficient testing. The dataset is updated weekly, and you can see figures broken down by state and different demographic factors.
    • Excess deaths associated with COVID-19 (international): The Economist compiles a dataset similar to the CDC’s, tracking excess deaths in countries and cities around the world. You can read about and see visualizations based on these data on The Economist’s website.
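
    At its core, an excess deaths figure is observed deaths minus the deaths expected in a typical year. The sketch below uses the simplest possible baseline, the average of the same week in earlier years, with made-up numbers; the CDC and The Economist use more careful statistical baselines, so treat this as an illustration of the idea rather than their method.

```python
# Sketch: excess deaths = observed deaths minus expected deaths. Here
# "expected" is simply the average for the same week in earlier years, a
# deliberate simplification. All numbers are made up.
from statistics import mean


def excess_deaths(observed: float, same_week_prior_years) -> float:
    """Observed deaths minus a naive historical baseline."""
    expected = mean(same_week_prior_years)
    return observed - expected


prior_years = [1_000, 1_020, 980, 1_010, 990]  # hypothetical, one value per year
print(excess_deaths(1_300, prior_years))
```

    Here the baseline averages to 1,000, so a week with 1,300 observed deaths shows 300 excess deaths, whether or not each of those deaths was ever tested and attributed to COVID-19.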

  • Public health experts call for COVID-19 data standardization

    The U.S. urgently needs better standards for COVID-19 data at national, state, and local levels, argues Resolve to Save Lives, a nongovernmental initiative run by the global health organization Vital Strategies. Resolve is led by President and CEO Dr. Tom Frieden, a former Director of the CDC; he worked with other public health experts on a report which reviewed the availability of COVID-19 data in the U.S.

    According to Resolve’s report, only 40% of “essential data points” for monitoring COVID-19 are made publicly available by federal and state sources. These data points include new confirmed and probable cases, the share of new cases linked to another known case (through known outbreak sites and contact tracing), and per capita hospitalization rates. Moreover, state dashboards differ so much in the information they present and in their functionality that it is incredibly difficult to compare key metrics and get a full picture of the national outbreak.
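
    A figure like "40% of essential data points" can be read as the share of (jurisdiction, indicator) pairs that are publicly reported. The sketch below shows that calculation with hypothetical indicator names and two made-up states; Resolve's report may define and weight its indicators differently.

```python
# Sketch: an availability score as the share of (state, indicator) pairs
# that are publicly reported. Indicator names and state data are
# hypothetical placeholders, not Resolve's actual list or findings.

ESSENTIAL_INDICATORS = [
    "new_confirmed_cases",
    "new_probable_cases",
    "share_of_cases_linked_to_known_cases",
    "hospitalizations_per_capita",
]

published = {
    "State A": {"new_confirmed_cases", "new_probable_cases", "hospitalizations_per_capita"},
    "State B": {"new_confirmed_cases"},
}


def availability_pct(published_by_state, indicators) -> float:
    """Percent of all (state, indicator) pairs that are published."""
    total = len(published_by_state) * len(indicators)
    available = sum(len(set(indicators) & shown) for shown in published_by_state.values())
    return available / total * 100


print(f"{availability_pct(published, ESSENTIAL_INDICATORS):.0f}% of essential data points available")
```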

    As a volunteer who works on data quality for the COVID Tracking Project, I am intimately familiar with this problem, but Dr. Frieden describes it better than I do:

    The lack of common standards, definitions, and accountability reflects the absence of national strategy, plan, leadership, communication, or organization and results in a cacophony of confusing data. By tracking essential metrics publicly in all states, we can build the transparency and accountability essential to make progress.

    Check out Resolve’s report on essential indicator availability by state to see where your state stands, and then, in a free moment between calling your government representatives, call your public health department and insist that they do better.

  • Hospital capacity dataset gets a makeover

    Screenshot retrieved from the HHS Protect Public Data Hub on July 26, 2020.

    On July 14, the White House announced that hospitals across America would no longer report their COVID-19 patient numbers and supply needs to the Centers for Disease Control and Prevention (CDC). Instead, they would report numbers through a data portal set up in April by the Department of Health & Human Services (HHS). A July 10 guidance issued by HHS requests that hospitals send reports on how many overall patients they have, how many COVID-19 patients they have, the status of those patients, and their needs for crucial supplies such as PPE and remdesivir.

    In some ways, this switch actually makes sense: HHS’s data portal, built by a contractor called TeleTracking, is designed specifically to support more efficient data collection during COVID-19. HHS was already collecting hospitalization data second-hand through state reports, some hospital-to-HHS reports, and the CDC’s old system, called the National Healthcare Safety Network; the new system is more streamlined at the federal level. HHS is also the primary federal entity collecting data on COVID-19 lab test results, through reports that go directly from laboratories to HHS (often bypassing local and state public health departments).

    Simplifying data collection to one office (just HHS, rather than HHS and CDC) should theoretically make it easier for hospitals to report their needs and receive aid from the federal government quickly. But switching systems in the middle of a pandemic is dangerous. Switching systems during a COVID-19 surge in the Sun Belt, when hospitals are being pushed to their full capacity, is especially dangerous. Hospital databases that were set up to report to the CDC must be reconfigured, or, worse, exhausted healthcare workers must manually enter their numbers into the new system.

    STAT News’ Nicholas Florko and Eric Boodman explore this issue in more detail, but here is one quote from John Auerbach, president and CEO of Trust for America’s Health, which summarizes the problem:

    Hospitals are incredibly varied across the country in terms of their capacity to report data in a timely and accurate way. If you’re going to say every hospital, regardless of its size, its resources, its capacity, has to learn a new system quickly, it’s problematic.

    It is inevitable that, for the first few weeks of this new system, any hospital capacity data reported by HHS will be rife with errors. And yet, public health leaders, researchers, and people simply living in Texas and Florida need to know how their hospitals are doing right now, so HHS has published the results of their new reporting system only a week after the ownership shift. The new website HHS built to publish these data, called the HHS Protect Public Data Hub, went live this past Monday, July 20. (Veteran users noted that this page copied the homework of the dataset’s former home on the CDC website—same color scheme and everything.)

    As I send this newsletter, the HHS Protect dataset was most recently updated on Thursday, July 23 with data as of the previous day. Experts looking at these data, including my fellow volunteers at the COVID Tracking Project, quickly noticed that something seemed off.

    You read that right: according to HHS Protect, 118% of Rhode Island’s hospital beds are currently occupied. As are 123% of its intensive care beds. And that’s just an extreme example; when one compares the hospital capacity estimates in this HHS update to the most recent estimates from the CDC’s system (dated July 14), only 6 states do not show changes of at least 20%. New Mexico, for example, has supposedly seen its number of COVID-19 patients skyrocket 265% in eight days’ time.

    Yes, the HHS system is collecting figures from about 1,500 more hospitals than the CDC system did. And yes, 21 states are currently listed as having “uncontrolled spread” by the public health research group COVID Exit Strategy. But hospitalization figures typically rise slowly, with a slight delay from cases; for journalists like myself who have been looking at this data point for months, the jump reported by HHS is simply not reasonable.
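
    Checks that would catch the most glaring problems are not complicated. Here is a minimal sketch of two flags a data portal could raise before publishing capacity figures: occupancy above 100%, and a large jump since the previous report. The 20% change threshold echoes the comparison above but is an assumption, not an HHS rule, and because some large changes are real (more hospitals reporting, genuine surges), these are flags for human review rather than automatic rejections.

```python
# Sketch: two sanity checks for hospital capacity figures. The 20% change
# threshold is an assumption, not an HHS rule; flags are for review only.

def occupancy_pct(occupied: int, total_beds: int) -> float:
    return occupied / total_beds * 100


def pct_change(previous: float, current: float) -> float:
    return (current - previous) / previous * 100


def review_flags(occupied, total_beds, previous_occupied, max_change_pct=20.0):
    flags = []
    if occupancy_pct(occupied, total_beds) > 100:
        flags.append("occupancy above 100%")
    if abs(pct_change(previous_occupied, occupied)) >= max_change_pct:
        flags.append("large change since previous report")
    return flags


# Hypothetical figures shaped like the examples above.
print(review_flags(occupied=118, total_beds=100, previous_occupied=110))
print(review_flags(occupied=365, total_beds=1_000, previous_occupied=100))
```

    The first hypothetical report is flagged for impossible occupancy (118%); the second, for a 265% jump like the one reported for New Mexico.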

    It’s good news for journalists and public health leaders that hospital capacity data is once again publicly available from a standardized, federal source. But I have a lot of questions for HHS. What is the agency doing to support already-taxed hospitals that do not have the staff or resources to transfer their database systems? When hospitals inevitably submit their data with errors, what protocols are in place to catch these issues and ensure all data going out to the public portal is accurate? How will the new system support state public health departments, such as Missouri and South Carolina, that previously relied on the CDC for their hospitalization figures? Will HHS make other datasets available on the HHS Protect portal (such as lab data), and if so, when?

    A fellow volunteer from the COVID Tracking Project and I are drafting a strongly worded email to HHS’s press team including these questions and many more; I hope to have some answers for you by next week. In the meantime, you can read Stacker’s story on hospital capacity by state, which does not cite the new HHS figures. Don’t ask me how many times I had to update the story’s methodology.