County-level testing data from an unexpected source

On September 3, 2020, the Centers for Medicare & Medicaid Services (CMS) posted a county-level testing dataset. It provides test positivity rates for every U.S. county for the week of August 27 to September 2.

This is huge. It’s, like, I had to lie down after I saw it, huge. No federal health agency has posted county-level testing data since the pandemic started. Before September 3, if a journalist wanted to analyze testing data at any level more local than states, they would need to aggregate values from state and county public health departments and standardize them as best they could. The New York Times did just that for a dashboard on school reopening, as I discussed in a previous issue, but even the NYT’s data team was not able to find county-level values in some states. Now, with this new release, researchers and reporters can easily compare rates across the country and identify hotspots that need more testing support.

So Betsy, you might ask, why are you reporting on this new dataset now? It’s been over a week since the county-level data were published. Well, as is common with federal COVID-19 data releases, this dataset was so poorly publicized that almost nobody noticed it.

It didn’t merit a press release from CMS or the Department of Health and Human Services (HHS), and doesn’t even have its own data page: the dataset is posted towards the middle of this CMS page on COVID-19 in nursing homes:

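[Screenshot: the CMS page on COVID-19 in nursing homes, with the county-level testing dataset highlighted. Highlighting mine.]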

The dataset’s release was, instead, brought to my attention thanks to a tweet by investigative reporter Liz Essley Whyte of the Center for Public Integrity:

In today’s issue, I’ll share my analysis of these data and answer, to the best of my ability, a few questions about the dataset that have come up for me and my colleagues in the past few days.

Analyzing the data

Last week, I put together two Stacker stories based on these data. The first includes two county-level Tableau visualizations. These dashboards allow you to zoom into the region or state of your choice and see each county’s test positivity rate, how that rate compares to the overall state positivity rate (calculated from COVID Tracking Project data for the same period, August 27 to September 2), and recent case and death counts for the county, sourced from the New York Times’ COVID-19 data repository. You can also explore the dashboards directly here.

The second story takes a more traditional Stacker format: it organizes county test positivity rates by state, listing the five counties with the highest positivity rates in each. The story also includes overall state testing, case, and outcomes data from the COVID Tracking Project.

As a reminder, a test positivity rate refers to the percentage of COVID-19 tests in a given population that returned a positive result over a specific period of time. Here’s how I explained the metric for Stacker:

These positivity rates are typically reported for a short period of time, either one day or one week, and are used to reflect a region’s testing capacity over time. If a region has a higher positivity rate, that likely means many people there have COVID-19, the region does not have enough testing available to accurately measure its outbreak, or both. If a region has a lower positivity rate, on the other hand, that likely means a large share of the population has access to testing, and the region is diagnosing a larger share of its infected residents.
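To make the arithmetic concrete, here is a minimal Python sketch of the positivity calculation described above. The function name and the weekly counts are hypothetical, chosen only for illustration; the CMS file reports the rate itself, so this does not reproduce how CMS computed it.

```python
def positivity_rate(positive_tests: int, total_tests: int) -> float:
    """Return the percentage of tests that came back positive over one
    reporting window, such as the seven days covered by the CMS file."""
    if total_tests <= 0:
        raise ValueError("total_tests must be a positive count")
    return 100.0 * positive_tests / total_tests


# Hypothetical one-week counts for an illustrative county, not CMS data.
weekly_rate = positivity_rate(positive_tests=45, total_tests=600)
print(f"{weekly_rate:.1f}%")  # 7.5%
```

The same 45 positives out of only 300 tests would give a rate of 15%, which is why a region that tests less tends to show a higher positivity rate.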

Test positivity rates are often used as a key indicator of how well a particular region is controlling its COVID-19 outbreak. The World Health Organization (WHO) recommends a test positivity rate of 5% or lower. This figure, and a more lenient benchmark of 10%, have been adopted as key thresholds by school districts looking to reopen and by states looking to restrict out-of-state visitors.

Which counties are faring the worst, according to this benchmark? Let’s take a look:

This screenshot includes the 33 U.S. counties with the highest positivity rates. I picked the top 33 to highlight here because their rates are over 30%—six times the WHO’s recommended rate. The overall average positivity rate across the U.S. is 7.7%, but some of these extremely high-rate counties are likely driving up that average. Note that two counties, one in South Dakota and one in Virginia, have positivity rates of almost 90%.
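Why would a handful of counties move a national figure? If the average is a simple, unweighted mean of county rates (an assumption for this sketch; a rate pooled from raw test counts behaves differently), a few extreme values can pull it well above the typical county. The rates below are made up, not values from the CMS file.

```python
from statistics import mean, median

# Made-up county positivity rates (%), not values from the CMS file:
# eight counties in single digits plus two extreme outliers.
county_rates = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 46.0, 90.0]
typical_counties = county_rates[:-2]  # drop the two outliers

print(f"mean, all counties:     {mean(county_rates):.1f}%")      # 18.0%
print(f"median, all counties:   {median(county_rates):.1f}%")    # 6.5%
print(f"mean, without outliers: {mean(typical_counties):.1f}%")  # 5.5%
```

The median barely notices the two outliers, which is one reason to report it alongside any average of county rates.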

Overall, 1,259 counties are in what CMS refers to as the “Green” zone: their positivity rates are under 5%, or they have conducted fewer than 10 tests in the seven-day period represented by this dataset. 874 counties are in the “Yellow” zone, with positivity rates between 5% and 10%. 991 counties are in the “Red” zone, with positivity rates over 10%. South Carolina, Alabama, and Missouri have the highest shares of counties in the red, with 93.5%, 61.2%, and 50.4%, respectively:

Meanwhile, eight states and the District of Columbia, largely in the Northeast, have all of their counties in the green:
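The zone counts and state shares above can be reproduced with a few lines of Python once the spreadsheet is loaded. This is a minimal sketch, assuming each county row has been reduced to a hypothetical `(state, positivity_pct, tests_in_week)` tuple; the real file’s column names differ, and how CMS treats a rate of exactly 5% or 10% is an assumption flagged in the code.

```python
from collections import defaultdict


def cms_zone(positivity_pct: float, tests_in_week: int) -> str:
    """Assign a county to a zone using the cutoffs described above.
    Assumption: a rate of exactly 5% or exactly 10% is treated as Yellow."""
    if tests_in_week < 10 or positivity_pct < 5.0:
        return "Green"
    if positivity_pct <= 10.0:
        return "Yellow"
    return "Red"


def zone_summary(rows):
    """rows: iterable of (state, positivity_pct, tests_in_week) tuples.
    Returns (percent of each state's counties in Red, sorted all-Green states)."""
    totals = defaultdict(int)
    reds = defaultdict(int)
    non_green = defaultdict(int)
    for state, pct, tests in rows:
        zone = cms_zone(pct, tests)
        totals[state] += 1
        if zone == "Red":
            reds[state] += 1
        if zone != "Green":
            non_green[state] += 1
    red_share = {s: 100.0 * reds[s] / totals[s] for s in totals}
    all_green = sorted(s for s in totals if non_green[s] == 0)
    return red_share, all_green


# Hypothetical counties in two made-up states, not CMS data.
rows = [
    ("State A", 12.0, 300), ("State A", 15.5, 120),
    ("State A", 4.0, 500), ("State A", 8.0, 200),
    ("State B", 3.0, 800), ("State B", 40.0, 6), ("State B", 1.5, 50),
]
red_share, all_green = zone_summary(rows)
print(red_share)  # {'State A': 50.0, 'State B': 0.0}
print(all_green)  # ['State B']
```

Note the State B county with a 40% rate: because it ran fewer than 10 tests that week, it lands in the Green zone, which is worth keeping in mind when reading any state’s all-green result.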

My Tableau visualizations of these data also include an interactive table, which you can use to examine the values for a particular state. The dashboards are set up so that any viewers can easily download the underlying data, and I am, as always, happy to share my cleaned dataset and/or answer questions from any reporters who would like to use these data in their own stories. The visualizations and methodology are also open for syndication through Stacker’s RSS feed—I can share more details on this if anyone is interested.

Answering questions about the data

Why is the CMS publishing this dataset? Why not the CDC or HHS overall?

These test positivity rates were published as a reference for nursing home administrators, who are required to test their staff regularly based on the prevalence of COVID-19 in a facility’s area. New guidance for nursing homes, dated August 26, explains the minimum testing requirements: nursing homes in green counties must test all staff at least once a month, those in yellow counties at least once a week, and those in red counties at least twice a week.
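For a facility administrator, the guidance boils down to a lookup from zone to minimum testing frequency. The sketch below turns those frequencies into a rough four-week staff testing count. The helper name is hypothetical, and treating “once a month” as once every four weeks is a simplifying assumption, not part of the guidance.

```python
# Minimum routine tests per staff member over a four-week window.
# Assumption: "once a month" is approximated as once every four weeks.
MIN_TESTS_PER_STAFF_FOUR_WEEKS = {
    "Green": 1,   # at least once a month
    "Yellow": 4,  # at least once a week
    "Red": 8,     # at least twice a week
}


def staff_tests_needed(zone: str, staff_count: int) -> int:
    """Minimum number of staff tests a facility would run over four weeks."""
    if staff_count < 0:
        raise ValueError("staff_count cannot be negative")
    return MIN_TESTS_PER_STAFF_FOUR_WEEKS[zone] * staff_count


# A hypothetical facility with 50 staff members.
for zone in ("Green", "Yellow", "Red"):
    print(zone, staff_tests_needed(zone, 50))
# Green 50
# Yellow 200
# Red 400
```

Moving from green to red multiplies the minimum testing load eightfold, so an up-to-date zone assignment matters a great deal to these facilities.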

It is important to note that facilities are only required to routinely test staff, not residents. In fact, the guidance states that “routine testing of asymptomatic residents is not recommended,” though administrators may consider testing residents who leave their facilities often.

Where did the data come from?

The CMS website does not clearly state a source for these data. Digging into the downloadable spreadsheet itself, however, reveals that the testing source is a “unified testing data set,” which the sheet’s Documentation field describes as a combination of data reported by state health departments and by HHS:

COVID-19 Electronic Lab Reporting (CELR) state health department-reported data are used to describe county-level viral COVID-19 laboratory test (RT-PCR) result totals when information is available on patients’ county of residence or healthcare providers’ practice location. HHS Protect laboratory data (provided directly to Federal Government from public health labs, hospital labs, and commercial labs) are used otherwise.
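Read literally, the documentation describes a fallback rule: use state-reported CELR data when it carries county information, and HHS Protect lab data otherwise. Here is a minimal sketch of that rule. Whether CMS applies it county by county or state by state is not spelled out, so the per-county record and its field names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CountyTestCounts:
    """Hypothetical per-county record; not the CMS file's actual layout."""
    county: str
    celr_tests: Optional[int]  # None when CELR data lack county information
    hhs_protect_tests: int


def pick_source(record: CountyTestCounts) -> Tuple[str, int]:
    """Prefer CELR state health department data; fall back to HHS Protect."""
    if record.celr_tests is not None:
        return "CELR", record.celr_tests
    return "HHS Protect", record.hhs_protect_tests


print(pick_source(CountyTestCounts("County X", 1200, 1150)))  # ('CELR', 1200)
print(pick_source(CountyTestCounts("County Y", None, 300)))   # ('HHS Protect', 300)
```

If the rule is applied county by county, two neighboring counties in the same file could rest on different underlying reporting systems.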

What are the units?

As I discussed at length in last week’s newsletter, no testing data can be appropriately contextualized without knowing the underlying test type and units. This dataset reports positivity rates for PCR tests, in units of specimens (or, as the documentation calls them, “tests performed.”) HHS’s public PCR testing dataset similarly reports in units of specimens.

How are tests assigned to a county?

As is typical for federal datasets, not every field is exactly what it claims to be. The dataset’s documentation elaborates that test results may be assigned to the county where (a) the patient lives, (b) the patient’s healthcare provider facility is located, (c) the provider that ordered the test is located, or (d) the lab that performed the test is located. Most likely, the patient’s address is used preferentially, with the other options used when that information is missing. But these disparate possibilities lead me to recommend caution in using this dataset for geographical comparisons. I would expect the positivity rates reported here to differ from the county-level positivity rates reported by a state or county health department, which might follow a different procedure for assigning tests to counties.
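The four options read like a precedence list. Assuming, as guessed above, that the patient’s address comes first and the others serve as fallbacks in the listed order (the documentation does not confirm the order), assignment might work like the sketch below. The field names are hypothetical, since the CMS file publishes county totals rather than individual test records.

```python
from typing import Optional

# Assumed precedence, mirroring options (a) through (d) above.
COUNTY_FIELDS = (
    "patient_county",
    "provider_facility_county",
    "ordering_provider_county",
    "performing_lab_county",
)


def assign_county(test_record: dict) -> Optional[str]:
    """Return the first county field that is present and non-empty."""
    for field in COUNTY_FIELDS:
        county = test_record.get(field)
        if county:
            return county
    return None


# A test with no patient address is credited to the lab's county.
record = {"patient_county": None, "performing_lab_county": "County Z"}
print(assign_county(record))  # County Z
```

Under a rule like this, a large lab in one county could absorb results from patients who live elsewhere, which is one way these rates could diverge from a local health department’s numbers.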

How often will this dataset be updated?

Neither the CMS page nor the dataset’s documentation indicates an update schedule. A report from the American Health Care Association suggests that the file will be updated on the first and third Mondays of each month. So maybe it will be updated on the 21st, or maybe it will be updated tomorrow. Or maybe it won’t be updated until October. I will simply have to keep checking the spreadsheet and see what happens.
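If the American Health Care Association’s description is right, knowing when to check for updates means knowing the first and third Mondays of each month. Here is a small sketch using only Python’s standard library; the schedule itself is the report’s suggestion, not a confirmed CMS policy, and the function names are hypothetical.

```python
import calendar
from datetime import date


def nth_monday(year: int, month: int, n: int) -> date:
    """Return the nth Monday (1-based) of the given month."""
    first_day = date(year, month, 1)
    days_until_monday = (calendar.MONDAY - first_day.weekday()) % 7
    return date(year, month, 1 + days_until_monday + 7 * (n - 1))


def update_days(year: int, month: int) -> tuple:
    """First and third Mondays, per the AHCA report's suggested schedule."""
    return nth_monday(year, month, 1), nth_monday(year, month, 3)


for d in update_days(2020, 9):
    print(d.isoformat())
# 2020-09-07
# 2020-09-21
```

September 21, 2020 is the third Monday of that month, which matches the 21st mentioned above.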

Why won’t the dataset be updated every week, when nursing homes in yellow- and red-level counties are expected to test their staff at least once a week? Why is more public information about an update schedule not readily available? These are important questions which I cannot yet answer.

Why wasn’t this dataset publicized?

I really wish I could answer this one concretely. I tried submitting press requests and calling CMS’s press line this past week; when I called on Friday, the line’s voicemail box was full.

But here’s my best guess: this dataset is intended as a tool for nursing home facilities. In that role, it serves a very practical purpose, letting administrators know how often they should test their staff. If CMS or HHS put out a major press release, and if an article were published in POLITICO or the Wall Street Journal, the public scrutiny and politically driven conspiracy theorists that hounded HHS during the hospitalization data switch would return in full force. Nursing home administrators and staff have more pressing issues to worry about than becoming part of a national political story: namely, testing all of their staff and residents for the novel coronavirus.

Still, even for the sake of nursing homes, more information about this dataset is necessary to hold accountable both facilities and the federal agency that oversees them. How were nursing home administrators, the intended users of this dataset, notified of its existence? Will the CMS put out further notices to facilities when the data are updated? Is the CMS or HHS standing by to answer questions from nursing home staff about how to interpret testing data and set up a plan for regular screening tests?

For full accountability, it is important for journalists like myself to be able to access not only data, but also the methods and processes around its collection and use.
