A few weeks ago, one of my coworkers at Stacker asked me: how many people in the U.S. have been tested for COVID-19?
This should be a simple question. We should have a national dataset, run by a national public health department, which tracks testing in a standardized manner and makes regular reports to the public. The Department of Health and Human Services (HHS) does run a national testing dataset, but this dataset only includes diagnostic, polymerase chain reaction (PCR) test results, is not deduplicated—a concept I’ll go into more later—and is not widely publicized or cited.
Meanwhile, 50 state public health departments report their local testing results in 50 different ways. Departments differ in how they collect and clean their test results, and beyond that, they report these results in different units: the standard quantities in which a value is counted.
You might remember how, in a high school science class, you’d get a point off your quiz for putting “feet” instead of “meters” next to an answer. Trying to keep track of units for COVID-19 data in the U.S. is like that, except every student in the class of 50 is putting down a slightly different unit, no teacher is grading the answers, and there’s a mob of angry observers right outside the classroom shouting about conspiracy theories.
Naturally, the COVID Tracking Project is keeping track anyway. In this issue, I’ll cite the Project’s work to explain the three major units that states are using to report their test results, including the benefits and drawbacks of each.
Much of this information is drawn from a COVID Tracking Project blog post by Data Quality Lead Kara Schechtman, published on August 13. I highly recommend reading the full post and checking out this testing info page if you want more technical details on testing units.
(Disclaimer: Although I volunteer for the COVID Tracking Project and have contributed to data quality work, this newsletter reflects only my own reporting and explanations based on public Project blog posts and documentation. I am not communicating on behalf of the Project in any way.)
Specimens versus people
Last spring, when the COVID Tracking Project’s data quality work started, state testing units fell into two main categories: specimens and people.
When a state reports its tests in specimens, its count describes the number of vials of human material, taken from a nose swab or saliva test, which are sent off to a lab and tested for the novel coronavirus. Counts in this unit reflect pure testing capacity: knowing the number of specimens tested can tell researchers and public health officials how many testing supplies and personnel are available. “Specimens tested” counts may thus be more precise on a day-to-day basis, which I would consider more useful for calculating a jurisdiction’s test positivity rate: the “positive tests divided by total tests” value that has become a crucial factor in determining where interstate travelers can go and which schools can reopen.
But “specimens tested” counts are difficult to translate into numbers of people. A person who got tested five times would be included in their state’s “specimens tested” count each time—and may even be included six, seven, or more times, as multiple specimens may be collected from the same person during one round of testing. For example, the nurse at CityMD might swab both sides of your nose. Including these double specimens as unique counts may artificially inflate a state’s testing numbers.
When a state reports its tests in people, on the other hand, its count describes the number of unique human beings who have been tested in that state. This type of count is useful for measuring demographic metrics, such as what share of the state’s population has been tested. In most cases, when states report population breakdowns of their testing counts, they do so in units of people; this is true for at least four of the six states that report testing by race and ethnicity, for example.
Reporting tests in units of people requires public health departments to carry out a process called deduplication: taking duplicate results out of the dataset. If a teacher in Wisconsin (one of the “people tested” states) got tested once back in April, once in June, and once this past week, the official compiling test results would delete the second and third testing instances, and the state’s dataset would count that teacher only once.
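For readers who think in code, here is a minimal sketch of what deduplicating to “people tested” might look like. The record layout, the person IDs, and the dates are all hypothetical, made up for illustration; real state systems match people across records in more complicated ways.

```python
from datetime import date

# Hypothetical test records: (person_id, test_date). Not real data.
records = [
    ("teacher", date(2020, 4, 14)),
    ("teacher", date(2020, 6, 9)),
    ("teacher", date(2020, 8, 20)),
    ("nurse", date(2020, 8, 20)),
]

def count_people_tested(records):
    """Count unique people, no matter how many times each was tested."""
    return len({person_id for person_id, _ in records})

people_tested = count_people_tested(records)
print(people_tested)  # 2: the teacher's three tests collapse into one person
```

The four records become a count of two, which is exactly the information loss described above: the teacher's June and August tests vanish from the total.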
The problem with such a reporting method is that, as tests become more widely available and many states ramp up their surveillance testing to prepare for school reopening, we want to know how many people are being tested now. As recent COVID Tracking Project weekly updates have noted, testing seems to be plateauing across the country. But in the states which report “people tested” rather than “specimens tested,” it is difficult to say whether fewer tests are actually taking place or whether the same people are getting tested repeatedly, so that their later tests are left out of recent weeks’ testing numbers.
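To see how this can hide real testing activity, consider a small sketch with invented records. It assumes a state only adds a person to its count the first time that person is tested; the IDs, dates, and function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical records, one per person per testing day: (person_id, test_date).
records = [
    ("A", date(2020, 8, 3)),
    ("B", date(2020, 8, 4)),
    ("C", date(2020, 8, 5)),
    ("A", date(2020, 8, 10)),  # repeat test
    ("B", date(2020, 8, 11)),  # repeat test
    ("D", date(2020, 8, 12)),
]

def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())

def weekly_counts(records):
    """Per week: people counted for the first time, and all tests performed."""
    seen = set()
    new_people = Counter()
    all_tests = Counter()
    for person_id, test_date in sorted(records, key=lambda r: r[1]):
        week = week_start(test_date)
        all_tests[week] += 1
        if person_id not in seen:
            seen.add(person_id)
            new_people[week] += 1
    return dict(new_people), dict(all_tests)

new_people, all_tests = weekly_counts(records)
for week in sorted(all_tests):
    print(week, "newly counted people:", new_people.get(week, 0),
          "tests performed:", all_tests[week])
```

In this made-up example, three tests take place in each week, but the second week adds only one newly counted person. A “people tested” chart would show testing falling off when it actually held steady.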
So, COVID-19 testing counts need to reflect the numbers of people tested, to provide an accurate picture of who has access to testing and avoid double-counting when two specimens are taken from one person. But these counts also need to reflect test capacity over time, by allowing for accurate test positivity calculations to be made on a daily or weekly basis.
To solve this problem, the COVID Tracking Project is suggesting that states use a new unit: test encounters. The Project defines this unit as the number of unique people tested per day. As Kara Schechtman’s blog post explains, though this term may be new, it’s actually rather intuitive:
Although the phrase “testing encounters” is unfamiliar, its definition just describes the way we talk about how many times people have been “tested for COVID-19” in everyday life. If an individual had been tested once a week for a month, she would likely say she had been tested four times, even if she had been swabbed seven times (counted as seven tests if we count in specimens), and even though she is just one person (counted as one test if we count in unique people). In this case, that commonsense understanding is also best for the data.
To arrive at a “testing encounters” count, state public health departments would need to deduplicate multiple specimens from the same person, but only if those multiple specimens were taken on the same day. “Testing encounters” counts over time would accurately reflect a state’s testing capacity, without any artificial inflation of numbers. And, as a bonus, such counts would align with public understanding of what it’s like to get tested for COVID-19—making them easier for journalists like myself to explain to our readers.
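Here is a rough sketch of how the three units could be computed from the same specimen log, using the scenario from the quote above: one person, tested weekly for a month, swabbed seven times in total. The data layout, the ID, and the dates are my own assumptions for illustration, not the Project's or any state's actual method.

```python
from datetime import date

# Hypothetical specimen log: one row per specimen, (person_id, collection_date).
specimens = [
    ("A", date(2020, 8, 3)), ("A", date(2020, 8, 3)),    # two swabs, one visit
    ("A", date(2020, 8, 10)), ("A", date(2020, 8, 10)),
    ("A", date(2020, 8, 17)), ("A", date(2020, 8, 17)),
    ("A", date(2020, 8, 24)),                            # one swab
]

def count_units(specimens):
    """Return the same testing activity counted in all three units."""
    return {
        # Every vial counts, even two from the same visit.
        "specimens": len(specimens),
        # Deduplicate only within the same person and the same day.
        "test_encounters": len(set(specimens)),
        # Deduplicate across all time: each person counts once.
        "people": len({person_id for person_id, _ in specimens}),
    }

counts = count_units(specimens)
print(counts)  # {'specimens': 7, 'test_encounters': 4, 'people': 1}
```

The only difference between the three counts is the key used for deduplication: nothing, person plus day, or person alone.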
What is your state doing?
The COVID Tracking Project currently reports total test encounters for five states—Colorado, Rhode Island, Virginia, New York, and Washington—along with the District of Columbia. Other states may report similar metrics, but have not yet been verified to match the Project’s definition.
You can find up-to-date information about which units are reported for each state on a new website page conveniently titled, “How We Report Total Tests.” The page notes that the Project prioritizes testing capacity in choosing which state counts to foreground in its public dataset:
Where we must choose a unit for total tests reporting, we are prioritizing units of test encounters and specimens above people—a change which we believe will provide the most useful measure of each jurisdiction’s testing capacity.
Also, if you’ve visited the COVID Tracking Project’s website recently, you might have noticed that the state data pages have seen a bit of a redesign, in order to make it clear exactly which units each state is using. Each state’s data presentation now includes all three units, with easy-to-click definition popups for each one.
I recommend checking out your state’s page to see which units your public health department is using for COVID-19 tests, as well as any notes on major reporting changes (outlined below the state’s data boxes). You can read more about the site redesign here.
When my coworker asked me how many people in the U.S. have been tested for COVID-19, I wasn’t able to give him a precise answer. The lack of standards around testing units and deduplication methods, as well as the federal government’s failure to be a leader in this work, has made it difficult to comprehensively report on testing in America. But if people (and I mean readers like you, not just data nerds like me) make testing units part of their regular COVID-19 conversations, we can help raise awareness of this issue. We can push our local public health departments to standardize with each other, or at least get better about telling us exactly what they’re doing to give us the numbers they put up on dashboards every day.