Tag: modeling

  • Unpacking U.S. data gaps and lack of public health action with Jason Salemi

    The CDC’s Community Level guidance contributes to current inaction on COVID-19 in the U.S. Image by Jason Salemi, from his June 10 Twitter thread.

    In April, the CDC launched a new center called the Center for Forecasting and Outbreak Analytics (or CFA). The new center aims to develop models of COVID-19 and other infectious diseases, while also helping public health agencies and individual Americans act on the information. One of CFA’s lead scientists compared it to the National Weather Service.

    But the problem is, as I discussed in a new story for FiveThirtyEight, that the CFA currently does not have the data it needs to accomplish its goals. Among the challenges this new center is facing:

    • COVID-19 case data are becoming increasingly unreliable as PCR testing becomes less accessible and more people use at-home tests;
    • Hospitalization data are more reliable, but lag behind actual infections and may soon be unavailable in their current, comprehensive format;
    • Wastewater surveillance and other promising sources are not yet ready to replace clinical datasets;
    • A slow hiring process, as the center aims to bring on 100 scientists and communicators;
    • The CDC’s limited authority over state and local health agencies, and over the public.

    At the COVID-19 Data Dispatch today, I’m sharing one of the interviews I did for the FiveThirtyEight story. I talked to Jason Salemi, an epidemiologist at the University of South Florida College of Public Health, whom you may know from his excellent dashboard and Twitter threads providing detailed COVID-19 updates.

    While Salemi isn’t focused specifically on forecasting, he has a lot of insight about interpreting COVID-19 data and using the data for public health decisions. And I think he shares my frustration about the lack of safety measures that are being implemented across the U.S. at this dangerous point in the pandemic.

    For context, this interview took place about one month ago, while BA.2/BA.2.12.1 were driving a surge in the Northeast but hadn’t quite hit other parts of the country yet. This interview has been lightly edited and condensed for clarity.


    Betsy Ladyzhets: I wanted to start by asking, what do you see as the current state of trying to keep track of COVID in the United States? Like, what are some of the metrics that you’re looking at right now? What are some challenges that you’re facing as we deal with case numbers becoming less reliable?

    Jason Salemi: Definitely the case numbers issue. Throughout the entire pandemic, we all know that the case numbers we learn about when somebody actually tests positive, and that information gets recorded somewhere, reported to a state department of health and ultimately to the CDC, have always underestimated the true number of infections circulating in the population. Obviously, very early in the pandemic, that was really, really bad—we were mostly picking up people as they were getting sick and landing in the hospital. But as testing expanded, obviously, we did a much better job of being able to gauge what was happening with true infections by relying on the reported case numbers.

    However, during Omicron, and especially with the increased use of at-home testing, a lot of those at-home tests, if the person tested positive, were not making their way into a system that would actually get translated into the officially reported numbers. And negative at-home test results—those are definitely not making their way to public health agencies. I think in some jurisdictions, you were finding that 30%, 40% of all testing was actually antigen testing, and a significant portion of those were at-home antigen tests.

    More recently, I think the official numbers that we hear about on a daily basis in terms of official COVID-19 cases are becoming an increasing undercount of the true number of infections that are circulating. Which is pretty striking, considering how much we’ve seen the numbers go up in the past few weeks. So, relying on officially reported cases does mean a lot less. But I still do believe that if you’re looking at—not necessarily where the numbers are exactly, but the trends in the numbers, how those numbers are changing over time—you can at least get a good feel for whether or not things are getting better or worse, even by using the COVID-19 case numbers.

    Now, when you supplement that with things like wastewater numbers, data that are not biased by people taking advantage of testing or how they test, the wastewater numbers are maybe a better gauge for truer trends in the amount of viral spread. But again, even with wastewater numbers, two big things about those: number one is, it’s certainly not available, at least not that I can tell, for a lot of jurisdictions throughout the United States… It’s not available consistently across the country. 

    And number two, there’s nothing in those wastewater numbers where you actually can gauge: okay, this is the actual level of infection. What it helps us to do is, it’s a leading-edge indicator, where early on, we can say, “Oh, wow, we see an increase, a pretty pronounced increase in a particular area over time.” And hopefully, if we were doing things proactively, we could use those data to then implement some sort of concerted mitigation. So, this issue has become more of a challenge. But in many communities, we still can rely on how case numbers are changing over time to loosely gauge transmission rates. 
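    Salemi’s point that case trends stay informative even when the level is an undercount can be made concrete. Below is a minimal sketch, assuming a hypothetical list of daily reported cases for one county: a trailing 7-day average smooths day-of-week reporting dips, and a week-over-week percent change shows direction. If reported cases are roughly a constant fraction of true infections across both weeks, that fraction cancels out of the ratio; that assumption is exactly what weakens when testing behavior shifts quickly, as it did with at-home tests.

```python
# Hypothetical daily reported case counts for one county (illustrative only).
daily_cases = [120, 135, 128, 140, 150, 90, 85,
               160, 170, 165, 180, 190, 110, 105]


def rolling_mean(values, window=7):
    """Trailing mean over `window` days; one value per full window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def week_over_week_change(values):
    """Percent change between the most recent 7 days and the 7 days before."""
    if len(values) < 14:
        raise ValueError("need at least 14 days of data")
    this_week = sum(values[-7:])
    last_week = sum(values[-14:-7])
    if last_week == 0:
        raise ValueError("previous week has zero cases; percent change undefined")
    # A constant undercount factor would multiply both sums and cancel here.
    return 100.0 * (this_week - last_week) / last_week


avg = rolling_mean(daily_cases)
change = week_over_week_change(daily_cases)
print(f"latest 7-day average: {avg[-1]:.1f}")
print(f"week-over-week change: {change:+.1f}%")
```

    The same calculation works for wastewater concentrations, which is why both are useful as trend signals even when neither gives an absolute count of infections.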

    Then, of course, a lot of people say, “It’s all about the hospitalization data, let’s utilize that.” Although I’d always love more metrics included in that [hospitalization] data set, it is something that, for some time now, we actually have consistently measured, at the national level, for every single state. You can get down to the hospital level, in some cases, and even by age group. We can have a decent understanding of how many people are being hospitalized with COVID-19. The nice thing about that is the consistency, and the fact that this [hospitalization dataset] is available everywhere, and we therefore have a decent resource that is capable of picking up indicators of more severe illness.

    But there are a lot of problems with the hospitalization data: namely, it’s a lagging indicator. Ultimately, if we were to rely exclusively on COVID-19 hospitalization rates and hospital capacity issues—those indicators lag new infections, often by five to seven days, at least. So, by the time we see those particular metrics rise, we will have lost valuable time to prevent morbidity and mortality. That’s the big [problem].

    The other thing is, there’s a lot of legitimacy to when people say, “Well, if a person went to the hospital for a non-COVID-19 related issue, and they just happened to test positive, they may not have been hospitalized because of COVID-19.” I think most are using the term “incidental.” Again, the numbers are not perfect. And when community transmission is as pronounced as it has been during many phases of Omicron, I think we do have a lot of situations where a lot of people are not being hospitalized because of COVID-19. But they are testing positive.

    For example, Jackson Health System in Florida was tweeting out every day during the Omicron phase. And they would say—giving hypothetical numbers here—“We’ve got 250 people who are hospitalized, and that are positive for COVID-19. Of those 250 people, 51% were hospitalized for non-COVID-related reasons.” Some areas would give you more specifics; they would also break down by vaccinated versus unvaccinated. You get a lot more rich, detailed data from some areas, but obviously, that’s not consistent across the country. In fact, I think it’s pretty rare.

    BL: Yeah, that point about hospitalizations being a lagging indicator is definitely something I want to highlight in the story. And it seems very complicated, because I have heard from a couple of the modeling experts I’ve talked to that if you look at something like hospital admissions, specifically, that is less lagging. But still, overall, if you think about, like you were saying, trying to prevent more people from getting sick—even by the time you just see more hospital admissions, that’s still bad. You’ve still lost your chance to put in new mask measures, or whatever the case may be.

    JS: Oh, absolutely. And, you know, if we really were in a state right now, where getting infected really did no damage to people, it never caused any severe illness, we would obviously care less about transmission levels. Although you could always use the argument that the more we let COVID-19 circulate, the more likely it is that new variants will emerge with potentially more dangerous characteristics. So, even if it wasn’t causing a lot of severe illness, you’ve always got that aspect of it.

    But we are certainly not yet at a stage in which we can say [getting infected does no damage]—even though for the average individual Omicron is less severe when we compare it to something like Delta. But we paid a steep price in many areas in the United States to get the infection-acquired immunity and vaccine-acquired immunity that seems to be blunting the effects of Omicron. Right now, that’s why we’re not seeing the rise in hospitalization rates as steep as the rise in case rates. 

    But we are still seeing people getting hospitalized, an increasing number of people over the past couple of months. We’re not yet in a position where COVID-19 is not causing any damage. And we’re largely ignoring things like Long COVID. Just because somebody doesn’t get hospitalized, that doesn’t mean [the virus] isn’t still causing a decrease in the quality of life for many people, and a decreased quality of life that can linger for some time.

    BL: Yeah, definitely. And then, another issue with hospitalization data that I wanted to ask you about, because I know you’ve looked at this, is the fact that if you’re using these county risk levels, or community levels, whatever the CDC is calling them—not every county has a hospital. So really, this is data at a somewhat larger regional level. I’m wondering if you could explain why this is an important distinction.

    JS: Yeah. And you know, this is not at all a criticism, this is kind of the nature of the beast, so to speak. There are a lot of communities where—I’ll use Florida, because I’m most familiar with Florida, as an example. We have got a major health care system in Alachua County, which is really not a big county in Florida, not even in the top 20 largest counties. But it is a major area where a lot of people from surrounding smaller counties, like a nine- or ten-county catchment area, if they were to get really sick, that’s where they’re most likely going for treatment. And so, if you have a metric that is based on hospitalization rates, and you don’t have a hospital, obviously, you can no longer really provide a county-level indicator. It has to be more regional. And so you see a lot of variation in how the CDC has to now go from the county level to what they call health service areas.

    These [health service areas] are established groupings. In these regions, the overwhelming majority of people in these locations are going to a hospital in the broader health service area. And so it’s confusing, I think, to people: with this newer CDC metric, they wonder, “How is it that there’s no hospital in my county or the county next to me, yet you’re giving me a county-level risk measure that is supposed to be based primarily on hospitalization data?”

    And again, I think, some of the nuances of the metric get lost on people… Hospitalization data comes from a broader region [than cases], and there’s a lot of variation. There are some counties that are standalone, like Manatee County in Florida, so there is no health service area, it’s just one county for all measures. But there are some others where more than 15 counties feed into that health service area. So again, for some people in some communities, I can understand where it’s just confusing and frustrating as to, “What does this risk level really mean, for me and the people that live near me, since the catchment region is so much larger?” This is not a right or a wrong, I understand why CDC does it the way that they do it if they’re trying to get a hospitalization-based measure. But it’s just challenging for people to digest.
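    The health service area logic Salemi describes amounts to pooling: sum admissions and population across every county in an area, compute one rate, and give that rate to each member county, whether or not it has a hospital. Below is a simplified sketch under stated assumptions: the county names, HSA labels, and numbers are all hypothetical, and the CDC’s actual community level calculation uses more hospital measures than this single admissions rate.

```python
from dataclasses import dataclass


@dataclass
class County:
    name: str
    hsa: str                 # hypothetical health service area label
    population: int
    weekly_admissions: int   # new COVID-19 admissions at hospitals located in the county


# Hypothetical counties: the two rural ones have no hospital, so they report
# zero admissions even though their residents are admitted in the hub county.
counties = [
    County("Hub County", "HSA-1", 280_000, 60),
    County("Rural A", "HSA-1", 20_000, 0),
    County("Rural B", "HSA-1", 15_000, 0),
    County("Standalone County", "HSA-2", 400_000, 30),  # a one-county HSA
]


def hsa_admission_rates(counties):
    """Admissions per 100,000 residents, pooled over each health service area."""
    pop, adm = {}, {}
    for c in counties:
        pop[c.hsa] = pop.get(c.hsa, 0) + c.population
        adm[c.hsa] = adm.get(c.hsa, 0) + c.weekly_admissions
    return {hsa: 100_000 * adm[hsa] / pop[hsa] for hsa in pop}


rates = hsa_admission_rates(counties)
for c in counties:
    # Every county inherits its HSA's pooled rate, hospital or not.
    print(f"{c.name}: {rates[c.hsa]:.1f} admissions per 100k")
```

    In this toy example, both rural counties get the same rate as the hub county, which is the source of the confusion Salemi describes: a resident sees a county-level label driven mostly by admissions at a hospital one or two counties away.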

    BL: Yeah, it’s challenging on that communications front. With the previous transmission levels, you could just kind of look at the case rate and the positivity rate and be like, “Okay, I get where this is coming from.” But yeah, now it’s a little trickier. Another thing on this topic: I saw a report from POLITICO this morning that is suggesting, basically, if the National Public Health Emergency gets ended this summer, then the CDC might lose its ability to require states or hospitals to actually report the hospitalization data that is basically our best source right now. So, what would the implications be if that happens in a few months?

    (Editor’s note: After this interview, the Biden administration extended the public health emergency beyond July 15. But it’s unclear how many more times the emergency will be extended.)

    JS: I’d say pretty significant implications. Look, I’ve tried to give credit where credit is due, like the gains made with improving the federal hospitalization data. I’ve also been a critic when I feel as though we are missing key data sources or data elements. An example is the hospitalization data not having race and ethnicity information, I feel like that’s a big component that would be meaningful.

    But even with its limitations, the hospitalization data have been a very, very, very important tool for us to be able to report what’s happening in communities. And obviously, nobody wants to fly blind as it pertains to the pandemic. So if we don’t have uniform reporting from all of these states and jurisdictions, then we have to rely on the willingness of leaders at each state or community level to make similar information available, and to report that information in a timely and consistent manner. 

    Right now, we are fortunate that we continue to get the hospitalization data updated on a daily basis. And so yeah, that would obviously be a big loss if it were—it’s one thing to not have it required. But if states chose not to report that information, which certainly some states would choose not to… it would be a big loss, depending on what states choose to do to keep the population informed.

    Because, to be honest, when we get this national data, it’s a gut reaction that we want to compare states on everything—on death rates, on case rates, on hospitalization rates. To me, this can be a huge mistake. One of the obvious reasons that everybody talks about is age differences, right? Some states have a much higher percentage of older people. But it’s not just age that makes state comparisons difficult. It’s weather, and racial and ethnic distribution, and the job industries in which people can work, population density. So, I don’t really care too much about national-level data being used primarily to make state comparisons and inferences that can be misguided.

    But to have consistently reported information across the country, again, is important for us to be able to make more responsible decisions even at the local level. I would hope if that happens [losing the national dataset], we would still have states and cities and counties and communities and all these different geospatial areas continuing to report, collect, and make available to the public meaningful information in a timely manner so we can make responsible decisions.

    BL: Yeah, that makes sense. And I know that question of authority and like, what can and can’t you require the states to do, is a large issue for the CDC. I was able to talk to Mark Lipsitch yesterday; he’s one of the scientists who’s working on this new forecasting center. And one challenge he mentioned to me is that the CDC really doesn’t have the authority that it would like to in terms of requiring data reporting. They can’t require every state to start doing wastewater surveillance, they can’t require every state to report vaccine effectiveness data or breakthrough cases. And to me, that just seems like a massive hurdle that they face in trying to do this kind of long-term improvement of infectious disease forecasting.

    JS: Yeah—and it’s not just the ability, it’s also having the will. I’ve collaborated with some truly amazing scientists from the CDC for a very long time on a myriad of different initiatives, and I have little doubt that they will compile a team of experts that can analyze meaningful metrics to generate what I imagine will be a wealth of data on where we’re going in the pandemic. But it’s not just about analytic proficiency. I did read on their [CFA’s] site that their stated goal is to enable timely, effective decision-making to improve outbreak response. But how are we going to utilize those data to make recommendations? What outcomes are they going to emphasize? What communities are we thinking about when we make those recommendations?

    A lot of people talk about the measures we use, and which ones are best, and how we collect the data, and the validity, and the sophistication of the approaches that we use to either nowcast or forecast into the future. But to me, it’s also the way in which we operationalize those measures for public health recommendations. That’s where a lot of the talk is now about the measures being utilized by CDC. So whether it’s their four-level community transmission measure, or that newer three-level measure that’s based mainly on hospitalization data—how we’re using that to make recommendations, it says something about what the agencies who establish those boundaries are willing to accept.

    For example, I was just looking at some data again, when I did that thread this morning. The highest level on the community transmission metric used to indicate 100 or more cases per 100,000 people over the most recent seven-day window. Right now, based on the data that I just ran, we’ve got 105 counties in the United States with a population of at least 250,000—not just small counties, but large ones—that have a low community level [the CDC’s more recent metric], the lowest possible, but have a transmission level above that 100 per 100,000 threshold.

    And more importantly, we’ve got 28 counties—again, with a population of 250,000 or more—that are classified as medium level. That is a level with no recommendations for mask-wearing in public indoor settings. And those 28 counties have a case rate that is more than triple the threshold for high transmission, that’s 300 per 100,000, over the past seven days. You’d expect that medium level to change to high in the not-too-distant future for many of these areas.

    So again, it’s one thing to collect the metrics and have skilled analysis. But what we do with those measures and that analysis, is just as meaningful. And what does it mean, if we have an area that has really pronounced transmission—and we know in the past, that pronounced transmission means that the virus is going to be exceedingly good at finding vulnerable populations—and we’re not having any meaningful population-based recommendations… 

    When I looked, some of these counties were at like 400 per 100,000 [cases in a week], four times the threshold for the high transmission level [under the old CDC guidance], and they’re still not at a level where we’re supporting or recommending mask-wearing in public indoor settings. That’s pretty shocking. And I think that’s why anecdotally, now, even in my area, I’m just hearing about more and more people daily, that are not able to come to work. A lot of people are getting infected. And you’re seeing that in the rising numbers.

    BL: Absolutely. I mean, isn’t the threshold for moving from low to medium under the new community levels 200 new cases per 100,000 [per week, regardless of hospitalization numbers]?

    JS: Yeah, right. So even if you had no rise in hospitalizations, you can have a progression to the medium level. But that is now twice what the highest transmission threshold used to be. And again, I’m looking at counties that are in that medium level that now have almost twice even that newer threshold.
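    The comparisons in this exchange reduce to one number, a weekly case rate per 100,000 residents, checked against the two thresholds quoted in the interview: 100 (the old “high transmission” cutoff) and 200 (the case rate at which, under the newer community levels, a county is at least medium regardless of hospital data). Below is a minimal sketch with a hypothetical county. It covers only the case-rate component and deliberately leaves out the hospital metrics that otherwise drive the community level, so it is not a full implementation of either CDC framework.

```python
# Thresholds as quoted in the interview (cases per 100,000 over 7 days).
OLD_HIGH_TRANSMISSION = 100  # old metric: at or above this was "high transmission"
NEW_MEDIUM_FLOOR = 200       # new metric: at or above this, at least "medium"


def weekly_rate_per_100k(cases_7d, population):
    """New cases in the past 7 days per 100,000 residents."""
    return 100_000 * cases_7d / population


def describe(cases_7d, population):
    """Compare a county's case rate against both thresholds (case component only)."""
    rate = weekly_rate_per_100k(cases_7d, population)
    return {
        "rate_per_100k": rate,
        "times_old_high_threshold": rate / OLD_HIGH_TRANSMISSION,
        "old_high_transmission": rate >= OLD_HIGH_TRANSMISSION,
        "at_least_medium_from_cases_alone": rate >= NEW_MEDIUM_FLOOR,
    }


# Hypothetical county of 250,000 people with 750 new cases in the past week:
# 300 per 100,000, three times the old high-transmission threshold.
print(describe(750, 250_000))
```

    A county at 400 per 100,000, like the ones Salemi mentions, would show four times the old threshold and still clear the new 200 floor only by a factor of two, which is his point about how much higher the bar for action has moved.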

    We’re not yet in a situation where COVID is not causing any severe illness whatsoever. We’re ignoring a lot of the ramifications of Long COVID, we’re ignoring the fact that, when community spread has gotten so pronounced, you tend to have the virus easily, efficiently finding the most vulnerable people in those communities and still inflicting damage.

    I just feel like we’re missing an opportunity. We’re not talking about shutdowns, we’re talking about simple measures that we can put in place and recommend to people to try to balance normal living with putting reasonable but important precautions in place. Because that ultimately will prevent a lot of morbidity and mortality. And I feel like that’s maybe the big missed opportunity right now.

    So, I’d be excited to see a new forecasting center come out of the CDC. They are very adept scientists. But it’s ultimately, what do we do? What do we do with the data that emerges out of that center? And what recommendations, simple recommendations, do we end up giving to the public based on those analyses?

    BL: I totally agree. One of the new center’s focuses is that they want to hire a bunch of science communicators to think about these things. But still, I guess I’m a little skeptical about how much they’re gonna really be able to have an impact here, when we’re already at such a polarized position in the pandemic.

    JS: Yeah, it’s not that any of this is easy. No matter what you do, you’re going to upset a whole lot of people nowadays. I speak strictly from a scientist’s perspective. And I really do get all sides of this equation, like the businesses and the very real toll that the pandemic has taken on people. And so it is, no matter what you do, there is a balance that you have to achieve.

    But when I start to see—again, I’m going more from what has transpired specifically in Florida. And a lot of the talk this time last year, after we had the availability of vaccines, things were looking great for Florida. Numbers were really low. And that was pretty much throughout the United States, we had the vaccines, though we still heard a lot about protecting the most vulnerable, the oldest in our communities. And even as the cases started to rise, during Delta, it was like, well, just protect the vulnerable.

    But again, when community transmission gets that pronounced, the virus will continue to find the most vulnerable. And it ended up inflicting a far larger death toll in Florida than at any other point in the pandemic, after vaccines had been available for a long period of time. And that included a significant percentage of people who were not seniors. So, it’s tough, but still, people’s livelihood and lives are on the line when we’re talking about COVID.

  • CDC launches new pandemic forecasting center

    The CDC’s new Center for Forecasting and Outbreak Analytics (CFA) intends to modernize the country’s ability to predict disease outbreaks. Image via the CDC.

    This week, the CDC introduced a new team focused on modeling infectious diseases, called the Center for Forecasting and Outbreak Analytics (or CFA). The agency aims to hire about 100 scientists and communicators for the center; they’ll currently focus on COVID-19, but will expand to other diseases in the future.

    “We think of ourselves like the National Weather Service, but for infectious diseases,” Caitlin Rivers, the new center’s associate director for science, told the Washington Post.

    This idea of forecasting infectious diseases like the weather was a major theme of an event that the White House hosted last Tuesday, timed with the introduction of the CDC’s new center. This event, a three-hour summit, featured speeches from the administration’s COVID-19 response leaders (Dr. Ashish Jha, Dr. Rochelle Walensky, etc.), as well as panels bringing together the scientists who have joined CFA so far, healthcare leaders, and public health workers from around the country.

    I watched the event on a livestream and kept a running Twitter thread of key points.

    As discussed at the summit and on CFA’s new website, this center has three main functions:

    • Predict: A team of disease modelers, epidemiologists, and data scientists will establish methods for forecasting disease spread and severity, in collaboration with state and local leaders.
    • Inform: A team of science communicators will share information from the Predict team’s modeling efforts with public health officials and with the public, ensuring that this information is actionable.
    • Innovate: In addition to its in-house analysis and communication, CFA will fund research and development to drive better data collection and forecasting strategies.

    According to the CDC, CFA has already awarded $26 million in funding to “academic institutions and federal partners” working on forecasting methods, as part of this “innovate” priority. Neither CFA’s website nor the summit provided any indication of what these institutions are or what they’re working on; I wrote to the CDC’s media team asking for more information, and have yet to hear back from them.

    At last Tuesday’s summit, it was nice to hear health officials from the local to the federal levels describe COVID-19 data issues that I’ve been writing about for two years. These included: the need for more timely data on issues like new variants and vaccine effectiveness; the need for more demographic data that can inform health equity priorities; the need for more coordination (and standardization) between different state and local health agencies; and the need for actionable data that are communicated in a way people outside science and health settings can understand.

    But for all this discussion of the problems with America’s current health data systems, the event included very little indication of potential solutions. For instance, as Bloomberg health editor Drew Armstrong pointed out, nobody mentioned that many of our problems would be solved with a national healthcare system, following the lead of the U.K.—whose data we’ve relied on throughout the pandemic.

    Moreover, Tuesday’s event was very rushed: each panel was just half an hour long, with only a few minutes for each expert panelist to make their points and barely any time for questions. I would’ve loved to hear entire keynote speeches from people like Dr. Anne Zink, director of Alaska’s public health agency, and Dr. Loretta Christensen, chief medical officer for the Indian Health Service. But they were relegated to brief comments.

    It almost felt like the Biden administration had taken a couple of hours in their schedule to appease the science and health experts who wanted to see some acknowledgment of the COVID-19 data issues—and then went right back to downplaying the pandemic. (Also not lost on me: this same day, administration officials were “weighing the political risks” of appealing the blocked travel mask mandate.)

    I would love to be proven wrong, and to see this new CDC center usher in an era of standardized, actionable infectious disease data and modeling across the country. But right now, I’m not very optimistic.

  • Five more things, May 9

    I couldn’t decide which of these news items to focus on for a short post this week, so I wrote blurbs for all five. This title and format are inspired by Rob Meyer’s Weekly Planet newsletter.

    1. HHS added vaccinations to its facility-level hospitalization dataset: Last week, I discussed the HHS’s addition of COVID-19 patient admissions by age to its state-level hospitalization dataset. This week, the HHS followed that up with new fields in its facility-level dataset, reflecting vaccinations among hospital staff and patients. You can find the dataset here and read more about the new fields in the FAQ here (starting on page 14). It’s crucial to note that these are optional fields, meaning hospitals can submit their other COVID-19 numbers without any vaccination reporting. Only about 3,200 of the total 5,000 facilities in the HHS dataset have opted in—so don’t sum these numbers to draw conclusions about your state or county. Still, this is the most detailed occupational data I’ve seen for the U.S. thus far.
    2. A new IHME analysis suggests the global COVID-19 death toll may be double reported counts: As of May 8, 3.3 million people had died from COVID-19 worldwide, according to the World Health Organization. But a new modeling study from the University of Washington’s Institute for Health Metrics and Evaluation (IHME) suggests that the actual death toll is 6.9 million. Under-testing and overburdened healthcare systems may contribute to reporting systems missing COVID-19 deaths, though the reasons, and the undercount’s magnitude, differ from country to country. In the U.S., IHME estimates about 900,000 deaths, while the CDC counts 562,000. Read STAT’s Helen Branswell for more context on this study.
    3. The NYT published a dangerous misrepresentation of vaccine hesitancy (then quietly corrected it): A New York Times story on herd immunity garnered a lot of attention (and Twitter debate) earlier this week. One specific aspect of the story stuck out to some COVID-19 data experts, though: a U.S. map entitled, “Uneven Willingness to Get Vaccinated Could Affect Herd Immunity.” The map, based on HHS estimates, claims to display vaccine confidence at the county level. But the estimates are really more reflective of state averages, and moreover, the NYT originally double-counted the people who are strongly opposed to vaccines, leading to a map that made the U.S. look much more hesitant than it actually is. Biologist Carl Bergstrom has a thread detailing the issue, including original and corrected versions of the map.
    4. We still need better demographic data: A poignant article in The Atlantic from Ibram X. Kendi calls attention to gaps in COVID-19 data collection that continue to loom large, more than a year into the pandemic. The story primarily discusses race and ethnicity data, citing the COVID Racial Data Tracker (which I worked on), but Kendi also highlights other underreported populations. For example: “The only available COVID-19 data on undocumented immigrants come from Immigration and Customs Enforcement detention centers.”
    5. NIH college student trial is having a hard time recruiting: If you, like me, have been curious about how that big NIH trial to study vaccine effectiveness in college students has progressed since it was announced in March, I recommend this story from U.S. News reporter Chelsea Cirruzzo. The study aimed to recruit 12,000 students at a select number of colleges, but because the vaccine rollout has progressed faster than expected, researchers are having a hard time finding not-yet-vaccinated students to enroll. (1,000 are enrolled so far.) Now, students at all higher ed institutions can join.
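    A quick check of the arithmetic behind items 1 and 2, using only the figures quoted there: the IHME global estimate is a bit more than twice the WHO count, the U.S. gap is smaller, and only about two-thirds of facilities opted into the HHS vaccination fields, which is why summing those fields undercounts a state or county. This is a back-of-the-envelope sketch, not a reanalysis of either dataset.

```python
# Figures as quoted in items 1 and 2 above (May 2021).
reported_global_deaths = 3.3e6   # WHO count as of May 8
ihme_global_estimate = 6.9e6     # IHME model estimate
reported_us_deaths = 562_000     # CDC count
ihme_us_estimate = 900_000       # IHME model estimate

facilities_reporting_vaccination = 3_200   # facilities that opted into the new fields
facilities_total = 5_000                   # facilities in the HHS dataset

global_ratio = ihme_global_estimate / reported_global_deaths
us_ratio = ihme_us_estimate / reported_us_deaths
coverage = facilities_reporting_vaccination / facilities_total

print(f"global estimate vs. reported: {global_ratio:.2f}x")
print(f"U.S. estimate vs. reported: {us_ratio:.2f}x")
# Summing an optional field only covers the facilities that chose to report it.
print(f"HHS facility coverage for vaccination fields: {coverage:.0%}")
```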

  • How one biostatistics team modeled COVID-19 on campus

    Screenshot of a modeling dashboard Goyal worked on, aimed at showing UC San Diego students the impact of different testing procedures and safety compliance.

    When the University of California at San Diego started planning its campus reopening strategy last spring, a research team at the school enlisted Ravi Goyal to help determine the most crucial mitigation measures. Goyal is a statistician at the policy research organization Mathematica (no, not the software system). I spoke to Goyal this week about the challenges of modeling COVID-19, the patterns he saw at UC San Diego, and how this pandemic may affect the future of infectious disease modeling.

    Several of the questions I asked Goyal were informed by my Science News feature discussing COVID-19 on campus. Last month, I published one of my interviews from that feature: a conversation with Pardis Sabeti, a computational geneticist who worked on COVID-19 mitigation strategies for higher education. If you missed that piece, you can find it here.

    In our interview, Goyal focused on the uncertainty inherent in pandemic modeling. Unlike his previous work modeling HIV outbreaks, he says, he found COVID-19 patterns incredibly difficult to predict because we have so little historical data on the virus—and what data we do have are deeply flawed. (For context on those data problems, read Rob Meyer and Alexis Madrigal in The Atlantic.)

    Paradoxically, this discussion of uncertainty made me value his work more. I’ve said before that one of the most trustworthy markers of a dataset is a public acknowledgment of the data’s flaws; similarly, one of the most trustworthy markers of a scientific expert is their ability to admit where they don’t know something.

    The interview below has been lightly edited and condensed for clarity.


    Betsy Ladyzhets: I’d love to hear how the partnership happened between the university and Mathematica, and what the background is on putting this model together, and then putting it into practice there.

    Ravi Goyal: Yeah, I can give a little bit of background on the partnership. I actually did my PhD with Victor De Gruttola [co-author on the paper]. We started using agent-based models back in 2008 to understand and design studies around HIV, in particular in Botswana, for the Botswana Combination Prevention Project, which is a large cluster-randomized trial.

    So we started using these kinds of [models] to understand: what’s the effect of the interventions? How big a study has to be rolled out to answer epidemiological questions? Because, as you would imagine, HIV programs are very expensive to roll out, and you want to make sure that they answer the questions you set out to ask.

    I’ve been working with [De Gruttola] on different kinds of HIV interventions for more than a decade. He has a joint appointment at Harvard University, where I did my studies, and at the University of California San Diego. So when the pandemic happened, he thought some of the approaches we’ve worked on would be very applicable to thinking about how UC San Diego could open. He connected me with Natasha Martin, who is also on the paper and who is part of UC San Diego’s Return to Learn program, which was coming up with holistic operating procedures for the campus. She’s obviously part of a larger team there, but that’s where the partnership came about.

    BL: Nice. What would you say were the most important lessons you brought from that past HIV research into your COVID modeling?

    RG: Two things. One is uncertainty. There are a lot of things that we don’t know, and it’s very hard to get that information when you’re looking at infectious diseases. In HIV, in particular, what was very difficult was getting really good data on contacts, which in that setting means sexual contacts. And what I have understood is that people do not love revealing that information. When you do surveys that collect that [sexual contact] information, a lot of biases creep in, and there’s a lot of missing data.

    In the COVID context, the uncertainty is different. The biases may be recall biases: people don’t always know how many people they have interacted with. We don’t have a good mechanism to understand how many people interact in a given day. What does that look like?

    Another bias that can creep in is that people may not be completely honest about their risks. How well are they wearing masks? How well are they adhering to distancing protocols? I think there’s some stigma attached to adhering or not adhering. Those are biases that people bring in [to a survey].

    BL: Yeah, that is actually something I was going to ask you about, because I know one of the big challenges with COVID modeling is that the underlying data can be very unreliable, whether that’s missing asymptomatic cases or matching up the dates of case numbers with test numbers, or whatever the case may be. There are just a lot of possible pitfalls there. How did you address that in your work with the University of California?

    RG: At least with the modeling, the timeframe we were modeling made it a little more difficult, both for our work on K-12 and for UCSD. We kicked it off back in April and May, thinking about opening in the fall. So the issue there is, what does it look like in the fall? And we can’t really rely on existing data, because the university was shut down. There’s no data on who’s contacting whom, or how many cases are happening. A lot of things were just completely unknown; we were living in a changing landscape.

    I’m sure other people have much more nuance [on this issue], but I’m going to broadly sketch where this COVID research was different from HIV. For HIV, people might not radically change the number of partnerships they’re having. When we’re thinking about a study in Botswana, we can ask, what did incidence look like four years prior? And make sure our model represents that state of how many infections we think are happening.

    Here [with COVID], when we’re thinking about making decisions for September or October, you don’t have that option of, oh, let’s match it to historical data, because there was no historical data to pin it to. So it meant pooling across a lot of sources to get the estimates to run the model: you’re taking a study from some country X, then a different study from country Y, and trying to get everything to work. Then, hopefully, when things open up, you re-look at the numbers and iteratively ask, what numbers did I get wrong? Now that things are open, what did we get wrong, and what do we need to tweak?

    BL: I noticed that the opening happened in stages, starting with people who were already on campus in the spring and then expanding. So, how did you adjust the model as you went through those different stages?

    RG: Some assumptions were incorrect in the beginning. For example, our estimate of how many people were going to live off campus was correct. But our estimate of how many of those off-campus people would ever come to campus was off. A lot of people decided not to return to San Diego. They were off campus and remote, and they never came onto campus. Should they have been part of that model? No. So once we had those numbers, we adjusted.

    Just this past week, we’ve started redoing some of the simulations to look toward the next terms. Based on the data, we’ve now corrected our past miscalculation of how many people would be on campus.

    And some of the things that we originally thought were going to be higher risk ended up being lower risk than anticipated. One is classrooms. From my understanding, there have been very few classroom-related transmissions. In the model, we thought classrooms were going to be more of a high-transmission environment, but that wasn’t what we saw when we actually had cases. So now we’re adjusting some of those numbers to fit their particular situation. It’s a bit iterative as things unfold.

    BL: Where did you find that most transmissions were happening? If it’s not in the classroom, was it community spread coming into the university?

    RG: They [the university] have a really nice dashboard that gives some of those numbers, and a lot of the spread is coming from the community onto campus, with fewer transmissions actually happening within it. I think that’s where the bulk is. I think the rates on campus were lower than outside.

    BL: Yeah, that lines up with what I’ve seen from other schools I’ve researched: as much as you might think a college is an environment where a lot of spread’s gonna happen, it also allows for more control than a city where people might be coming in and out.

    Another thing I wanted to ask you about is the idea that colleges, when they’re doing testing or other mitigation, need to engage with the surrounding community. There’s been some press about how UC Davis, for example, offers testing and quarantine housing for everybody, not just students and staff. I was wondering whether your model accounts for this: the level of community transmission, or the level of community testing tied to the university, and how that affects the progression of things on campus.

    RG: The model does incorporate infections coming in at the community rate, and that rate was actually based on a model from a different modeling group, which includes Natasha, that is forecasting for the county [around UC San Diego]. Once again, you have to think about all the biases in who gets tested, false positives, all of those kinds of caveats. They built a model around that, which fed into the agent-based model that we use. That lets us forecast how many infections we think are going to come in from people who live off campus, or staff, or family. What’s their risk?

    That’s where that kind of information came from. In terms of quarantine, my understanding is that they weren’t housing people who weren’t associated [with the school] in the quarantine housing.

    BL: Right. Another thing I wanted to ask about: I noticed one of the results was that testing frequency doesn’t make a huge difference in mitigation compared to other strategies, as long as there’s some regular testing. But I was wondering how the test type plays in. Say you’re using PCR tests as opposed to antigen tests or another rapid test: how does that affect the success of the surveillance program?

    RG: Yeah, we looked a little bit at degrading the sensitivity from a PCR test to an antigen test. The conclusion was that it’s better to test more frequently, even with a worse-performing test, than it is to just test monthly with PCR.

    We put it on the dashboard. This is the modeling dashboard… It has a couple of different purposes. First, when the campus was opening, there was obviously a lot of anxiety about what might happen come September and October, and part of the [motivation behind the dashboard] was to be transparent. Like, here are the decisions being made, and here is some of the modeling work… Everything that we know or have is available to everyone.

    And the second piece was to communicate that safety on campus is everyone’s responsibility. That’s where social distancing and mask adherence come in: the fact that you’re allowed to change those [on the dashboard] is supposed to indicate that this really matters. Here’s the role students, faculty, and staff play in keeping campus open. Those were the two points, at least on my end, in putting together the dashboard and that kind of communication.
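    Goyal’s point about frequency versus sensitivity can be made concrete with a back-of-the-envelope calculation. The sketch below is not the UC San Diego agent-based model; it is a toy that assumes an infection is detectable for a fixed number of days, that tests happen on a fixed schedule whose timing relative to the infection is random, and that each test result is independent. The function name `detection_probability` and every number in it (a 10-day detectable window, 95% sensitivity for PCR, 80% for antigen) are hypothetical placeholders, not values from the paper.

```python
def detection_probability(window_days: int, interval_days: int, sensitivity: float) -> float:
    """Chance that one infection is caught by at least one scheduled test.

    Toy assumptions (hypothetical, not taken from the UC San Diego model):
    - the infection is detectable for `window_days` consecutive days (days 0..window_days-1);
    - a test happens every `interval_days`, and the schedule's offset relative
      to the infection is equally likely to be any day 0..interval_days-1;
    - each test independently comes back positive with probability `sensitivity`.
    """
    if window_days <= 0 or interval_days <= 0:
        raise ValueError("window_days and interval_days must be positive")
    total = 0.0
    # Average over every possible offset of the testing schedule.
    for offset in range(interval_days):
        # Test days that land inside the detectable window [0, window_days).
        tests_in_window = len(range(offset, window_days, interval_days))
        # At least one positive = 1 - P(every test in the window misses).
        total += 1 - (1 - sensitivity) ** tests_in_window
    return total / interval_days


# Hypothetical inputs: a more sensitive test given monthly vs. a weaker test given weekly.
monthly_pcr = detection_probability(window_days=10, interval_days=30, sensitivity=0.95)
weekly_antigen = detection_probability(window_days=10, interval_days=7, sensitivity=0.80)
print(f"monthly PCR: {monthly_pcr:.2f}, weekly antigen: {weekly_antigen:.2f}")
# prints "monthly PCR: 0.32, weekly antigen: 0.87"
```

    With these made-up inputs, a monthly PCR test misses the detectable window entirely two times out of three, so the weaker but weekly test comes out well ahead. The toy only asks whether an infection is ever detected; it ignores how quickly a positive result leads to isolation, which a full transmission model like Goyal’s would capture.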

    BL (looking at the modeling dashboard): It’s useful that you can look at the impacts of different strategies and say, okay, if we all wear masks versus if only some of us wear masks, how does that change campus safety?

    Another question: we know that university dorms, in particular, are communal living facilities—a lot of people living together. And so I was wondering what applications this work might have for other communal living facilities, like prisons, detention centers, nursing homes. Although I know nursing homes are less of a concern now that a lot of folks are vaccinated there. But there are other places that might not have the resources to do this kind of modeling, but may still share some similarities.

    RG: Yeah, I think that’s a really interesting question. I sit here in Colorado. The state here runs some nursing homes. So we originally looked at some of those [modeling] questions, thinking about, can we model [disease spread in nursing homes]?

    I think there are some complexities there around human behavior, which may be a little easier to capture in a dorm. A dorm has a kind of structure: people live in a suite, and within the dorm, who resides there and who visits there has some structure. It’s a little harder with nursing homes, and probably the same with detention centers, in that you might have staff moving across a lot of the facility, and that movement is constantly evolving. It isn’t a stationary state with a fixed structure, if that makes sense.

    BL: Yeah. Did you have success in modeling that [nursing homes]?

    RG: Not really with [a long-term model]. It was more that we had a couple of meetings early on, providing guidance. My wife works for the state on its COVID response, so that was informal work. They were trying to set things up and think them through, so I met with them to share some lessons we had learned.

    BL: That makes sense. What were the main lessons? That’s a question for your university work as well: for my readers who haven’t read through your paper, what would you say the main takeaways are?

    RG: I would probably highlight two takeaways that are a little bit in competition. One, based on both the university work and the K-12 work, is that we have the ability to open. We have a lot of the tools, and some things can open up safely, given that the protocols we have in place, particularly around masking, can be very effective. That holds even in settings I would originally have thought were very high risk, places that could see very rapid spread, like college campuses.

    Some campuses, clearly, in the news, [did have rapid spread]. But it’s possible to open safely, and I think some of the positive numbers at UC San Diego showed that. Their case counts were very manageable. It was possible to open up safely, and the same goes for K-12. That requires having a first grader wear a mask all day, and I wasn’t sure it would work! But part of the takeaway is that these mitigation strategies can work, even in the very settings where we wouldn’t have expected them to succeed.

    So that’s one takeaway, that they can work. The competing side is that there’s a lot of uncertainty. Even if you do everything right, there’s still a good amount of uncertainty. There’s a lot of luck of the draw: if you’re a K-12 school, are you going to have just a couple of people coming in who could cause an outbreak? That doesn’t mean that you did anything wrong. [There’s not any strategy] that guarantees 100% that, if you follow it, you won’t get any outbreaks.

    BL: I did notice that the paper talks about superspreading events a little bit, and how that’s something that’s really difficult to account for.

    RG: Human behavior is the worst. It’s tough to account for. Like, are there going to be off-campus parties? How do you think about that? Or will the community and its communication structure hamper that and effectively convince people that these safety measures are there for a reason? That’s a tricky thing.

    BL: Did you see any disciplinary measures, like students who had a party facing some consequence for it, or more of a positive-affirmation approach? One thing I saw at a couple of schools I’ve looked at is a student ambassador program, where you have students who are public health mini-experts for their communities, and they tell everyone, make sure you’re wearing your masks! and all that stuff. I was wondering if you saw anything like that and how it might have an impact.

    RG: The two things that I know about… I know there were alerts that went out, like, oh, you’re supposed to be tested every week. I don’t know about any disciplinary actions, that’s definitely out of my purview. But talking to grad students as well, I knew that if they didn’t get tested in time, they would get an alert.

    And the other thing that I will say, in terms of the planning process: I got to be a fly on the wall in UC San Diego’s planning process for opening up. What I thought was very nice, and I didn’t see this in other settings, is that they actually had a student representative there, hearing all the information and the presentations. I had no idea who all of the people in these meetings were, but I know there was a student who voiced a lot of concerns, and who everyone seemed to really listen to and engage with. It was a good way to make sure the students weren’t getting pushed aside: a representative was at the table.

    BL: Yeah, absolutely. From the student perspective, it’s easier to agree to something when you know that some kind of representative of your interest has been there, as opposed to the administrators just saying, we decided this, here’s what you need to do now.

    My last question is whether you’ve seen any significant changes for the current semester or the next one, and how vaccines play into that, if at all.

    RG: That’s actually the next set of questions that we’re looking into. If weekly testing continues, does the need for testing change as people get vaccinated? The other thing they have implemented is wastewater testing and alerts; they’re sampling all the dorms. How does that affect individual testing? Can you rely on [wastewater] and do less individual testing? That’s some of the current work that we’re looking into.

    BL: That was all my questions. Is there anything else that you’d want to share about the work?

    RG: I will say, on [UC San Diego’s] end… I think you can use models for two things. You can use them to make decisions, or rather to help guide potential decisions. Or you can use them to retroactively justify the decisions that you already wanted to make; you can always tweak a model to do that. And I would say, in the work I’ve done, the school has used them for the former.

    The other thing is, thinking about the role of modeling in general as we move forward, because I think there’s definitely been an explosion there.

    BL: Oh, yeah.

    RG: I think it brought to light the importance of thinking about… A lot of our statistical models, for example, treat individuals independently. Like, your outcome doesn’t affect others. And I can see this idea coming out of COVID, that what happens to you affects me, becoming a powerful concept going forward.