The "Indiana" Study: A Critical Appraisal
A new observational study shows immunity from prior infection is better than vaccination in preventing reinfection, but leads to more all-cause morbidity. But, is the conclusion valid?
A recently published study in the AJPH provides a provocative finding: natural immunity derived from prior SARS-CoV-2 infection was more protective than vaccination in preventing future infection. However, the study also reports that initial SARS-CoV-2 exposure led to a higher incidence of all-cause ED visits, hospitalization and death. The authors conclude:
“The data raise questions about the wisdom of reliance on natural immunity when safe and effective vaccines are available.”
The authors do not explain what causes the additional all-cause consequences in the previously infected, but dismiss the “risk averse” hypothesis (as discussed in this much tweeted “traffic crash” paper). Instead, they suggest that initial COVID-19 infection leads to vague long-term “health consequences” (e.g. “long COVID”, though not mentioned by name), as measured by all-cause outcomes: ED utilization, hospitalization, and death.
Their findings:
The prior infected had a lower rate of (re)infection (2.9% vs. 6.7%).
The prior infected had a higher rate of ED visits (37% increase, 6.7% vs. 5.0%), hospitalizations (24%, 1.8% vs. 1.2%), and death (37% increase, 0.51% vs. 0.29%).
The general results held for each age group, although varying in magnitude.
Here, we will take a deeper dive into the methodology and appraise the validity of this conclusion. The spoiler: suboptimal matching, selection biases, base rate neglect, and attrition biases irreparably limit the intended conclusion.
Critiques
As with any study, many critiques can be made. I will highlight the most important here (for brevity, “outcomes” stands for “ED utilization/hospitalization/death”):
Suboptimal matching: The two cohorts (previously infected and vaccinated) were “matched” according to simple demographics and the number of COVID-19 risk factors only, and not by individual factors that impact all-cause outcomes. Therefore, no guarantee can be made that the cohorts are comparable with respect to all-cause outcomes, which leaves room for selection bias.
There is no substitute for a well-designed RCT, but that is not possible here. So, in observational studies, investigators attempt to mimic randomization by matching similar patients in each arm. This may work for the matching variables that are arbitrarily selected, but it fundamentally neglects the multitude of other variables and hidden, unknown confounders. In this case, two cohorts were constructed from a large Indiana state database, using patients with records from 2016-2022. The authors intended to “match” the two cohorts to “emulate the structure of a clinical trial” (their words). Matching was performed on the typical demographics, zip code, date of infection/vaccination, and number of comorbidities deemed to be risk factors for COVID-19 (as determined by the CDC).
There are at least two issues with their matching process.
First, they only matched based on COVID-19 risk factors, and NOT on risk factors for all-cause outcomes. Yet, they confidently present the results for all-cause ED/hospitalization/death as an unbiased result. Yes, there may be some overlap in risk factors for both COVID-19 infection and all-cause consequences, but there are many differences as well. And, there are many non-medical risk factors of negative health consequences (income, occupation, lifestyles, etc.), which are neglected in this study. So, one cannot assume that there are not additional hidden differences in the two cohorts.
In fact, the authors use the premise that comorbidities change risk/severity of infection, to choose variables for their matching process. But, if that premise is correct, the a priori selection of the previously infected cohort would likely have the same (and more) comorbidities, leading to a classic selection bias. In addition, there were likely many healthy individuals with asymptomatic SARS-CoV-2 infection, that were excluded from the cohort — a “nonselection” bias.
Second, despite the large number of patients they had access to, they simply matched patients based only on the number of comorbidities, and not the type or severity of a comorbidity. So, for example, a subject with end-stage cancer would be treated the same as someone with well-controlled diabetes. The authors give no further breakdown on the two populations, other than the “number” of comorbidities matched sufficiently.
Thus, one cannot assume the design remotely “emulates the structure of a clinical trial”: it does not even attempt to match the cohorts with respect to all the outcomes the authors intended to study. The two groups in this study are simply two different groups by many characteristics, not just vaccination! All of their conclusions are critically leveraged on the fidelity of this match.
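To make the second issue concrete, here is a minimal Python sketch using entirely invented numbers. The condition names, the per-condition risks in BASELINE_RISK, and both cohorts are hypothetical and are not taken from the study. It only illustrates how two cohorts can match perfectly on the number of comorbidities and still differ several-fold in expected all-cause outcomes.

```python
from collections import Counter

# Hypothetical annual risk of an all-cause hospitalization by condition.
# These values are invented for illustration, not drawn from any dataset.
BASELINE_RISK = {
    "end_stage_cancer": 0.60,
    "controlled_diabetes": 0.05,
    "none": 0.01,
}

# Two hypothetical cohorts of 100 patients, each patient having 0 or 1 comorbidity.
cohort_a = ["end_stage_cancer"] * 20 + ["none"] * 80
cohort_b = ["controlled_diabetes"] * 20 + ["none"] * 80


def comorbidity_count_profile(cohort):
    """Distribution of the NUMBER of comorbidities per patient (the matching variable)."""
    return Counter(0 if condition == "none" else 1 for condition in cohort)


def expected_event_rate(cohort):
    """Average baseline risk across the cohort, given the TYPE of each comorbidity."""
    return sum(BASELINE_RISK[condition] for condition in cohort) / len(cohort)


profile_a = comorbidity_count_profile(cohort_a)
profile_b = comorbidity_count_profile(cohort_b)
rate_a = expected_event_rate(cohort_a)
rate_b = expected_event_rate(cohort_b)

print("Matched on comorbidity count:", profile_a == profile_b)
print("Expected event rate, cohort A:", round(rate_a, 3))
print("Expected event rate, cohort B:", round(rate_b, 3))
```

In this toy setup the two cohorts are indistinguishable to a count-based matching rule, yet their expected outcome rates differ before any infection or vaccination has taken place.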
Base rate neglect: If the two “matched” cohorts are not exactly comparable, then one cannot assume that the base rate incidences of all-cause outcome are the same prior to infection/vaccination. Thus, differences cannot be attributed to infection/vaccination.
What if the two groups had different health outcomes before vaccination was available? What if they were different before the pandemic? Wouldn’t the authors need to account for a difference in baseline?
This seems like a fundamental scientific premise: if you are going to compare the rates of outcome in two populations after a specific intervention, then you need to at least know where they are starting from. In this study, the authors simply define an arbitrary time (30 days after infection/vaccination) and start counting outcomes, with no consideration or analysis of the historic base rates of each group. As such, the differences in outcome could be attributed to pre-existing differences between the groups, and not to the infection/vaccination event.
Interestingly, the authors of this study selected subjects that had electronic records since 2016 (!). So, they could have (fairly easily) compared baseline outcomes in these two groups, historically, prior to infection/vaccination, and even pre-pandemic. If they are able to show that these two groups had comparable outcomes beforehand — and the curves diverged only after the point of infection/vaccination, it would lend more credibility to their findings. However, this analysis is not performed — and so we are asked to incorrectly assume that the historic baselines of these two groups are the same.
As a result, the cumulative incidence graphs (Figure 2) are deceptive. Because they start counting from zero at t=0 (the index date), they give the impression that both groups start from the same baseline point. But that assumption is never validated in this study. If, for example, t=0 was selected to be 3 months prior to infection/vaccination, the same separation of the groups might be seen due to their intrinsic differences, but then it would clearly not be attributable to the vaccination/infection event.
As such, no conclusion can be made about causality of vaccination/infection status on all-cause outcomes, particularly as the base rates of each group are never established. The differences observed may simply reflect pre-existing hidden differences in the group, and not the result of the intervention.
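The baseline check described above can be sketched as a simple pre/post comparison (a difference-in-differences style calculation, which is my suggestion and not something the authors performed). All event counts and person-years below are invented for illustration, and the function names are hypothetical.

```python
def rate(events, person_years):
    """Crude incidence rate: events per person-year of follow-up."""
    return events / person_years


def difference_in_differences(pre_a, post_a, pre_b, post_b):
    """Change in group A minus change in group B across the index date."""
    return (post_a - pre_a) - (post_b - pre_b)


# Hypothetical all-cause event counts per 1,000 person-years (invented numbers).
prior_infected = {"pre": rate(60, 1000), "post": rate(70, 1000)}
vaccinated = {"pre": rate(45, 1000), "post": rate(50, 1000)}

# What a post-index-only comparison would report.
naive_gap = prior_infected["post"] - vaccinated["post"]

# The gap that already existed before infection/vaccination.
baseline_gap = prior_infected["pre"] - vaccinated["pre"]

# The part of the gap that actually emerged after the index date.
did = difference_in_differences(
    prior_infected["pre"], prior_infected["post"],
    vaccinated["pre"], vaccinated["post"],
)

print("Post-index gap:", round(naive_gap, 4))
print("Pre-index gap:", round(baseline_gap, 4))
print("Difference-in-differences:", round(did, 4))
```

In this made-up example, three quarters of the post-index gap was already present before the index date. Whether anything similar holds in the Indiana data is unknown, which is precisely the point: the historic records needed for this check were available, but the comparison was not reported.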
Attrition Bias: In the methods, a censoring rule produces unbalanced attrition that favors the vaccination group. (The “at risk” vaccination group gets healthier over the observation period!)
In order to keep the cohorts clean, the authors devise a censoring rule that drops a matched pair from both cohorts if [1] the vaccinated subject got infected, or [2] the previously infected subject got vaccinated.
However, these are NOT equivalent dropout events. As those who are inherently “unhealthier” are more prone to infection (and ED, hospitalizations and deaths), a vaccinated-infected subject dropping out leaves a residual “at risk” population that becomes healthier over the observation period. Conversely, the matched subject in the prior infection arm (reinfected or not) also drops out, serving only to reduce the denominator, and increase the percentage of outcome incidences.
In contradistinction, the censoring of a prior infected subject occurs when they decide to get vaccinated, rather than when they get infected. Vaccination is not a biological event (like getting infected) but a psychosocial one, with biases that typically balance out. (An excellent review of biases in the decision to vaccinate is given here.)
Thus, the vaccinated group drops out the infected subjects, and all of their subsequent poor outcomes. The prior infected group keeps the reinfected subjects and their poor outcomes, and might drop out healthy vaccinees. The random event of vaccination in the previously infected does not provide the same selection. As a result, we have selective attrition. All-cause hospitalization/death should include events from COVID-19 infection, but these are apparently excluded in the vaccination group yet included in the infection group!
The point is, even if the two groups had been initially well selected and matched, the attrition from the censoring rules serves to make the vaccinated group relatively “healthier”, biasing outcomes in its favor. The authors could mitigate this criticism by publishing the outcomes of the censored portion of the cohort. Are the negative outcomes for vaccination being selectively hidden in the censored subjects? The burden is on the authors to prove they are not, but they do not.
And the censoring is massive in this study. In each analysis, nearly 90% are censored by the end of the observation period. This would seem to be an overwhelming amount, when the absolute differences touted between the two groups are only 1-2% (and in some stratifications, <0.05%).
Note that this attrition bias only applies to the analysis of ED/hospitalization/death outcomes, and not infections/reinfections. In the latter comparison, censoring occurs after the infection is counted, so there is no attrition bias. But in the former comparison, censoring occurs before ED/hospitalization/death outcomes are counted (and therefore, they are not counted).
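The selective-attrition argument can be illustrated with a toy simulation. This is a deliberately simplified sketch under my own assumptions, not a reconstruction of the authors' analysis: it ignores pair-level dropout and time-to-event, gives both arms the same hidden "frailty" distribution and the same true outcome risk, and uses invented probabilities. The only difference between the arms is whether censoring depends on frailty (infection in the vaccinated arm) or not (vaccination in the prior infected arm).

```python
import random


def simulate_arm(n, informative_censoring, rng):
    """Return the observed outcome rate among subjects who were NOT censored.

    Every subject has a hidden frailty in [0, 1). The outcome probability
    depends only on frailty, so there is no true difference between arms.
    """
    kept = 0
    events = 0
    for _ in range(n):
        frailty = rng.random()
        has_outcome = rng.random() < 0.2 * frailty  # invented risk model
        if informative_censoring:
            # Vaccinated arm: censored on infection, which is assumed
            # to be more likely in frailer subjects.
            censored = rng.random() < 0.9 * frailty
        else:
            # Prior infected arm: censored on vaccination, assumed
            # unrelated to frailty (same average censoring fraction).
            censored = rng.random() < 0.45
        if not censored:
            kept += 1
            events += has_outcome
    return events / kept


rng = random.Random(0)  # fixed seed so the sketch is reproducible
vaccinated_rate = simulate_arm(100_000, informative_censoring=True, rng=rng)
prior_infected_rate = simulate_arm(100_000, informative_censoring=False, rng=rng)

print("Observed outcome rate, vaccinated arm:", round(vaccinated_rate, 4))
print("Observed outcome rate, prior infected arm:", round(prior_infected_rate, 4))
```

Under these assumptions the vaccinated arm appears healthier even though the underlying risk is identical by construction, because its frailest subjects are preferentially removed before their outcomes are counted. How large this effect is in the real data depends on quantities the paper does not report.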
External Validity: A significant number of publications suggest that prior infection (if initially survivable) is remarkably protective against future infection, hospitalization and death.
This study generally contradicts the mass of publications and evidence that suggest that if you survive a SARS-CoV-2 infection (>99% likely), then you have significant protection against future infection, hospitalization and death (1, 2, 3). If you accept these studies, then the excess negative all-cause outcomes observed in this study can only derive from ostensibly non-COVID-related causes. Then, the authors want you to believe that these seemingly unrelated hospitalizations and deaths were actually sequelae of COVID exposure that went unrecorded as such, and thus have heretofore been unlinked. Plausible? The logic seems exceedingly circular.
But again, you cannot make the claim that unrelated negative outcomes were caused by COVID infection, unless you at least establish that same risk did not exist before infection (Critique #2).
Discussion and Conclusions
Based on their analysis, the authors want you to conclude that vaccination is the best option, even if you were previously infected:
“that reliance on natural immunity to avoid negative SARS-CoV-2 health consequences is not a prudent strategy given the safe and readily available vaccines”
However, this is not what their study was even designed to show — the previously infected who got vaccinated were censored from the study, and we have no idea how they actually fare! Thus, if you dismiss all of the above limitations, one can only conclude that a never infected unvaccinated individual should choose vaccination over deliberate infection, in order to prevent negative health consequences (despite the increased risk of actual infection!). But of course, deliberate infection as a protective strategy was never recommended by credible persons anyway. Instead, the debate has been conditional: for those with prior infection, does additional vaccination add any significant benefit? This study does not remotely begin to answer that question.
Furthermore, the scenario presented in the study is generally moot, as the vast majority of individuals have now been infected, vaccinated, boosted or all three. Thus, the practical purpose of this study is to suggest that initial SARS-CoV-2 exposure can cause serious negative health consequences (e.g. “long COVID”) in the future. It's not a coincidence that the author group is affiliated with Indiana University and the Regenstrief Institute, which have recently been awarded a $9M grant by the CDC to specifically study “long COVID”. So, in that theme, the authors broadcast these two points:
“As the study indicates, the strong natural immunity acquired from a previous infection does not appear to fully compensate for the detrimental effects of the initial infection.”
“The findings highlight the real-world benefits of vaccination and allude to the health consequences of SARS-CoV-2 after the initial exposure.”
As such, they are attempting to create a basis that hints at the existence of long COVID, using objective but non-specific measures of all-cause ED, hospitalization, and mortality. But as described above, the biases in this study are severe — a suboptimal matching heuristic with selection bias, neglect of baseline outcome rates, and selective attrition.
It is traditional in the “Discussion” section to air out all potential limitations, but here, the authors only generically touch upon “selection bias” as the sole potential limitation. They propose that the vaccinated may be more “risk-averse”, and therefore lead safer lifestyles that avoid negative consequences. But then they conveniently dismiss this theory, because they found that the vaccinated were more likely to be infected, and were thus presumably engaging in riskier behaviors. Beyond this tortuous theory, they neglect all other potential sources of selection bias, and other inherent differences between these two groups. Bizarrely, they also add “the consistent findings across different age groups lend credibility to the investigation”, but it could also simply mean that the design was so heavily biased that all the study’s results are meaningless.
Some in the “natural immunity” camp have pointed to this study as proof that prior infections are significantly more protective against future COVID infection. Perhaps this is the most valid part of this study, as some of the above biases do not apply to the same degree within the infection sub-analysis. And this finding would be consistent with other studies, particularly in the Delta and Omicron phases. However, I am of the belief that if a study is fundamentally biased, you cannot cherry-pick only the convenient results. As such, the results of this study should be rejected as a whole.
Generally speaking, this “Indiana” study is similar to the preposterous “traffic crash” study, as both attempted to define two cohorts from a large administrative database, perform some superficial adjustments to account for unavoidable biases, and then, measure distal outcomes that favor a narrative. While the “traffic” study was theoretically absurd (it even found unvaccinated pedestrians were more likely to get hit!), the “Indiana” paper presents a more plausible, if incorrect, causal theory. However, both studies are still fraught with the same conceptual and cognitive biases. In the end, observational studies can allow the naïve to make unfounded conclusions, or the cunning to elicit a desired conclusion. With the decay of the normal scientific “peer review” process, independent appraisal (as done here) is now required.
Taking the limitations into account, at best, this “Indiana” study only demonstrates that the authors were successful in identifying two intrinsically different populations with different infection and all-cause outcomes, among many other differences. But, critically, the authors cannot logically attribute any of those differences to being either vaccinated for or infected with SARS-CoV-2.
DISCLAIMER: This article represents my opinion and analysis only, and not that of any organization I am affiliated with. It is based upon best efforts to compile and analyze the data and evidence. The intended use is for discussion purposes only. It is not a substitute for advice from a personal physician. Please consult your personal physician for health advice.
Well done!
Seems pretty clear that the authors must have known all of this and still set out to mislead...