Being data driven into a ditch
(Originally posted for Carolina Journal on June 4th, 2020 here.)
Written by Paul F. Cwik and Abir Mandal
Governors across the nation announced that the coronavirus-related policies for closing businesses were based on “data driven” analyses by medical professionals. Next, they announced that the reopening phases also would be strictly “data driven.” Over and over, the officials said that they were being guided by “the science” and “the data.” Of course, being guided by science and data is appropriate in a time of crisis; we wouldn’t want it any other way.
However, what if the decision makers were getting only a small fraction of the overall picture? This is not to say what they had was wrong. The information was most likely the best available. Our question is, “What is the likelihood that good decisions can be made if only a small part of the overall picture is considered?” It would be like the chance a blind man has in guessing the weight of an elephant by only touching its trunk.
From the start, officials have been looking at incomplete data. The key statistics that a data driven analysis would need are the number of COVID-19 infections, the number of people who are hospitalized by COVID-19, and the number of deaths caused by the virus. If we had instantaneous data on those three variables, then creating an appropriate response would be a straightforward process. Unfortunately, data of this sort are never actually available.
Taking the wrong path
Where did we go wrong? To get perfectly accurate results would require health care workers to test everyone. Unfortunately, we simply do not have enough tests. When we cannot test the entire population, we take a sample and extrapolate results. In essence, we create a model. Models require simplifying assumptions.
The first hurdle we needed to overcome was the issue that people may be infected and yet asymptomatic. As a result, health care workers had no way of knowing whom to test. Because COVID-19 is caused by a novel virus, and our testing capacity has been and likely still is constrained, the statistically sound next step would have been to test a random sample of people.
Unfortunately, medical necessity and proper statistical methods do not always line up. Medical workers needed to know whether the patient in front of them was a risk to others, and with a limited supply of tests (especially in March 2020), testing was restricted to those who were symptomatic. The nonserious and asymptomatic cases were left out. Thus, the data that we were collecting were skewed from the very beginning. This sort of error is called sample selection bias.
Sample selection bias occurs when the data points in a sample are not gathered through a random process. As we are observing now, making deductions and deriving estimates from biased data is misleading and can lead to disastrous consequences. In fact, it is precisely this bias that has led to death rate estimates ranging as widely as 0.5% to 16%, calculated as a proportion of the total number of people who tested positive for COVID-19. Such an estimate depends on the number of people who tested positive, which in turn depends on a country’s testing capacity, and that capacity is hardly consistent across the world.
Governments around the world and in North Carolina have based their projections on such biased figures, implying that the disease was many times deadlier than the seasonal flu (which has about a 0.1% mortality rate). This assumption should never have been taken as accurate, because the sample of people tested did not accurately reflect the population of those actually infected.
The rates of infection were unknown at the beginning. But estimates could have been roughly “ballparked” using the lab-derived figures for rates of infection and the empirical multiplier that the CDC uses each year to estimate the annual flu load from confirmed cases. Policy makers, who were mostly led by a team of health experts, chose not to pause and do so. Therefore, the projected death rates are likely to be too high by a factor of 50 to 100, as now evidenced by the serology tests on the general population, which test for COVID-19 antibodies.
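To make the arithmetic behind this point concrete, here is a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration, and the function name `fatality_ratios` is our own. It also assumes, as a simplification, that every death is counted no matter how few of the infected are tested.

```python
def fatality_ratios(infected, deaths, share_tested):
    """Return (naive rate among confirmed cases, rate among all infected).

    All inputs are hypothetical. `share_tested` is the fraction of infected
    people who are tested and confirmed positive. Simplifying assumption:
    every death is counted regardless of how many people are tested.
    """
    confirmed = infected * share_tested
    naive_rate = deaths / confirmed   # deaths divided by positive tests
    true_rate = deaths / infected     # deaths divided by all infections
    return naive_rate, true_rate


# Same hypothetical outbreak, seen through different testing capacities.
for share in (1.0, 0.5, 0.1, 0.02):
    naive, true = fatality_ratios(infected=100_000, deaths=300, share_tested=share)
    print(f"share of infected tested {share:>5.0%}: "
          f"naive rate {naive:.1%}, rate among all infected {true:.1%}")
```

The outbreak is identical in every row. Only the share of infected people who are tested changes, yet the naive rate swings widely while the rate among all infected stays put.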
Consequences of poor understanding
The overall result was massively inaccurate projections and apocalyptic scenarios. The number of infected people was projected from biased data. Using the number of people infected as the base, the projections of the number of ventilators needed and resulting deaths were grossly exaggerated. A statistician could have helped matters, in our opinion, by highlighting the dangers of conflating the case fatality rate with the overall mortality rate. The unfortunate result was that flawed models, which predicted anywhere from 500,000 deaths in the United States with social distancing fully implemented to 2.2 million deaths if nothing were done, were touted as scientific truth.
The data that have now been released to the public show that these projections are clearly flawed. Furthermore, many government officials, including Gov. Roy Cooper, have simply refused to release the data and models used in making their executive orders. (See here and here.) When looking at more recent numbers, the death and hospitalization rates are likely not significantly different from those of an average or bad flu season.
It seems that government officials continue to use the inflated metrics to determine whether, for example, North Carolina should open. Additionally, the debate has shifted from “flattening the curve” to “stopping the spread.” Again, looking at the spread of the virus falls into the trap of sample selection bias. Today health departments are looking at the proportion of positive cases, which on the face of it sounds like a reasonable number at which to look. As the number of tests increases, even given a constant number of infections in a community, the number of positive tests would increase.
However, this is where the trap of sampling bias occurs. The tests are still predominantly performed on those who are sick enough to seek testing. People who feel fine (and are not at risk) are not going out of their way to get tested. The collected results do not constitute a true representation of the state’s population and show nothing about whether the disease’s spread in the community is increasing or decreasing. The only reasonable metric that the state should use is the number of hospitalizations due to COVID-19-like illnesses.
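The relationship between the number of tests and the number of positive results can be sketched in a few lines of Python. The figures are hypothetical, the function names are our own, and the sketch assumes that the share of tested people who are infected stays fixed while testing expands.

```python
def expected_positives(tests, positivity):
    """Expected positive results for a given number of tests.

    `positivity` is the hypothetical share of tested people who are infected,
    held constant here as a simplifying assumption.
    """
    return round(tests * positivity)


def positivity_rate(positives, tests):
    """Positive results as a proportion of all tests performed."""
    return positives / tests


# Testing quadruples while the underlying situation is held constant.
for tests in (5_000, 10_000, 20_000):
    positives = expected_positives(tests, positivity=0.05)
    rate = positivity_rate(positives, tests)
    print(f"{tests:>6} tests: {positives:>5} positives, {rate:.1%} positive")
```

Under these assumptions the raw count of positives rises in step with testing even though nothing about the outbreak has changed. And because the people tested are not a random sample, neither the count nor the proportion describes the state’s population as a whole.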
Where to look
In our opinion, North Carolina officials should focus on the number of serious hospitalizations (as imperfect as it may be) as the primary metric for their policy making. However, we should not be myopic and only focus on one statistic.
Always, the goal is to use the data properly. Let’s consider the following scenario. Suppose that there is an outbreak of COVID-19 cases in Wake County. What should the government do? Should the entire state be shut down? Or more to the point, should we close Graham or Hyde counties if there is a spike in Wake County?
It is upon these questions that we see science and the law come together. When a political area engages in a lockdown, it is purposefully suppressing citizens’ legal rights. Recently judges have been rolling back executive overreach by claiming that the restrictions of rights must be of the greatest concern. When rights are to be violated, it must be done in a manner that is targeted and not expansive, it must be short-term and not perpetual, and it must be done under scrutiny of the other governmental branches.
The science is required to assist the law by showing the least oppressive limits of a lockdown. The best statistic to start with is who is the most likely to die, followed by who is the most at risk of suffering severe problems. Stemming from these we come to the number of serious hospitalizations. The capacity of hospitals is a limit that cannot be crossed. We have seen the results in Europe when this limit is crossed: people are denied beds or are left as “overflow” in hallways. Many needlessly suffer. The U.S. goal from the beginning has been to “flatten the curve.” Which curve? The curve of serious hospitalizations.
Setting a better policy
When focusing on serious hospitalizations, government officials at the local and county levels can look at the stress on the area’s hospitals and compare it to the area’s hospital capacity. There are significant differences between regional areas. For example, there are no hospitals in Hyde County but there are 10 in Wake County. Wake County has much more capacity than Hyde County, but it also has a much larger population. If there are 10 cases in Hyde County, a lockdown may be required. However, if there are 10 cases in Wake, a lockdown could be excessive. Using the data in this manner requires policies to be focused. Our concern is the overreach across the entire state.
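As a rough sketch of how such a comparison could be made, the Python below divides serious cases by local bed capacity. The function name `load_ratio` is our own, and the bed count for Wake County is a hypothetical placeholder, not a reported figure. Hyde County is given zero beds only because it has no hospital.

```python
import math


def load_ratio(serious_cases, staffed_beds):
    """Serious cases per available hospital bed in an area.

    Returns infinity when there are cases but no local beds at all,
    and 0.0 when there are neither cases nor beds.
    """
    if staffed_beds == 0:
        return math.inf if serious_cases > 0 else 0.0
    return serious_cases / staffed_beds


# Ten serious cases in each county. The Wake bed count is HYPOTHETICAL.
counties = {
    "Hyde": {"serious_cases": 10, "staffed_beds": 0},
    "Wake": {"serious_cases": 10, "staffed_beds": 1_000},
}

for name, figures in counties.items():
    ratio = load_ratio(figures["serious_cases"], figures["staffed_beds"])
    print(f"{name}: load ratio {ratio}")
```

The same ten cases produce very different readings in the two counties, which is why we argue that a single statewide rule is a poor fit. A real analysis would also have to account for patients who are transferred to hospitals in neighboring counties.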
Furthermore, there is no evidence that statewide lockdowns work. South Dakota did not lock down. Its numbers are no worse than those of states with the worst encroachments on citizens’ freedom of movement. Sweden did not lock down. Its death rate of around 330 per million due to COVID-19 is slightly higher than the U.S.’s 295 per million. Sweden’s economy is projected to contract by 5.6%, which is not as bad as the 8.1% contraction projected for the rest of Europe. When North Carolina began Phase Two on May 23, the state reported a “surge” in cases. However, this surge of 1,107 cases is an aggregate number of people who have tested positive and is based on a record-setting 26,000 tests. In terms of the number of cases tested positive as a proportion of total tests, the figure for that day is just 6.9%, lower than the dataset average of 7%. Additionally, there is no mention of whether these cases are in a single county, spread across the whole state, or in areas that have hospital capacity.
A better path
The largest consequence of this statistical illiteracy on the part of American policy makers is that we have essentially destroyed our economy. The irony is that antibodies and herd immunity, whether gained through infection and recovery or through a vaccine, are the key to defeating the virus. Keeping ourselves locked up in isolation from each other would not really save lives because the virus is here to stay. Isolation and quarantining are only prolonging our misery. If statewide lockdown measures had not been put in place, and we had instead chosen to protect the most vulnerable, the virus would have spread throughout the population, harmlessly for most, while generating antibodies and herd immunity.
The spike in the number of cases that accompanied our increased testing capacity did not correspond to a similar spike in deaths, and that fact alone should have given our politicians pause. Government officials, like all people, are very reluctant to admit that they were wrong. The result of this stubbornness is an overreaching and illogical lockdown that continues today. We need to account for sample selection bias, meaning that we should not focus simply on the number of cases. For example, the NC Department of Health and Human Services reports that the plurality of positive tested cases (43%) are people between the ages of 25 and 49. However, 64% of the deaths are among people 75 and older. The probability of someone younger than 45 succumbing to the disease is so low that it can be taken as zero.
Does it make sense to quarantine the people who are in their prime working age range? When we more closely examine the governor’s executive orders, we see that restaurants can open but not bars. Day camps are allowed to open, but not playgrounds. Salons can open but not gyms. For all the calls for data and science, Governor Cooper seems to have regressed to whimsy. Yes, precautions for the most vulnerable need to be taken, but it is past time for our state’s economy to be reopened. If we fail to open soon, it will be as President Trump mentioned: The cure for COVID-19 in North Carolina will turn out to be much worse than the disease itself.
Paul F. Cwik is the BB&T Professor of Economics and Finance at the University of Mount Olive.
Abir Mandal is an assistant professor of economics at the University of Mount Olive.