Results
Can the increase in confirmed cases be explained by the number of tests conducted?
The argument that the increase in registered SARS-CoV-2 infections is mainly driven by a mere increase in the number of tests performed has repeatedly been used against political countermeasures. There is an ongoing debate about the correct interpretation of rising and falling case numbers in view of varying test activity, yet no substantial data on this aspect have been provided so far.
Unfortunately, neither the RKI nor any other institution in Germany is in a position to report the exact number of tests behind the daily number of laboratory-confirmed COVID-19 cases. The RKI publishes, on a weekly basis, the number of tests and the rate of positive results reported by the laboratories participating in its surveillance system. However, the number of reporting labs was not constant over time, since mandatory reporting was introduced with some delay. In addition, not every test corresponds to a newly tested patient, since repeated tests of, e.g., persons under quarantine are included. Still, the available data can be used to estimate the overall testing activity in Germany, assuming that the RKI selected the surveillance labs to be representative. The weekly test data were adjusted for the number of reporting labs and divided by the number of days to obtain daily test rates.
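The per-day conversion described above can be sketched as follows. The exact adjustment for the number of reporting labs is not specified, so this minimal sketch assumes one plausible choice: each week's count is scaled up to the largest number of labs that reported in any week (treating reporting labs as representative of non-reporting ones) and then divided by seven. The function name and the example figures are hypothetical.

```python
def daily_test_rates(weekly_tests, labs_reporting, days_per_week=7):
    """Estimate tests per day from weekly surveillance counts.

    Hypothetical adjustment: each week's count is scaled to the maximum
    number of reporting labs observed, i.e. the labs that reported in a
    given week are assumed to be representative of those that did not.
    """
    if len(weekly_tests) != len(labs_reporting):
        raise ValueError("one lab count is needed per weekly test count")
    reference_labs = max(labs_reporting)
    rates = []
    for tests, labs in zip(weekly_tests, labs_reporting):
        # scale to the reference lab count, then spread evenly over the week
        rates.append(tests * reference_labs / (labs * days_per_week))
    return rates


# Hypothetical example: two weeks with 140 and 175 reporting labs
print(daily_test_rates([350_000, 400_000], [140, 175]))
```

Other adjustments (for example, dividing by the lab count without rescaling) would change the absolute level but not the week-to-week pattern of the estimate.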
A linear regression model was built with test activity as the explanatory variable and the number of confirmed cases as the dependent variable. The model yielded an adjusted R squared of .07 (F = 7.70 at 89 DF, p = .007), indicating a numerically weak but statistically significant effect.
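For readers who want to reproduce this kind of fit, the following is a minimal pure-Python sketch of a one-predictor ordinary least squares regression that reports the adjusted R squared and the F statistic. It is not the analysis software used for this paper, and the toy data are made up. With one predictor, the 89 residual degrees of freedom quoted above correspond to 91 daily observations.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the coefficients, R squared, adjusted R squared, the F statistic
    for the single predictor and the residual degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long series with at least 3 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    k = 1                      # number of predictors
    df_resid = n - k - 1       # residual degrees of freedom
    r2 = 1 - ss_resid / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = ((ss_total - ss_resid) / k) / (ss_resid / df_resid)
    return {"intercept": intercept, "slope": slope, "r2": r2,
            "adj_r2": adj_r2, "f": f_stat, "df_resid": df_resid}


# Made-up toy data: daily tests (x) and confirmed cases (y)
print(simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

A p-value would additionally require the F distribution with (1, df_resid) degrees of freedom, for example from a statistics library.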
Can changes in the number of confirmed cases be explained by changes in public mobility?
Various observers have noted that infection rates in Germany had already started to decline before the lockdown measures were implemented. This is typically explained by the assumption that many people were concerned and voluntarily followed public appeals to avoid unnecessary movement in public well before the official lockdown was announced. In this context, the “COVID-19 Mobility Project” was initiated to measure public mobility based on mobile phone data, intended to serve as an indicator of compliance with the social distancing policy. The project initially met with some resistance and data protection concerns, but the RKI considered it important.
Data on public mobility, as transcribed from the project website, were included in the linear model described above as a second predictor of the number of confirmed cases. The effect was impressive: the combined model yielded an adjusted R squared of .25 (F = 15.04 at 88 DF, p = .000002), with a highly significant effect for mobility, while the effect of the number of tests was no longer visible. However, what at first sight seems to prove that reduced mobility prevented infections is in fact quite surprising. The outcome implies a strong correlation between the number of confirmed cases, aggregated by date of symptom onset, and mobility on the same(!) day. This is not to be expected given an incubation period of about 5 to 6 days. To investigate whether mobility also has predictive value for the number of confirmed cases at a later point in time, correlation coefficients were calculated with a time lag increasing from 0 to 8 days. As can be seen in Figure 1, the correlation coefficients decrease linearly with increasing time lag.
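The lag analysis can be sketched in pure Python as below. It assumes two daily series of equal length aligned on the same calendar days, with cases aggregated by date of symptom onset; the function names and the synthetic data are illustrative only. The same function can be applied to cases aggregated by date of report.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlations(mobility, cases, max_lag=8):
    """Correlate mobility on day t with confirmed cases on day t + lag."""
    if len(mobility) != len(cases):
        raise ValueError("series must cover the same days")
    result = {}
    for lag in range(max_lag + 1):
        # drop the last `lag` mobility values and the first `lag` case values
        m = mobility[: len(mobility) - lag]
        c = cases[lag:]
        result[lag] = pearson(m, c)
    return result


# Synthetic check: cases that simply repeat mobility two days later
mobility = [float((i * 7) % 11) for i in range(30)]
cases = [0.0, 0.0] + mobility[:-2]
for lag, r in lagged_correlations(mobility, cases, max_lag=4).items():
    print(lag, round(r, 3))
```

If mobility drove infections, the coefficients should peak near the incubation time rather than at lag 0; in this synthetic example the peak sits at the built-in lag of two days.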
Thus, the data do not support the idea that changes in public mobility affect the number of confirmed COVID-19 cases. Rather, the correlations suggest the opposite: it might have been the daily communication of rising or falling infection rates that influenced people’s compliance with the “stay at home” requests. If that were the case, the mobility index might be expected to correlate more strongly with the number of cases by date of report than with the number by date of symptom onset. And indeed, the correlation was found to be -.51 for the former and -.68 for the latter. The correlation between mobility and the number of confirmed cases five days later (the estimated mean incubation time) was only -.26.
Do changes in the number of newly confirmed COVID-19 cases per day show a plausible relationship to public health measures taken by the German government?
Residuals from the linear regression model calculated in section 2 were used for further analyses, thus eliminating the influence of varying test activity. The residuals were subjected to a time series decomposition on a calendar-week basis, using a standard procedure of the analysis software package. Figure 2 shows the results for the trend (i.e. the 7-day moving average), the seasonal component (i.e. the regular fluctuations by weekday) and the remaining error or random component. The random component indicates somewhat more dynamic developments between calendar weeks 12 and 16 and, at the lower end, around calendar week 21/22. The seasonal curve shows a pronounced periodicity, which is probably due to lower testing activity on weekends and delayed reporting, even though the effect is reduced by using the date of disease onset instead of the date of reporting. The 7-day moving average clearly accounts for most of the variation.
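The software procedure is not named here, so the following is only a sketch of a classical additive decomposition, one common standard method and an assumption on our part: a centered 7-day moving average as the trend, the mean detrended value per weekday (centered to sum to zero) as the seasonal component, and whatever is left as the random component. The function name and the synthetic series are hypothetical.

```python
def decompose_weekly(series, period=7):
    """Classical additive decomposition: series = trend + seasonal + remainder.

    Trend: centered moving average over `period` days (None at the edges).
    Seasonal: mean detrended value per position in the cycle, centered to sum to 0.
    Remainder: what is left; None where the trend is undefined.
    """
    n = len(series)
    if period % 2 == 0 or n < 2 * period - 1:
        raise ValueError("need an odd period and at least 2*period - 1 values")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period

    # collect detrended values by position in the weekly cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    pattern = [value - mean_raw for value in raw]  # weekday effects sum to zero

    seasonal = [pattern[t % period] for t in range(n)]
    remainder = [
        series[t] - trend[t] - seasonal[t] if trend[t] is not None else None
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic series: linear trend plus a made-up weekday effect (sums to zero)
weekday_effect = [3.0, -1.0, -2.0, 0.0, 1.0, -1.0, 0.0]
series = [10 + 0.5 * t + weekday_effect[t % 7] for t in range(35)]
trend, seasonal, remainder = decompose_weekly(series)
```

Because a full-week moving average cancels a weekday pattern that sums to zero, the synthetic example is split exactly into its trend and weekday parts with an essentially zero remainder.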
For a more detailed inspection, the trend curve was isolated in Figure 3, with line markers added for landmark decisions taken by the German government. The time points for the major interventions in March (“a” to “c”) were chosen in alignment with the MPI modeling studies mentioned earlier; time points “d” (first opening steps on April 20) and “e” (law on extended testing of asymptomatic patients taking effect, May 22) were added. Time points “a” to “d” were marked with a 5-day lag from the actual date of intervention to account for the estimated incubation time, as these measures were assumed to have a direct impact on infections. Time point “e” was marked at the date the law took effect, since testing by itself should not influence the risk of infection but might affect the chance of detecting cases.
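The placement rule for the line markers can be written out explicitly. This sketch assumes the year 2020 and uses the intervention dates and the 5-day lag given in the text; the dictionary layout and the function name are illustrative only.

```python
from datetime import date, timedelta

INCUBATION_LAG = timedelta(days=5)  # estimated incubation time from the text

# label: (date of intervention, assumed to act directly on infections?)
# Year 2020 is assumed.
INTERVENTIONS = {
    "a": (date(2020, 3, 9), True),    # large public events cancelled
    "b": (date(2020, 3, 16), True),   # schools and nurseries closed
    "c": (date(2020, 3, 23), True),   # full lockdown
    "d": (date(2020, 4, 20), True),   # first opening steps
    "e": (date(2020, 5, 22), False),  # extended testing law in effect
}


def marker_dates(interventions, lag=INCUBATION_LAG):
    """Shift measures acting on infections by the lag; leave testing changes as is."""
    return {
        label: day + lag if acts_on_infections else day
        for label, (day, acts_on_infections) in interventions.items()
    }


for label, day in marker_dates(INTERVENTIONS).items():
    # ISO 8601 calendar week, the week numbering convention used in Germany
    print(label, day.isoformat(), "week", day.isocalendar()[1])
```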
The curve shows a steep and remarkably smooth increase that starts to bend slightly before line marker “a”, the day when the first measure (cancellation of large public events on March 09) would be expected to take effect. After a peak at the beginning of calendar week 12 it starts to decline, without any visible effect at line marker “b”, the closure of schools and nurseries (March 16). Between calendar weeks 14 and 16, i.e. for approximately two weeks after line marker “c” (full implementation of the lockdown measures in Germany on March 23), the curve shows a slightly accelerated decline and returns to the previous trend thereafter. There is no discernible change at line marker “d”, the day when Germany started to ease some of the lockdown measures, while the curve shows a small peak corresponding to the date when the updated testing policy took effect, allowing more tests of asymptomatic patients under certain conditions.
The only political measure implemented in a plausible temporal relationship to a turning point of the curve was the decision to cancel large public events on March 09. None of the subsequent decisions on tightening or loosening restrictions shows a significant impact, even though there are two weeks in which the numbers went down somewhat faster. This slightly faster decline probably reflects the true effect of the shutdown measures, an effect that would not be sustainable and is not in reasonable proportion to their dramatic economic and social impact. Interestingly, what seems to have caused an immediate reaction was a slight modification of the national COVID-19 test strategy that took effect on May 22. The potential bias introduced by the test strategy is the topic of the next section.
What further sources of bias impede valid conclusions from the number of confirmed cases per day?
Figure 3 still shows an exponential increase in confirmed cases during the first weeks, even after the effect of varying test activity has been eliminated. However, it remains questionable whether this observed increase truly reflects the spread of infection in the German population. Typical modeling studies that tried to estimate the effect of the measures on the number of infected persons and the number of deaths rest on two implicit assumptions:
“patient zero” was correctly identified, i.e. it is known when the virus hit the country.
the number of confirmed cases is always linearly related to the total number of infected patients.
Both assumptions may need to be challenged in the light of current knowledge. While the first cases in Europe were officially confirmed in January this year, genetic analyses, sewage water assays and retrospective analyses of frozen blood samples provide growing evidence that the virus might have circulated, at least in France and Italy, much earlier: at least since mid-December, maybe since November or even earlier. The authors of the respective papers already indicated that this could bias the assumptions regarding disease progression, but they did not expand on this, and as far as I know nobody has picked up the topic.
In addition, there is increasing and consistent evidence from Germany and all over the world that the number of unreported cases is at least tenfold higher than the number of laboratory-confirmed cases. We have learned that a much larger proportion of cases than expected (estimates vary between 40% and 85%) remains completely asymptomatic or shows at most signs of a common cold, while the RKI, at least in the early days, had expected almost every infected person to become symptomatic sooner or later.
For most of the period covered in this paper, the official recommendations on PCR testing remained unchanged. Only patients with clinical symptoms such as cough or fever were to be tested, and only if they had previously been in contact with a confirmed case. Contact persons identified as suspects by the local health authorities were kept under closer surveillance until they showed symptoms.
If this test strategy is implemented at the beginning of an outbreak, the number of cases identified will indeed give an idea of what happens in the population, even if some asymptomatic cases are missed. The number of unreported cases will then be linearly related to the number of confirmed cases. However, what happens if testing starts while a small but substantial share of the population, e.g. some 2 to 3 percent, is already infected in the region where the first case is detected? This is a realistic scenario; estimates from one of the first German hotspots were even in the region of 15%.
The trigger case could be someone ending up in the ICU with severe pneumonia. Following up on contacts will yield an average of 30 to 40 persons under closer surveillance. This already gives a reasonable chance of finding the next infected person, who might just have a slight cold and normally would not have been tested. Another 40 contacts under surveillance, another case, an ever-growing group of suspects, and 2 to 3 cases detected per hundred, since this is the assumed prevalence rate. This pattern of confirmed cases begetting confirmed cases constitutes a classic exponential increase, but it does not involve mutual infection. Rather, it represents a sort of calibration curve for the test strategy. After a very short time, some two or three weeks, the point would be reached at which the detectable share of patients (those with symptoms at a certain point in time) has been identified, and only from then on do the observations truly reflect the increase or decrease of infection rates in the wider population. Still, this does not mean that newly confirmed cases were infected by previously confirmed cases; they might all have caught the virus from various asymptomatic spreaders who never showed up in the statistics.
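The detection-only scenario described above can be made concrete with a small deterministic sketch. Every parameter is a hypothetical assumption taken from the verbal argument, not an estimate: each newly confirmed case brings a fixed number of traced contacts, a fixed share of them tests positive, nobody infects anybody, and detection stops once the pool of currently detectable (symptomatic) infections is exhausted.

```python
def detection_cascade(contacts_per_case, positive_share, detectable_pool,
                      rounds, seed_cases=1.0):
    """New confirmed cases per tracing round in a detection-only model.

    Hypothetical assumptions: every confirmed case yields `contacts_per_case`
    traced contacts, a fraction `positive_share` of them tests positive, no
    transmission occurs, and at most `detectable_pool` cases can be found.
    """
    new_cases = [seed_cases]
    found_total = seed_cases
    for _ in range(rounds):
        # growth factor per round is contacts_per_case * positive_share
        candidate = new_cases[-1] * contacts_per_case * positive_share
        # cannot find more cases than remain in the detectable pool
        found = max(0.0, min(candidate, detectable_pool - found_total))
        new_cases.append(found)
        found_total += found
    return new_cases


# Hypothetical: 40 contacts, 3% positive, 1,000 detectable cases in the region
per_round = detection_cascade(40, 0.03, 1_000, rounds=40)
print([round(x, 1) for x in per_round[:8]])
```

Until the pool is exhausted, each round multiplies the count by 40 × 0.03 = 1.2, i.e. geometric growth produced purely by the tracing procedure. The sketch only yields growth when contacts per case times the positive share exceeds one; with 30 contacts and 2% positives the factor is 0.6 and the cascade dies out.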
This is not the only potential bias in public surveillance data; the other one is a negative sampling bias affecting the perceived risk of a severe course of disease and death. Basically, there are two ways to be confirmed as a COVID-19 patient: either as a contact of a confirmed case, as described above, or as a patient hospitalized with severe respiratory syndrome. Thus, the group of confirmed cases represents a mix of negative selection (patients with symptoms) and highly negative selection (patients requiring hospitalization). Indeed, this can also be seen directly from the data. One of the graphs in the daily situation report published by the RKI shows how the age groups are represented among the confirmed cases compared with the age distribution of the total population. It shows a more or less even distribution between 20 and 59 years, the working population, and a tendency toward lower numbers among the younger and the older, which fits results showing children to be more resilient, while older people often have fewer social contacts and might thus be less exposed to potential infections. Patients older than 70 years, however, are clearly over-represented compared to all other age groups. About 17% of all confirmed COVID-19 cases belong to this group, and they account for 85% of all reported deaths. Interestingly, the percentage of hospitalized patients has likewise been in the region of 17% for most of the time.
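The over-representation can be made explicit with simple arithmetic. If about 17% of confirmed cases are older than 70 but account for 85% of deaths, the case fatality ratio among confirmed cases in that group is 0.85 / 0.17 = 5 times the overall ratio, and roughly 28 times that of the remaining confirmed cases. The small function below, whose name is hypothetical, does this calculation using only the two shares quoted in the text.

```python
def fatality_ratio_vs_rest(case_share, death_share):
    """How many times higher one group's case fatality ratio is than that of
    all other confirmed cases, given only the group's shares of cases and
    deaths. The overall case fatality ratio cancels out."""
    cfr_group_rel = death_share / case_share              # group CFR / overall CFR
    cfr_rest_rel = (1 - death_share) / (1 - case_share)   # rest CFR / overall CFR
    return cfr_group_rel / cfr_rest_rel


# Shares quoted in the text: 17% of confirmed cases, 85% of deaths
print(round(fatality_ratio_vs_rest(0.17, 0.85), 1))
```

With the quoted shares this gives about 27.7. The calculation says nothing about how the confirmed cases were selected, which is exactly the sampling question raised in the text.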
This clearly looks like a bimodal distribution, and it could be interesting to analyze how these 17% hospitalized patients were identified. My prediction would be that a large share of these patients were not detected by contact tracing but were admitted directly to hospital with severe pneumonia and then tested. The high-risk group should not be viewed merely as a share of all COVID-19 patients but as a distinct sub-population with specific features, one of them obviously high age, that make its members vulnerable to a severe course of disease. Any backward conclusion from the currently registered COVID-19 patients to the general risk in an elderly population is meaningless, since it is unknown how many elderly people, despite their advanced age, suffer no severe consequences. We observe a number x of patients who require hospitalization and a number y of patients who do not, but there does not seem to be a real connection between these groups. With the current massive expansion of testing, we observe continuously decreasing hospitalization and death rates, which is in line with this assumption.