Appendix: Estimating test-attributed COVID-19 deaths
Let P(i) be the probability of being infected during an epidemic wave at any given time point i, P the total cumulative probability of being infected during the epidemic wave, T the total duration of the epidemic wave, t the time period during which a person infected with the virus would test positive, d the diagnostic time window used in death attribution (a death is attributed to the virus if it occurs within time d of a positive test), m(i) the population mortality rate (death from any cause) per unit time at time i, m the average population mortality rate (death from any cause) per unit time, and S the size of the population of interest.
Initially, one may consider the simplified version where P(i) and m(i) remain steady during the epidemic wave and ask what would happen if one were to test all people at the time of their death. Then the probability of testing positive for the virus at the time of death (for any death and for any cause thereof) during the T + t period is given by
$${\text{D}} = {\text{Pt}}/\left( {{\text{T}} + {\text{t}}} \right)$$
(1)
For example, if P = 60% of the population is infected, t = 0.07 years, and T + t = 1 year, then D = 0.6 × 0.07/1 = 0.042, i.e. 4.2% of people dying will test positive for the virus if they happen to be tested at the time of their death, regardless of whether the virus is causally related to the death or is an innocent bystander. The equation assumes that t is substantially shorter than T, so that the initial phase, during which a full look-back of length t is not yet possible, can be neglected.
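The arithmetic of Eq. 1 can be checked with a minimal Python sketch. The function name `positive_at_death_fraction` and the argument `total_period` (standing for T + t) are illustrative choices, not notation from the text; the input values are those of the example above.

```python
def positive_at_death_fraction(P, t, total_period):
    """Eq. 1: D = P * t / (T + t), where total_period stands for T + t."""
    return P * t / total_period


# Values from the worked example: P = 60%, t = 0.07 years, T + t = 1 year.
D = positive_at_death_fraction(P=0.60, t=0.07, total_period=1.0)
print(f"D = {D:.3f}")
```

Under the steady-state assumption, D scales linearly with both P and t, so halving either one halves the fraction of deaths that would test positive.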
One may then ask what would happen if all people were tested continuously during the epidemic wave, not only at the time of their death. Then, the probability of testing positive for the virus either at the time of death or at any time point during the diagnostic time window that would lead to an attribution of the death to the virus is given by
$${\text{D}} = {\text{P}}\left( {{\text{t}} + {\text{d}}} \right)/\left( {{\text{T}} + {\text{t}}} \right)$$
(2)
For example, if P = 60% of the population is infected, t = 0.07 years, T + t = 1 year, and a death is attributed to the virus if it occurs within d = 0.08 years of a positive test, then D = 0.6 × (0.07 + 0.08)/1 = 0.09, i.e. 9% of deaths from any cause will be attributed to the virus if the attribution is based on testing alone, without considering any other (e.g. clinical/pathology) information. The total number of deaths attributed to the virus during the period T + t will then be
$${\text{N}}_{{{\text{test}} - {\text{attributed}}}} = {\text{DSm}}\left( {{\text{T}} + {\text{t}}} \right) = \left[ {{\text{P}}\left( {{\text{t}} + {\text{d}}} \right)/\left( {{\text{T}} + {\text{t}}} \right)} \right]{\text{Sm}}\left( {{\text{T}} + {\text{t}}} \right) = {\text{P}}\left( {{\text{t}} + {\text{d}}} \right){\text{Sm}}$$
(3)
Moreover, in real circumstances the probability of being infected during the epidemic wave is not steady over time, and even the population mortality rate varies over time, e.g. due to seasonality or excess deaths from various causes. Therefore, while (t + d)S can still be considered a constant (t and d are fixed/defined and the population S does not change substantially over time), the product Pm in Eq. 3 would be more properly replaced by the integral \(\int {P(i)m(i)di}\) for values of i from 0 to T + t. This integral is larger than Pm when P(i) and m(i) are synchronized in their variation (P(i) is higher when m(i) is higher), and it is smaller than Pm when P(i) and m(i) are desynchronized in their variation (P(i) is higher when m(i) is lower). It is far more likely that P(i) and m(i) would be synchronized, because infections are more common in winter months, when the mortality rate in the population is also higher. Therefore, one may multiply Pm by a correction factor c to better approximate the integral \(\int {P(i)m(i)di}\).
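The direction of this effect can be illustrated with a coarse, discretized version of the integral. In the Python sketch below, the year is split into four equal seasons; the relative mortality levels and the seasonal shares of infections are hypothetical numbers chosen for illustration, not data from the text.

```python
def integral_to_product_ratio(infection_shares, relative_mortality):
    """Discrete stand-in for the ratio of the integral of P(i)m(i) to Pm.

    infection_shares: fraction of all infections falling in each season (sums to 1).
    relative_mortality: m(i) / m in each season (averages to 1 over the year).
    """
    return sum(p * r for p, r in zip(infection_shares, relative_mortality))


# Hypothetical seasonal mortality relative to the annual mean:
# winter, spring, summer, autumn.
relative_mortality = [1.25, 1.00, 0.75, 1.00]

# Hypothetical infection shares: concentrated in winter (synchronized)
# or concentrated in summer (desynchronized).
synchronized = [0.50, 0.25, 0.00, 0.25]
desynchronized = [0.00, 0.25, 0.50, 0.25]

ratio_sync = integral_to_product_ratio(synchronized, relative_mortality)
ratio_desync = integral_to_product_ratio(desynchronized, relative_mortality)

print(f"synchronized:   {ratio_sync:.3f}")
print(f"desynchronized: {ratio_desync:.3f}")
```

With these assumed values the ratio is above 1 when infections cluster in the high-mortality season and below 1 when they cluster in the low-mortality season, and it equals 1 whenever either quantity is flat over the year.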
The likely values of c are close to 1. For example, let us consider an annual mortality wave m(i) described for parsimony by a sine function, with three scenarios where the peak is (a) 25%, (b) 33.3%, or (c) 50% above the mean (and the trough is correspondingly 25%, 33.3% or 50% below the mean). Illustratively, in scenario (a), mortality may be at the annual average level on April 15 and October 15, 25% higher than the annual average on January 15, and 25% lower than the annual average on July 15. Let us also consider synchronized P(i) variation, with P(i) reaching double its mean at the peak of the mortality wave and reaching 0 at the trough of the mortality wave. E.g. in scenario (a), P(i) reaches its peak on January 15, it is 0 on July 15, and on April 15 and October 15 it has values that are half the peak value of January 15. Then, c is 1.12, 1.16, and 1.25 in these three scenarios, respectively, depending on how peaked the m(i) variation is. With this correction, Eq. 3 becomes:
$${\text{N}}_{{{\text{test}} - {\text{attributed}}}} = {\text{cP}}\left( {{\text{t}} + {\text{d}}} \right){\text{Sm}}$$
(4)
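The three scenarios can be approximated numerically with the following Python sketch. It assumes a 365-day year, a cosine-shaped wave peaking on day 15 (roughly January 15), and a daily grid; these discretization choices, the function names, and the values of S and m are illustrative assumptions rather than details given in the text.

```python
import math


def correction_factor(amplitude, days=365, peak_day=15):
    """Mean over one year of [m(i) / m] * [P(i) / mean P(i)].

    amplitude: relative height of the mortality peak above the mean (0.25 = 25%).
    """
    total = 0.0
    for day in range(days):
        phase = 2 * math.pi * (day - peak_day) / days
        m_rel = 1 + amplitude * math.cos(phase)  # mortality relative to its mean
        p_rel = 1 + math.cos(phase)              # 2x the mean at the peak, 0 at the trough
        total += m_rel * p_rel
    return total / days


def corrected_test_attributed_deaths(c, P, t, d, S, m):
    """Eq. 4: N = c * P * (t + d) * S * m."""
    return c * P * (t + d) * S * m


factors = {a: correction_factor(a) for a in (0.25, 1 / 3, 0.50)}
for a, c in factors.items():
    print(f"peak {a:.1%} above mean: c = {c:.3f}")

# P, t and d as in the earlier worked example; S and m are hypothetical.
N_corrected = corrected_test_attributed_deaths(
    factors[0.25], P=0.60, t=0.07, d=0.08, S=1_000_000, m=0.01
)
print(f"N with scenario (a) correction = {N_corrected:.1f}")
```

Under these assumptions the factor reduces to c = 1 + a/2, where a is the relative peak height, giving approximately 1.125, 1.167, and 1.25. These agree with the values quoted above up to the treatment of the last digit.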