Monophasic decay disease model
The basic reproduction number for the model with monophasic decay is given by [32, 33]
$$ \mathcal{R}_{0}= \frac{\alpha \pi\kappa\rho\tau N}{\gamma}. $$
(2)
The identifiable combinations of this model have been published elsewhere [49]. In brief, if case data (corresponding to state I) are observed, then the observed dynamics are determined by the recovery rate γ, the average pathogen persistence τ, and the product απκρ. The basic reproduction number \(\mathcal {R}_{0}\), therefore, is structurally identifiable if the population size N is known.
The parameter combination απκρ can be understood in the following way. The product κρ is the volume of the environment ingested per day, and π is the per-pathogen probability of infection. Then, πκρ is the rate of infection-transmitting contact with the environment. The shedding rate (α) and infectious contact rate (πκρ) form a single identifiable parameter combination when we only observe infections in the population (I). In this case, we do not measure the concentration of pathogens in the environment (W), and we do not know whether the force of infection (πκρW) is a result of fewer pathogens that have a higher rate of infection (low shedding α, high rate of infectious contact πκρ) or more pathogens with a lower rate of infection (high shedding α, low rate of infectious contact πκρ). If the concentration of pathogens in the environment is observed, however, the relative sizes of the shedding rate (α) and infectious contact rate (πκρ) are distinguishable. That is, if environmental monitoring data (W) are available in addition to case data (I), then α and πκρ, not just their product, are separately identifiable.
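As a minimal numerical sketch of this point, the snippet below evaluates Eq. (2) for two parameter sets that differ in shedding (α) and per-pathogen infectivity (π) but share the same product απ. All numerical values and the function name `r0_monophasic` are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
def r0_monophasic(alpha, pi, kappa, rho, tau, N, gamma):
    """Basic reproduction number for the monophasic decay model, Eq. (2)."""
    return alpha * pi * kappa * rho * tau * N / gamma


# Hypothetical values shared by both scenarios (illustration only).
common = dict(kappa=2.0, rho=0.5, tau=10.0, N=1000.0, gamma=0.25)

# Scenario A: low shedding, high per-pathogen infectivity.
r0_a = r0_monophasic(alpha=0.01, pi=0.02, **common)

# Scenario B: high shedding, low per-pathogen infectivity,
# with the same product alpha * pi as scenario A.
r0_b = r0_monophasic(alpha=0.10, pi=0.002, **common)

print(r0_a, r0_b)
```

Because α and π enter Eq. (2) only through their product, the two scenarios yield the same basic reproduction number; separating them requires information beyond this combination, such as the environmental data (W) discussed above.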
Biphasic decay disease model
In the biphasic pathogen decay model (Eq. (1)), pathogens leave their environmental compartment either by decay (ξi) or by phenotype conversion (δi), and we denote the probability that conversion occurs before decay by ϕi := δi/(ξi+δi). It is easier to interpret the basic reproduction number \(\mathcal {R}_{0}\) and certain identifiable parameter combinations of this model in terms of ϕ and τ rather than ξ and δ. The basic reproduction number is
$$ {\begin{aligned} \mathcal{R}_{0}&= \frac{\alpha\kappa\rho N}{\gamma}\left(\pi_{1}\tau_{1}\left(\frac{\eta+(1-\eta)\phi_{2}}{1-\phi_{1}\phi_{2}}\right)\right. \\& \quad \left.+ \pi_{2}\tau_{2}\left(\frac{(1-\eta)+\eta \phi_{1}}{1-\phi_{1}\phi_{2}}\right)\right). \end{aligned}} $$
(3)
The calculations are left to Additional file 1. The \(\mathcal {R}_{0}\) of this system can be seen as the sum of two submodel basic reproduction numbers that give the contributions of the labile (\(\mathcal {R}_{0,1}\)) and persistent (\(\mathcal {R}_{0,2}\)) phenotypes to the overall basic reproduction number.
$$ {\begin{aligned} \mathcal{R}_{0}&= \left(\frac{\alpha\pi_{1}\kappa\rho\tau_{1} N}{\gamma}\right) \left(\frac{\eta+(1-\eta)\phi_{2}}{1-\phi_{1}\phi_{2}}\right)\\&+\left(\frac{\alpha\pi_{2}\kappa\rho\tau_{2} N}{\gamma}\right)\left(\frac{(1-\eta)+\eta \phi_{1}}{1-\phi_{1}\phi_{2}}\right), \end{aligned}} $$
(4)
$$ {\begin{aligned} \quad&\,=:\mathcal{R}_{0,1}+\mathcal{R}_{0,2}. \end{aligned}} $$
(5)
These two submodels are each similar in form to the monophasic basic reproduction number (Eq. (2)), with a coefficient that accounts for the interconnectedness of the two compartments. Of the α pathogens shed between the two compartments, αη go directly to the labile compartment, but α(1−η)ϕ2 will also come to the labile compartment via the persistent compartment. These two pathogen sources explain the numerator of the interconnectedness coefficient, i.e., η+(1−η)ϕ2. Next, because pathogens can move back and forth between compartments, we need to know the expected number of visits a pathogen makes to the labile compartment [46]. After the initial visit, each return visit happens with probability ϕ1ϕ2. Thus, the expected amount of time spent in the labile compartment is
$$ \tau_{1}(1+\phi_{1}\phi_{2}+(\phi_{1}\phi_{2})^{2}+\cdots+(\phi_{1}\phi_{2})^{n}+\cdots)= \frac{\tau_{1}}{1-\phi_{1}\phi_{2}}. $$
(6)
This term explains the denominator of the interconnectedness coefficient.
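The decomposition in Eqs. (4)–(6) can be checked numerically with a short sketch. It assumes that the mean residence time in compartment i is τi = 1/(ξi+δi), which is not stated explicitly in this section; the helper names and all parameter values are hypothetical and serve only as an illustration.

```python
def phi(xi, delta):
    """Probability that conversion occurs before decay: delta / (xi + delta)."""
    return delta / (xi + delta)


def tau(xi, delta):
    """Mean residence time in a compartment (assumed to be 1 / (xi + delta))."""
    return 1.0 / (xi + delta)


def submodel_r0(p):
    """Return (R0_1, R0_2), the labile and persistent terms of Eq. (4)."""
    phi1, phi2 = phi(p["xi1"], p["delta1"]), phi(p["xi2"], p["delta2"])
    tau1, tau2 = tau(p["xi1"], p["delta1"]), tau(p["xi2"], p["delta2"])
    base = p["alpha"] * p["kappa"] * p["rho"] * p["N"] / p["gamma"]
    denom = 1.0 - phi1 * phi2  # accounts for repeated visits, Eq. (6)
    r01 = base * p["pi1"] * tau1 * (p["eta"] + (1 - p["eta"]) * phi2) / denom
    r02 = base * p["pi2"] * tau2 * ((1 - p["eta"]) + p["eta"] * phi1) / denom
    return r01, r02


# Hypothetical parameter values (illustration only).
params = dict(alpha=0.01, kappa=2.0, rho=0.5, N=1000.0, gamma=0.25,
              eta=0.8, pi1=0.02, pi2=0.002,
              xi1=0.5, delta1=0.1, xi2=0.05, delta2=0.02)

r01, r02 = submodel_r0(params)
r0_total = r01 + r02  # Eq. (5)

# Geometric series of Eq. (6): partial sum versus closed form.
q = phi(params["xi1"], params["delta1"]) * phi(params["xi2"], params["delta2"])
partial_sum = sum(q ** n for n in range(200))
closed_form = 1.0 / (1.0 - q)

print(r01, r02, r0_total)
print(partial_sum, closed_form)
```

In this illustrative parameter set the persistent phenotype is ten times less infectious per pathogen than the labile one, yet it still contributes a non-negligible share of the total because it resides in the environment much longer.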
Because these submodel reproduction numbers allow us to understand the relative contribution of each pathogen phenotype to the overall epidemic potential of the system, we would like to be able to determine their values from time-series data. In particular, we want to understand the risk potential in the less infectious, persistent fraction of pathogens. However, it is not clear a priori whether we can determine these risk potentials from time-series data alone, and so we need identifiability analysis to determine the identifiable parameter combinations for the biphasic decay model and to inform what data are required to provide useful information from the model. Mathematical computation and details are left to Additional files 1, 2, 3, 4 and 5.
If we only have human disease surveillance time-series (case data, I), then the observed dynamics are determined by
$$\alpha(\eta\pi_{1}+(1-\eta)\pi_{2})\kappa\rho,$$
$$\xi_{1}+\delta_{1}+\xi_{2}+\delta_{2}=\frac{\tau_{1}+\tau_{2}}{\tau_{1}\tau_{2}},$$
$$(\xi_{1}+\delta_{1})(\xi_{2}+\delta_{2})-\delta_{1}\delta_{2}=\frac{1-\phi_{1}\phi_{2}}{\tau_{1}\tau_{2}}.$$
In this case, the disease recovery rate γ is identifiable as it was in the monophasic decay model. The combination α(ηπ1+(1−η)π2)κρ in the biphasic model has an analogous interpretation to that of the combination απκρ in the monophasic model. The two identifiable parameter combinations ξ1+δ1+ξ2+δ2 and (ξ1+δ1)(ξ2+δ2)−δ1δ2 come directly from the underlying biphasic pathogen decay model previously described in [18]; they are the sum and product of the apparent labile and persistent decay rates. These values characterize the observed pathogen decay, but they cannot attribute the values to the underlying processes, i.e., the same observed patterns could be generated by different values of the decay and phenotypic conversion parameters. Finally, because \(\mathcal {R}_{0}/N\) is identifiable from case data, the basic reproduction number can be estimated if the population size is known.
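The following sketch illustrates why the sum and product of the apparent decay rates do not pin down the underlying rates. It assumes a linear two-compartment decay structure in which labile pathogens leave at rate ξ1+δ1, persistent pathogens leave at rate ξ2+δ2, and conversions feed the other compartment (Eq. (1) is not reproduced in this section, so this structure is an assumption), and it interprets the apparent decay rates as the negatives of the eigenvalues of that system. The function names and parameter values are hypothetical.

```python
import math


def decay_summary(xi1, delta1, xi2, delta2):
    """Sum and product of the apparent decay rates."""
    s = xi1 + delta1 + xi2 + delta2
    p = (xi1 + delta1) * (xi2 + delta2) - delta1 * delta2
    return s, p


def apparent_decay_rates(xi1, delta1, xi2, delta2):
    """Fast and slow apparent decay rates: roots of x^2 - s*x + p = 0."""
    s, p = decay_summary(xi1, delta1, xi2, delta2)
    root = math.sqrt(s * s - 4.0 * p)  # non-negative for non-negative rates
    return (s + root) / 2.0, (s - root) / 2.0


# Two hypothetical parameter sets (xi1, delta1, xi2, delta2), illustration only.
set_a = (0.50, 0.10, 0.05, 0.02)
set_b = (0.23, 0.34, 0.05, 0.05)

s_a, p_a = decay_summary(*set_a)
s_b, p_b = decay_summary(*set_b)
fast_a, slow_a = apparent_decay_rates(*set_a)
fast_b, slow_b = apparent_decay_rates(*set_b)

print(s_a, p_a, fast_a, slow_a)
print(s_b, p_b, fast_b, slow_b)
```

The two sets have quite different decay and conversion rates but share the same sum and product, and hence the same pair of apparent decay rates, so these two summaries alone cannot tell them apart.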
Human disease surveillance provides us with information about five quantities, but, since there are eleven parameters, we can see that additional data will be needed to make inference about specific biological parameters, the persistence–infectivity trade-off, and the phenotype-specific risk. Quantitative microbial risk assessors often collect environmental samples to inform exposure estimates; such data could also be used to inform transmission models. Many quantification methods are either culture-based (which may not capture a dormant persistent phenotype) or PCR-based (which will not distinguish between phenotypes). If we combine case data with high-quality time-series environmental surveillance of the total pathogen population (W=W1+W2), such as might be collected to inform quantitative microbial risk assessment, we can additionally estimate
$$\frac{\tau_{1}(\eta+(1-\eta)\phi_{2})+\tau_{2}((1-\eta)+\eta \phi_{1})}{\tau_{1}\tau_{2}}.$$
By observing both case and environmental data, we can also estimate the average shedding rate per volume α. The parameter combination displayed above is less directly interpretable but could prove useful in estimating some biological parameters if others are known experimentally.
The underlying system dynamics are sufficiently complicated that patterns of time-series case data and environmental surveillance, even though useful for characterizing the overall risk, do not fully reveal the biological processes or implications of the persistence–infectivity trade-off. However, if we have a way to estimate the relative abundance of the labile (W1) and persistent (W2) pathogen phenotypes in our environmental samples, we gain a great deal of parametric information. We can separately estimate γ, α, η, δ1, δ2, ξ1, ξ2, κρπ1, and κρπ2 (proof in supplementary material), at least in theory (there may be practical barriers for real-world data). With this information, we can infer the risk potentials of the labile and persistent phenotypes (\(\mathcal {R}_{0,1}\) and \(\mathcal {R}_{0,2}\)).
These results suggest that there are two scientific strategies for understanding the underlying biological mechanisms and the persistence–infectivity trade-offs in this system. First, with high-quality case and environmental data that can distinguish between pathogen phenotypes, we can indirectly infer many of the biological parameter values. Second, if we cannot distinguish between pathogen phenotypes, the usual disease and environmental surveillance can be combined with targeted experimental studies designed to independently determine certain model parameters. Here, the important parameters to identify are the infectivity of pathogens in the persistent state (π2), the rates of entering dormancy (δ1) and of resuscitation (δ2), and the fraction of pathogens already dormant when initially shed into the environment (1−η); identifying these parameters is essential to understanding the relative risks associated with the labile and persistent pathogen phenotypes. In particular, determining that one or more of these parameters is negligibly small provides a means to simplify the modeling framework. For example, if we can determine that resuscitation does not occur in the environment to an appreciable extent (δ2≈0), then the identifiable quantities from case data simplify to γ, α(ηπ1+(1−η)π2)κρ, ξ1+δ1, ξ2, and \(\mathcal {R}_{0}/N\). The addition of environmental surveillance data helps to identify α and (after a little bit of algebra) ηξ1+(1−η)ξ2. Although determining that this one parameter δ2 is negligible would not fully resolve the persistence–infectivity question, it would simplify the remaining quantities and, consequently, future analysis. The specific experiments needed to estimate these parameters will likely vary by pathogen. Broadly speaking, however, animal challenge studies could be used to estimate π1 and π2, analysis of stool samples could be used to estimate η, and techniques designed to measure microbial dormancy could be harnessed to begin to better understand δ1 and δ2.
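The algebra behind the δ2≈0 simplification can be sketched as follows, assuming τi = 1/(ξi+δi), so that ϕi = δiτi; this relation is not stated explicitly in this section but is consistent with the identities given for the sum and product of the decay rates. Setting δ2=0 gives ϕ2=0, 1/τ2=ξ2, and ϕ1/τ1=δ1, so the additional combination identified from environmental data reduces to
$$ {\begin{aligned} \frac{\tau_{1}\eta+\tau_{2}((1-\eta)+\eta \phi_{1})}{\tau_{1}\tau_{2}} &= \frac{\eta}{\tau_{2}}+\frac{(1-\eta)+\eta\phi_{1}}{\tau_{1}} \\ &= \eta\xi_{2}+(1-\eta)(\xi_{1}+\delta_{1})+\eta\delta_{1} \\ &= \eta\xi_{2}+(1-\eta)\xi_{1}+\delta_{1}. \end{aligned}} $$
Subtracting this quantity from the sum (ξ1+δ1)+ξ2, both terms of which are identifiable from case data in this limit, leaves
$$ (\xi_{1}+\delta_{1})+\xi_{2}-\left(\eta\xi_{2}+(1-\eta)\xi_{1}+\delta_{1}\right)=\eta\xi_{1}+(1-\eta)\xi_{2}. $$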
These scientific strategies are not mutually exclusive. Because theoretical identifiability does not guarantee that real-world data will contain sufficient information to distinguish the mechanistic parameters in practice, pursuing both the population quantification and parameter determination strategies simultaneously will allow for corroboration and maximize our confidence in the conclusions of individual experimental studies.