Testing pool design optimization
The optimization model determines an optimal testing pool design for prevalence estimation, in terms of the number of testing pools to be utilized, s, and the size of each testing pool, n, under a testing budget constraint, so as to minimize the asymptotic variance of the maximum likelihood estimator (MLE) of the unknown prevalence rate [19]:
$$\begin{aligned} \begin{aligned} \underset{n,s}{\text {minimize}}& \, \sigma ^2(n,s;p_0) \\ \text {subject to}& \, c_f \ s + c_v \ s n \le B \\& n \le \overline{N} \\&n, s \in {\mathbb {Z}}^+, \end{aligned} \end{aligned}$$
(7)
where \(\sigma ^2(n,s;p_0)\) denotes the asymptotic variance of the MLE for a pool design (n, s), given an initial estimate of the unknown prevalence rate p, which we denote by \(p_0\). The testing cost consists of a fixed testing cost per pool (e.g., cost of the testing kit), denoted by \(c_f\), and a collection cost per specimen (e.g., cost of drawing blood), denoted by \(c_v\). The tester has a total testing budget of B for prevalence estimation. Additionally, the maximum pool size that can be used may be restricted due, for example, to technological constraints, regulations, or other considerations, and we denote the maximum allowable pool size by \(\overline{N}\). The asymptotic variance is a commonly used criterion for optimal testing design in prevalence estimation and for the evaluation of estimators in statistical inference, and is closely related to Fisher information (e.g., [14, 17, 23–25, 30, 31]).
In pooled testing, a single test is applied to each pool, and the test yields a binary outcome: a positive outcome indicates the presence of at least one infected specimen in the pool, while a negative outcome indicates that all specimens in the pool are infection-free. Using the test outcomes, the tester derives the MLE of the unknown prevalence rate, \({\hat{p}}\). In particular, for a given testing design (n, s), let \(S_I(s)\) denote the number of positive-testing pools among the s pools, which is a random variable prior to testing. Once testing is conducted and a realization \(S_I(s)=k\) is observed, the MLE of the prevalence rate is the value of p that maximizes the following likelihood function:
$$\begin{aligned} \begin{aligned} L (p; S_I (s)= k)&= \left( {\begin{array}{c}s\\ k\end{array}}\right) \Big [Sens(n;p) - (1-p)^n (Sens(n;p) + Spec - 1) \Big ]^{k} \Big [ 1 - Sens(n;p) + (1-p)^n (Sens(n;p) + Spec - 1) \Big ]^{s-k} \\&\Rightarrow {\hat{p}} \equiv \underset{p\in (0,1)}{\text {argmax}} \Big \{ L(p; S_I (s) = k) \Big \}. \end{aligned} \end{aligned}$$
(8)
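When the sensitivity is treated as a fixed number for a given pool size (in the paper Sens(n;p) also varies with p, which requires numerical maximization), the likelihood in Eq. (8) is binomial in the pool-level positivity probability, and the maximizer has a closed form obtained by equating that probability to the observed positive fraction k/s. A minimal sketch under that simplifying assumption; the specificity value used in the example is a hypothetical placeholder:

```python
def mle_prevalence(k, s, n, sens, spec):
    """Closed-form MLE of p from k positive pools out of s pools of size n,
    assuming Sens does not vary with p (cf. Eq. (8))."""
    # Pool positivity probability: pi = Sens - (1-p)^n * (Sens + Spec - 1).
    # The binomial MLE of pi is k/s; setting pi = k/s and solving for p:
    frac = (sens - k / s) / (sens + spec - 1)
    frac = min(max(frac, 0.0), 1.0)  # guard k/s outside [1 - Spec, Sens]
    return 1.0 - frac ** (1.0 / n)

# Example: perfect test, 19 of 100 pools of size 2 positive -> p_hat = 0.1,
# since 1 - (1 - 0.1)^2 = 0.19.
p_hat = mle_prevalence(19, 100, 2, sens=1.0, spec=1.0)
```

With an imperfect test, the clipping step matters: realizations with \(k/s < 1 - Spec\) or \(k/s > Sens\) would otherwise put \({\hat{p}}\) outside (0, 1).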
The asymptotic variance function, \(\sigma ^2(n,s;p)\), for a pool design (n, s) and with respect to the unknown prevalence rate p, is then given by (e.g., [17]):
$$\begin{aligned} \sigma ^2(n,s;p) = \frac{\left\{ Sens (n;p) - (1-p)^n (Sens (n;p) + Spec - 1) \right\} \left\{ 1 - Sens (n;p) + (1-p)^n (Sens (n;p) + Spec - 1) \right\} }{sn^2(1-p)^{2(n-1)} (Sens (n;p) + Spec - 1)^2}. \end{aligned}$$
(9)
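Eq. (9) is straightforward to evaluate numerically. A sketch, again treating Sens(n;p) as a fixed number for a given pool size (the numerator is the binomial variance of a pool outcome, and the denominator is s times the squared derivative of the positivity probability with respect to p):

```python
def asymptotic_variance(n, s, p, sens, spec):
    """Asymptotic variance of the MLE for pool design (n, s) at prevalence p,
    per Eq. (9), with Sens treated as constant in p."""
    a = sens + spec - 1
    pi = sens - (1 - p) ** n * a            # P(a pool tests positive)
    return pi * (1 - pi) / (s * n**2 * (1 - p) ** (2 * (n - 1)) * a**2)

# Sanity check: for a perfect individual test (n = 1, Sens = Spec = 1),
# Eq. (9) reduces to the familiar binomial variance p(1-p)/s.
v = asymptotic_variance(1, 50, 0.1, 1.0, 1.0)
```

Note that the variance is proportional to 1/s, so for a fixed pool size n the tester should always spend the entire budget on additional pools.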
Study design and data
Our goal in this section is to demonstrate the value of the sensitivity estimation methodologies developed in this paper through a numerical study. We do this by designing an optimal testing pool, based on the sensitivity estimates derived for the HIV ULTRIO Plus Assay for various pool sizes using the methodologies described in "Methods" section, and comparing the efficiency of the optimal testing design with a benchmark design that does not consider pooling dilution (and hence does not need our methodology for sensitivity estimation at various pool sizes). As discussed above, we consider pool design for prevalence estimation of HIV in Sub-Saharan Africa using the HIV ULTRIO Plus Assay.
Model parameters are as follows. We assume that the actual prevalence rate is \(p = 0.044\) [29], which is unknown to the tester; this rate is representative of the HIV prevalence rate in Sub-Saharan Africa. In the absence of this information, the tester determines an initial estimate of \(p_0 = 0.022\), i.e., we consider the case of undershooting. Based on published data, we consider a fixed testing cost per pool of $31.5 [15], a collection cost per specimen of $8 [8], and a total testing budget of $5575 [18], which corresponds to a budget of 50 pools, each of size 10. Finally, we consider a maximum allowable pool size of \(\overline{N}=48\) [21]. These parameter values are for demonstration purposes, and one can conduct similar analyses with different parameter values.
As sensitivity inputs, we utilize the sensitivity values in Table 2, which are derived by the sensitivity estimation model in "Pooled sensitivity estimation methodology" section, in conjunction with the calibration parameters in "Calibration and validation" section. The sensitivity values in Table 2 correspond to pool sizes \(n=\{1, 2, \ldots , 16\}\). As discussed above, the sensitivity estimation model in "Pooled sensitivity estimation methodology" section requires the computation of higher-dimensional integrals (with dimension up to the pool size), and can be computationally expensive. Therefore, we use the approximation in "An approximation for sensitivity estimation" section to derive the sensitivity values for the remaining pool sizes, i.e., \(n=\{17, \ldots , 48\}\). Then, we perform a two-dimensional search over all feasible values, \(\{ (n,s): n \in \{1, \ldots , 48 \}, c_f \ s + c_v \ s n \le B \}\), to determine the optimal testing pool design, \((n^*, s^*)\), for the optimization model in Eq. (7), i.e., the design that minimizes the asymptotic variance. To determine the "best" benchmark design, we repeat the two-dimensional search without considering pooling dilution, that is, by replacing the parameters Sens(n), \(\forall n \in {\mathbb {Z}}^+\), with \(99.98\%\), i.e., the sensitivity of individual testing for the HIV ULTRIO Plus Assay; see Table 3 for the resulting optimal design and benchmark design. For each of these designs, we perform a Monte Carlo simulation to derive estimates for the MLE of p, \({\hat{p}}\) (see Eq. (8)); the mean squared error (MSE); and the relative bias (rBias (%)), given by:
$$\begin{aligned} MSE = ({\hat{p}} - p)^2, \text { and } rBias(\%) = 100 \times \left| \frac{{\hat{p}} - p}{p}\right|. \end{aligned}$$
(10)
These performance metrics relate to the efficiency of prevalence estimation, and are commonly used in the literature, e.g., [13, 14, 30].
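The two-dimensional search described above amounts to a small brute-force loop. The sketch below uses the paper's budget parameters (\(c_f = \$31.5\), \(c_v = \$8\), \(B = \$5575\), \(\overline{N} = 48\), \(p_0 = 0.022\)); the dilution curve `sens_of(n)` in the example and the specificity value are hypothetical placeholders standing in for the Table 2 estimates:

```python
import math

def optimal_design(sens_of, spec=0.995, p0=0.022,
                   c_f=31.5, c_v=8.0, B=5575.0, n_max=48):
    """Brute-force search for the (n, s) minimizing the asymptotic
    variance of Eq. (9), subject to the constraints of Eq. (7)."""
    best = None
    for n in range(1, n_max + 1):
        # Largest affordable pool count for this n; since the variance is
        # proportional to 1/s, it is optimal to spend the whole budget.
        s = int(B // (c_f + c_v * n))
        if s < 1:
            continue
        a = sens_of(n) + spec - 1
        pi = sens_of(n) - (1 - p0) ** n * a
        var = pi * (1 - pi) / (s * n**2 * (1 - p0) ** (2 * (n - 1)) * a**2)
        if best is None or var < best[2]:
            best = (n, s, var)
    return best

# Hypothetical dilution curve: sensitivity decays mildly with pool size.
n_star, s_star, var_star = optimal_design(
    lambda n: 0.9998 * math.exp(-0.002 * (n - 1)))
```

Replacing `sens_of` with the constant `lambda n: 0.9998` reproduces the benchmark search that ignores pooling dilution.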
Table 3
Estimation efficiency (mean ± half-width of 95% CI) of the optimal design and the benchmark design for HIV prevalence estimation (with an actual prevalence rate of \(p=0.044\))
| Pool design | \(n^* = 37, s^* = 17\) | \(n^* = 17, s^* = 33\) |
| \({\hat{p}}\) (MLE) | 0.05204 ± 0.00036 | 0.03041 ± 0.00029 |
| MSE \((\times 10^4)\) | 3.95 ± 0.11 | 4.00 ± 0.08 |
| rBias (%) | 18.26 ± 0.52 | 30.88 ± 0.48 |
In particular, for each testing design, we perform 10,000 simulation replications. In each replication, we randomly generate the infection status of each of the \(n^* \times s^*\) specimens, where each specimen carries an infection with probability p and is infection-free otherwise; for each infected specimen, we randomly generate a post-exposure time from a Uniform distribution with support \([0,\tau ]\), and compute the viral load using Eq. (1) and the parameters of "Calibration and validation" section. Then, we randomly assign the specimens into \(s^*\) pools, each of size \(n^*\), and generate the binary test outcomes based on the test sensitivity model given in Eq. (4). Finally, we compute the MLE, MSE, and rBias for each replication using Eqs. (8) and (10).
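The replication loop can be sketched as follows. For brevity, this sketch draws each pool outcome directly from a pool-level sensitivity Sens(n), rather than simulating per-specimen post-exposure times and viral loads via Eqs. (1) and (4) as the study does; it also uses the closed-form MLE valid when Sens is constant in p. The dilution curve and specificity passed in are hypothetical placeholders:

```python
import random

def simulate_k(n, s, p, sens_of, spec, rng):
    """One replication: infect specimens w.p. p, pool them into s pools of
    size n, and count positive pools under a pool-level sensitivity model."""
    k = 0
    for _ in range(s):
        infected = any(rng.random() < p for _ in range(n))
        if infected:
            k += rng.random() < sens_of(n)   # true positive w.p. Sens(n)
        else:
            k += rng.random() > spec         # false positive w.p. 1 - Spec
    return k

def replicate(n, s, p, sens_of, spec, reps=10_000, seed=0):
    """Monte Carlo estimates of the MLE, MSE, and rBias (Eqs. (8) and (10))."""
    rng = random.Random(seed)
    a = sens_of(n) + spec - 1
    mles = []
    for _ in range(reps):
        k = simulate_k(n, s, p, sens_of, spec, rng)
        frac = min(max((sens_of(n) - k / s) / a, 0.0), 1.0)
        mles.append(1.0 - frac ** (1.0 / n))  # closed-form MLE (Sens const.)
    p_hat = sum(mles) / reps
    mse = sum((m - p) ** 2 for m in mles) / reps
    rbias = 100 * abs(p_hat - p) / p
    return p_hat, mse, rbias
```

With a perfect test the average of the replication MLEs should land near the actual prevalence rate; the gap between the two designs in Table 3 arises precisely because the benchmark design's assumed sensitivity does not match the dilution-affected sensitivity that generates the data.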