Data description
This study used data from the Centre of the AIDS Programme of Research in South Africa (CAPRISA) 002 Acute Infection (AI) Study, conducted at the Doris Duke Medical Research Institute (DDMRI) at the Nelson R Mandela School of Medicine of the University of KwaZulu-Natal (UKZN) in Durban, South Africa [29–33]. CAPRISA initiated the 002 AI study between August 2004 and May 2005 by enrolling women at high risk of HIV infection for intensive ongoing follow-up, in order to estimate HIV infection rates within the study, provide intensive aftercare advice to those dropping out prematurely, and carefully monitor disease progression and the evolution of CD4 count and viral load [29–33]. A detailed description of the design, development, and procedures of the CAPRISA 002 AI study population can be found in [29, 30].
Patients were initiated on therapy when their bodies showed signs of being unable to adequately control the virus and their CD4 count dropped below a specific cut-off. A deficient CD4 count reflects the weakened immune system of an HIV-infected person. In the absence of treatment or viral suppression, the person is susceptible to opportunistic infections (OIs); this also increases the risk posed by Coronavirus Disease 2019 (COVID-19) and other underlying illnesses [31–33]. HAART is an effective way of preventing these infections and diseases. By suppressing the virus and preventing it from making copies of itself, HAART aims to slow or prevent progression to AIDS and death in HIV-infected people. When the level of virus in the blood is low or "undetectable" under HAART, the immune system sustains less damage and HIV-related complications are reduced [31–33]. This also significantly reduces the likelihood of transmitting HIV to partners.
The HIV/AIDS epidemic and other sexually transmitted diseases severely impact human health, especially the well-being of women and young girls [31–33]. "The consequences of HIV/AIDS stretch beyond women's health to their part as moms and caregivers and their commitment to their families' economic support. The social, development, and health consequences of HIV/AIDS and other sexually transmitted illnesses ought to be considered from a gender perspective" [34–36]. Apart from sex-specific issues, HIV therapy algorithms for women are similar to those for men [31]. The interaction between the clinician and the changing HIV epidemiology provides the clinician with a technique to identify patients at high risk of HIV infection and to clarify which rules should be applied to avoid onward HIV transmission [31–33]. Although ART recommendations are the same for all patients, studying the CD4 counts of HIV-infected patients, in conjunction with individual differences, will help clinicians interpret the available information precisely in light of patient-specific effects [31, 33, 37–39].
Quantile mixed-effects model
Quantile regression (QR) is an advanced statistical technique for studying the heterogeneous effects of predictors across the conditional distribution of the outcome. Instead of modeling only the mean, as conventional regression methods do, quantile regression allows the data to be explored more fully by modeling conditional quantiles, for example, the 5th and 95th percentiles of the response distribution [33]. For these reasons, it has become increasingly prevalent in epidemiological and economic studies. For instance, Yirga et al. [40] used quantile regression to study how children's BMI varies with age and other factors. There are several other applications of quantile regression based on uncorrelated data, including in public health, bioinformatics, health care, environmental science, ecology, microarray data analysis, and survival data analysis [13, 41–51].
The quantile level is frequently signified by the Greek letter \(\tau\), and the conditional quantile of \(y\) given \(x\) is often written as \({Q}_{\tau }(y|x)\). The quantile level \(\tau\) is the probability \(\mathrm{Pr}[y\le {Q}_{\tau }(y|x)]\); it is the value of \(y\) below which the proportion of the conditional response population is \(\tau\). For a random variable \(y\) with probability distribution function \(F\left(y\right)=\mathrm{Pr}\left(Y\le y\right)\), the \(\tau\)th quantile of \(y\) is defined through the inverse function \(Q\left(\tau \right)=\mathrm{inf}\left\{y:F(y)\ge \tau \right\}\), \(\tau \in (0, 1)\). In particular, the median is \(Q\left(0.5\right)\). Let \({y}_{i}\) denote a scalar response variable with conditional cumulative distribution function \({F}_{{y}_{i}}\), whose shape is unspecified, and let \({{\varvec{x}}}_{i}\) be the corresponding \(k\times 1\) covariate vector for subject \(i, i=1,\dots ,n\). Then, following Koenker and Bassett (1978), the \(\tau\)th \((0<\tau <1)\) quantile regression model is written as \({Q}_{\tau }\left({y}_{i}|{{\varvec{x}}}_{{\varvec{i}}}\right)={{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}\), where \({Q}_{\tau }\left({y}_{i}|{{\varvec{x}}}_{{\varvec{i}}}\right)\equiv {F}_{{y}_{i}}^{-1}\left(\tau \right)\) is the quantile function (the inverse cumulative distribution function) of \({y}_{i}\) given \({{\varvec{x}}}_{{\varvec{i}}}\) evaluated at \(\tau\), and \({{\varvec{\beta}}}_{\tau }\) is a column vector of regression parameters corresponding to the \(\tau\)th quantile. Equivalently, the model can be written as
$${y}_{i}={{\varvec{x}}}_{{\varvec{i}}}^{\mathbf{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}+{\varepsilon }_{i},\mathrm{ with }\,{Q}_{{\varepsilon }_{i}}\left(\tau |{{\varvec{x}}}_{i}\right)=0,$$
(1)
where \({\varepsilon }_{i}\) is the error term whose distribution (with density \({f}_{\tau }\left(\bullet \right)\)) is restricted to have its \(\tau\)th quantile equal to zero, that is, \({\int }_{-\infty }^{0}{f}_{\tau }\left({\varepsilon }_{i}\right)d{\varepsilon }_{i}=\tau\) [24, 52]. "The error density \({f}_{\tau }\left(\bullet \right)\) is often left unspecified in the classical literature" [52]. Thus, the estimator \({\widehat{{\varvec{\beta}}}}_{\tau }\) is obtained through linear programming (LP) by minimizing
$${\widehat{{\varvec{\beta}}}}_{\tau }=\underset{\mathit{\beta \epsilon }{R}^{P}}{\mathrm{argmin}}{\sum }_{i=1}^{n}{\rho }_{\tau }({y}_{i}-{{\varvec{x}}}_{{\varvec{i}}}^{\mathbf{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}),$$
(2)
where \({\rho }_{\tau }\left(\bullet \right)\) is the so-called loss (or check) function defined by \({\rho }_{\tau }\left(u\right)=u\left(\tau -I\left\{u<0\right\}\right)\), with \(u\) a real number and \(I\left\{\bullet \right\}\) the indicator function. Thus, \({\widehat{{\varvec{\beta}}}}_{\tau }\) is called the \(\tau\)th quantile regression estimate [5, 13, 43, 53]. The parameter \({{\varvec{\beta}}}_{\tau }\) and its estimator \({\widehat{{\varvec{\beta}}}}_{\tau }\) depend on the quantile \(\tau\), because different choices of \(\tau\) estimate different values of \(\beta\) [24]. For this reason, the interpretation of \({{\varvec{\beta}}}_{\tau }\) is specific to the quantile being estimated: the intercept denotes the baseline predicted value of the response at quantile \(\tau\), while each coefficient can be interpreted as the rate of change of the \(\tau\)th response quantile per unit change in the corresponding predictor (the \(i\)th regressor), holding all other covariates constant.
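To make the check function concrete, the following sketch (with illustrative simulated data, not the CAPRISA data) verifies numerically that minimizing \(\sum_{i}\rho_{\tau}(y_{i}-q)\) over a scalar \(q\) recovers the empirical \(\tau\)th quantile:

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check (loss) function: rho_tau(u) = u * (tau - I{u < 0})."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=1000)  # hypothetical sample

tau = 0.75
grid = np.linspace(y.min(), y.max(), 2001)      # candidate scalar "fits"
losses = np.array([check_loss(y - q, tau).sum() for q in grid])
q_hat = grid[losses.argmin()]                   # minimizer of the check loss

# The minimizer coincides (up to grid resolution) with the empirical quantile.
print(q_hat, np.quantile(y, tau))
```

The same mechanism underlies quantile regression: replacing the scalar \(q\) with a linear predictor \({{\varvec{x}}}^{\boldsymbol{^{\prime}}}{\varvec{\beta}}\) gives the objective in Eq. (2).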
The conditional quantile estimator \({\widehat{{\varvec{\beta}}}}_{{\varvec{\tau}}}\) in Eq. (2) is obtained by minimizing the objective function
$$\begin{aligned}H\left({{\varvec{\beta}}}_{{\varvec{\tau}}}\right) & =\sum_{i:{y}_{i}\ge {{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}}\tau \left|{\varepsilon }_{i}\right|+\sum_{i:{y}_{i}<{{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}}\left(1-\tau \right)\left|{\varepsilon }_{i}\right|\\&=\sum_{i:{y}_{i}\ge {{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}}\tau |{y}_{i}-{{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}|+\sum_{i:{y}_{i}<{{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}}(1-\tau )|{y}_{i}-{{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{{\varvec{\beta}}}_{{\varvec{\tau}}}|,\quad 0< \tau <1, \end{aligned}$$
(3)
where \(i:{y}_{i}\ge {{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{\varvec{\beta}}\) indexes underprediction and \(i:{y}_{i}<{{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}{\varvec{\beta}}\) overprediction [5]. Since this objective function is nondifferentiable, gradient-based optimization methods are not applicable; instead, LP methods can be used to minimize \(H({{\varvec{\beta}}}_{{\varvec{\tau}}})\) [41, 54, 55]. For more details and a summary of quantile regression see, for example, Davino et al. [3], Koenker and Bassett [7], Koenker [13], Buchinsky [41], Koenker and Hallock [43], or Yu et al. [49].
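The LP formulation can be sketched as follows. This is a minimal illustration on simulated data, assuming NumPy and SciPy are available; the helper `quantreg_lp` is hypothetical, not from the study. It splits each residual into positive and negative parts \(u^{+}, u^{-}\ge 0\) and minimizes \(\tau \mathbf{1}^{\prime}u^{+}+(1-\tau )\mathbf{1}^{\prime}u^{-}\) subject to \(X{\varvec{\beta}}+u^{+}-u^{-}=y\):

```python
import numpy as np
from scipy.optimize import linprog

def quantreg_lp(X, y, tau):
    """Solve the tau-th quantile regression as a linear program:
    minimise tau*1'u_pos + (1 - tau)*1'u_neg
    subject to X beta + u_pos - u_neg = y, with u_pos, u_neg >= 0."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)  # beta free, slacks >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)   # hypothetical linear data
X = np.column_stack([np.ones(n), x])

beta_med = quantreg_lp(X, y, tau=0.5)     # median regression fit
print(beta_med)
```

At \(\tau =0.5\) this is median regression, so the fitted coefficients should lie close to the generating values (intercept 1, slope 2) for this symmetric-error example.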
As the check function \({\rho }_{\tau }\left(\bullet \right)\) in Eq. (2) is not differentiable at zero, explicit solutions to the minimization problem cannot be extracted; hence LP procedures are often used to achieve a relatively fast minimization of \(H({{\varvec{\beta}}}_{{\varvec{\tau}}})\) [52, 56]. A natural link between minimization of the quantile check function and maximum likelihood (ML) theory is given by the assumption that the error term in Eq. (1) follows an asymmetric Laplace distribution (ALD) [53, 57, 58]. Other forms of the Laplace distribution were summarized by Kotz et al. [59] and Kozubowski and Nadarajah [60]. The close association between the ALD and the QR loss function has been examined in several works [19, 24, 52, 57, 58].
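This link can be checked numerically. Because \(\rho_{\tau}\) is positively homogeneous, for a fixed scale \(\sigma\) the ALD negative log-likelihood is an affine transformation of the check-loss sum, so both criteria share the same minimizing location parameter. The data below are illustrative only:

```python
import numpy as np

def rho(u, tau):
    """Check function rho_tau(u) = u * (tau - I{u < 0})."""
    return u * (tau - (u < 0))

def ald_negloglik(y, mu, sigma, tau):
    """Negative log-likelihood of an ALD sample with density
    f(y) = tau*(1 - tau)/sigma * exp(-rho_tau((y - mu)/sigma))."""
    n = len(y)
    return n * np.log(sigma / (tau * (1 - tau))) + rho((y - mu) / sigma, tau).sum()

rng = np.random.default_rng(2)
y = rng.gamma(shape=2.0, scale=3.0, size=501)   # hypothetical skewed sample
tau, sigma = 0.25, 1.5                          # fixed scale for the comparison

grid = np.linspace(y.min(), y.max(), 1500)
nll = np.array([ald_negloglik(y, m, sigma, tau) for m in grid])
loss = np.array([rho(y - m, tau).sum() for m in grid])

# Both criteria are minimised at (numerically) the same location parameter.
print(grid[nll.argmin()], grid[loss.argmin()])
```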
Conventional QR targets the median or other quantile levels under the assumption of a continuous (e.g., Gaussian) distribution. QR has been extended to count regression, a special case of discrete-variable models [55, 56, 61–64]. However, the distribution function of a discrete random variable is not continuous, and the conditional quantile \({Q}_{\tau }(y|{\varvec{x}})\) of a discrete distribution cannot be a continuous function of \({\varvec{x}}\) such as \(\mathrm{exp}({{\varvec{x}}}^{\boldsymbol{^{\prime}}}{\varvec{\beta}})\) [61]. Machado and Santos Silva [64] overcome this restriction by constructing a continuous random variable whose quantiles have a one-to-one relation with the quantiles of the count variable \(y\). When count data contain severe outliers or multiple distributional components that do not reflect a known underlying probability distribution, quantile count models may be a useful alternative. Furthermore, QR models all the quantiles of the discrete distribution and covers the entire range of counts [62]. Detailed discussions of quantile count models for independent data are available in Winkelmann [61], Machado and Santos Silva [64], Hilbe [62, 63], and Cameron and Trivedi [55, 56]; recent applications of this model can be found in Winkelmann [65] and Miranda [66].
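The jittering construction of Machado and Santos Silva [64] can be sketched numerically on illustrative Poisson data: adding a uniform jitter \(u\in [0,1)\) to a count \(y\) yields a continuous variable \(z=y+u\) whose ordering refines that of \(y\), so its order statistics (and hence its quantiles) map one-to-one back to those of \(y\) via the floor function:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.poisson(lam=4.0, size=2000)    # hypothetical count outcome
u = rng.uniform(0.0, 1.0, size=2000)
z = y + u                              # "jittered" continuous version of y

tau = 0.6
k = int(tau * len(y))                  # index of the tau-th order statistic
qz = np.sort(z)[k]                     # quantile of the continuous variable
qy = np.sort(y)[k]                     # quantile of the count variable

# Sorting z refines the ordering of y, so floor(z_(k)) = y_(k):
# quantiles of z recover the quantiles of y exactly.
print(qz, qy, int(np.floor(qz)) == qy)
```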
Mixed-effects models are a standard and widely used class of regression methods for analyzing data from longitudinal studies. The general linear mixed-effects model is defined as
$${{\varvec{Y}}}_{i}={{\varvec{X}}}_{i}^{^{\prime}}{\varvec{\beta}}+{{\varvec{Z}}}_{i}^{^{\prime}}{{\varvec{u}}}_{i}+{\varepsilon }_{ij},\quad i=1,\dots ,n,\quad j=1,\dots ,{n}_{i},$$
where \({{\varvec{Y}}}_{i}\) is the \({n}_{i}\times 1\) vector of responses, \({{\varvec{X}}}_{i}^{^{\prime}}\) is a known \({n}_{i}\times p\) design matrix containing covariates for the fixed effects, \({\varvec{\beta}}\) is the \(p\times 1\) vector of population-averaged fixed effects, \({{\varvec{Z}}}_{i}^{^{\prime}}\) is a known \({n}_{i}\times r\) design matrix for the random effects, \({{\varvec{u}}}_{i}\) is the \(r\times 1\) vector of random effects with \({{\varvec{u}}}_{i}\sim N\left(0, {\boldsymbol{\Sigma }}_{u}\right),\) and the \({\varepsilon }_{ij}\) are independent and identically distributed random errors, \({\varepsilon }_{ij}\sim N(0,{\sigma }^{2})\). The \(\tau\)th quantile linear mixed-effects model, developed by Geraci and Bottai [20] as an extension of their random-intercept QR model [19], for a continuous response \({{\varvec{Y}}}_{i}\) has the form
$${Q}_{\tau }\left({y}_{ij}|{{\varvec{x}}}_{ij},{\boldsymbol{ }{\varvec{u}}}_{i}\right)={{\varvec{x}}}_{ij}^{^{\prime}}{{\varvec{\beta}}}_{\tau }+{{\varvec{z}}}_{ij}^{^{\prime}}{{\varvec{u}}}_{i}+{\varepsilon }_{\tau ,ij}, 0<\tau <1$$
(4)
where \({y}_{ij}\) is the response of subject \(i\) at the \(j\)th measurement, \({{\varvec{x}}}_{ij}\) is the covariate vector of subject \(i\) at the \(j\)th measurement for the fixed effects, \({{\varvec{z}}}_{ij}\) is the corresponding covariate vector for the random effects \({{\varvec{u}}}_{i}\), and the random errors \({\varepsilon }_{\tau ,ij}\sim ALD(0,\sigma ,\tau )\) also depend on \(\tau\). \({{\varvec{\beta}}}_{\tau }\) is the vector of fixed-effects coefficients corresponding to the \(\tau\)th quantile, and the responses \({y}_{ij}\), conditional on \({{\varvec{x}}}_{ij}\), \({{\varvec{u}}}_{i}\), and \(\sigma\), for \(i=1,\dots ,n, j=1,\dots ,{n}_{i}\), are assumed to be conditionally independent, each following an ALD with density
$$f\left({y}_{ij}|{{\varvec{x}}}_{ij},{{\varvec{u}}}_{i}, \sigma \right)=\frac{\tau \left(1-\tau \right)}{\sigma }exp\left\{-{\rho }_{\tau }\left(\frac{{y}_{ij}-{{\varvec{x}}}_{ij}^{\mathrm{^{\prime}}}{{\varvec{\beta}}}_{\tau }-{{\varvec{z}}}_{ij}^{\mathrm{^{\prime}}}{{\varvec{u}}}_{i}}{\sigma }\right)\right\}.$$
(5)
The random effects \({{\varvec{u}}}_{i}\) are assumed to be distributed as \({{\varvec{u}}}_{i}\stackrel{iid}{\sim }{N}_{r}\left(0,\boldsymbol{\Psi }\right)\), where the dispersion matrix \(\boldsymbol{\Psi }=\boldsymbol{\Psi }(\boldsymbol{\alpha })\) depends on an unknown, reduced parameter vector \(\boldsymbol{\alpha }\) containing the distinct elements of \(\boldsymbol{\Psi }\), and the random errors \({\varepsilon }_{ij}\sim ALD(0,\sigma )\) [18, 52]. The likelihood for the \({y}_{ij}\) at the \(\tau\)th quantile is then
$$L\left({{\varvec{\beta}}}_{\tau },\sigma ,\tau \right)=\frac{{\tau }^{n}{\left(1-\tau \right)}^{n}}{{\sigma }^{n}}exp\left\{-\sum_{i=1}^{n}{\sum }_{j=1}^{{n}_{i}}{\rho }_{\tau }\left(\frac{{y}_{ij}-{{\varvec{x}}}_{ij}^{^{\prime}}{{\varvec{\beta}}}_{\tau }-{{\varvec{z}}}_{ij}^{^{\prime}}{{\varvec{u}}}_{i}}{\sigma }\right)\right\}$$
(6)
Given the likelihood of the conditional quantile of \({y}_{ij}\), maximization of the likelihood in Eq. (6) with respect to the parameter \({{\varvec{\beta}}}_{\tau }\) is equivalent to minimization of the loss function in Eq. (7). Thus, we can estimate the fixed-effects coefficients corresponding to the \(\tau\)th quantile, \({{\varvec{\beta}}}_{\tau }\), by minimizing the objective function
$${H}^{*}({{\varvec{\beta}}}_{{\varvec{\tau}}})=\underset{{{\varvec{\beta}}}_{\tau }}{\mathrm{min}}{\sum }_{i=1}^{n}{\sum }_{j=1}^{{n}_{i}}{\rho }_{\tau }\left(\frac{{y}_{ij}-{{\varvec{x}}}_{ij}^{^{\prime}}{{\varvec{\beta}}}_{\tau }-{{\varvec{z}}}_{ij}^{^{\prime}}{{\varvec{u}}}_{i}}{\sigma }\right)$$
(7)
More details regarding the estimation process of quantile mixed-effects models are available in [18, 19, 24, 58].
Stochastic approximation of the expectation maximization
This study examines the quantile regression linear mixed-effects model (QR-LMM) of Galarza [18], which uses the SAEM algorithm to obtain exact ML estimates of the fixed effects and of the general variance–covariance matrix \({\boldsymbol{\Sigma }}_{\tau }=\boldsymbol{\Sigma }\left({{\varvec{\theta}}}_{{\varvec{\tau}}}\right)\) of the random-effects parameters at a given quantile. The expectation–maximization (EM) algorithm, suggested by Dempster et al. [67], is a popular technique for iterative computation of ML estimates when the observations can be regarded as incomplete data; it incorporates the standard elements of missing-data problems but is much broader than that [68]. Each iteration of the EM algorithm has two steps: an expectation (E-step) followed by a maximization (M-step). "In the former action, the incomplete data are estimated given the observed data and current estimate of the model parameters under the assumption of missing at random (MAR) for the incomplete data. In the later step, the likelihood function is maximized under the assumption that the incomplete/missing data is known" [67]. Detailed explanations of these processes, their analytical clarifications for successively more general classes of models, and the basic theory underlying the EM algorithm are given by Dempster et al. [67]. A book devoted entirely to the general formulation of the EM algorithm, its basic properties, and its applications has been provided by McLachlan and Krishnan [68]. Moreover, the success of the EM algorithm is well documented throughout the statistical literature.
Although the EM algorithm is popular, Delyon et al. [69] pointed out that in some situations it is not applicable because the E-step cannot be carried out in closed form. To deal with this, Delyon et al. [69] presented the simulation-based SAEM algorithm, built on stochastic approximation (SA), as an alternative to the Monte Carlo EM (MCEM). "While the MCEM requires a consistent increment of the simulated data and regularly a substantial number of simulations, the SAEM versions guarantee convergence with a fixed and/or small simulation size" [69–71]. The SAEM algorithm replaces the E-step of the EM algorithm with one iteration of a stochastic (probabilistic) approximation procedure, while the M-step is unchanged [71]. The E- and M-steps of the EM and SAEM procedures are as follows.
Let \({\mathcal{l}}_{o}(\uptheta )=\mathrm{log}f({Y}_{obs};\uptheta )\) denote the log-likelihood based on the observed data \({Y}_{obs}\); let \(q\) represent the missing data, so that \({Y}_{com}=({Y}_{obs}, q){^{\prime}}\) denotes the complete data (observed and missing); let \({\mathcal{l}}_{c}({Y}_{com};\uptheta )\) be the complete-data log-likelihood; and let \({\widehat{\uptheta }}_{k}\) denote the estimate of \(\uptheta\) at the \(k\)th iteration. The EM algorithm, which maximizes \({\mathcal{l}}_{c}\left({Y}_{com};\uptheta \right)=\mathrm{log}f({Y}_{obs}, q;\uptheta )\) iteratively and converges to a stationary point of the observed likelihood under mild regularity conditions [18, 71], proceeds in two steps:
-
E-step: Compute the conditional expectation of \({\mathcal{l}}_{c}({Y}_{com};\uptheta )\):
$$S\left(\uptheta |{\widehat{\uptheta }}_{k}\right)=E\left\{{\mathcal{l}}_{c}\left({Y}_{com};\uptheta \right)|{Y}_{obs}, {\widehat{\uptheta }}_{k}\right\}$$
-
M-step: Computes the parameter values \({\widehat{\uptheta }}_{k+1}\) by maximizing \(S\left(\uptheta |{\widehat{\uptheta }}_{k}\right)\) with respect to \(\uptheta\).
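These two steps can be made concrete with a classic textbook toy example (not the model used in this study): EM for a two-component Gaussian mixture with known unit variances, where the unobserved component labels play the role of the missing data \(q\):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical incomplete-data problem: two-component Gaussian mixture;
# the unobserved component labels are the "missing data".
y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])

pi, mu1, mu2 = 0.5, -1.0, 6.0          # crude starting values
for _ in range(50):
    # E-step: expected component memberships given the current parameters
    d1 = pi * np.exp(-0.5 * (y - mu1) ** 2)
    d2 = (1 - pi) * np.exp(-0.5 * (y - mu2) ** 2)
    w = d1 / (d1 + d2)
    # M-step: maximise the expected complete-data log-likelihood
    pi = w.mean()
    mu1 = (w * y).sum() / w.sum()
    mu2 = ((1 - w) * y).sum() / (1 - w).sum()

print(pi, mu1, mu2)
```

The estimates settle near the generating values (mixing weight 0.5, means 0 and 5), illustrating how alternating the two steps climbs the observed-data likelihood.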
The SAEM algorithm, on the other hand, replaces the E-step by a stochastic approximation; as presented by Galarza [18], it can be summarized as follows:
-
Simulation (E-step): Draw \(m\) samples \(q(\mathcal{l}, k)\) of the missing data at iteration \(k\), \(\mathcal{l}=\mathrm{1,2},\dots , m\), from the conditional distribution of the missing data \(f\left(q|{\uptheta }_{k-1}, {Y}_{obs}\right)\).
-
Stochastic approximation: Update
\(S\left(\uptheta |{\widehat{\uptheta }}_{k}\right)\) according to
$$S\left(\uptheta |{\widehat{\uptheta }}_{k}\right)=S\left(\uptheta |{\widehat{\uptheta }}_{k-1}\right)+{\delta }_{k}\left[\frac{1}{m}\sum_{\mathcal{l}=1}^{m}{\mathcal{l}}_{c}\left({Y}_{obs},q\left(\mathcal{l}, k\right);\uptheta \right)-S\left(\uptheta |{\widehat{\uptheta }}_{k-1}\right)\right]$$
-
M-step: Update the estimate \({\widehat{\uptheta }}_{k+1}\) according to
$${\widehat{\uptheta }}_{k+1}=\underset{\uptheta }{\mathrm{argmax}}S\left(\uptheta |{\widehat{\uptheta }}_{k}\right),$$
which is equivalent to finding \({\widehat{\uptheta }}_{k+1} \in {\varvec{\Theta}}\) such that \(S\left({\widehat{\uptheta }}_{k+1}\right)\ge S\left(\uptheta \right)\) for all \(\uptheta \in {\varvec{\Theta}}\). Here \({\delta }_{k}\) is a smoothing parameter (a sequence of decreasing non-negative numbers) as given by Kuhn and Lavielle [72, 73], and \(m\) is the number of simulations, suggested to be at most 20 [18]. The choice of \({\delta }_{k}\) recommended by Galarza [18] is given as follows:
$${\delta }_{k}=\left\{\begin{array}{l}1\, for\, 1\le k\le cW\\ \frac{1}{k-cW}\, for\, cW+1\le k\le W ,\end{array}\right.$$
where \(c \in (0, 1)\) is a cut point that regulates the proportion of initial iterations without memory, and \(W\) is the maximum number of iterations.
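This smoothing sequence is straightforward to compute; the values of \(c\) and \(W\) below are illustrative:

```python
def delta(k, c, W):
    """Smoothing sequence: delta_k = 1 during the memory-free phase
    (k <= c*W), then 1/(k - c*W) so later iterations are averaged
    with progressively smaller steps."""
    cW = c * W
    return 1.0 if k <= cW else 1.0 / (k - cW)

# Illustrative choice: W = 300 iterations with cut point c = 0.25.
c, W = 0.25, 300
seq = [delta(k, c, W) for k in range(1, W + 1)]
print(seq[:5], seq[-1])
```

During the first \(cW\) iterations the update keeps no memory of earlier values (pure EM-like steps); afterwards the decreasing steps average out the simulation noise.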
For further details, see Jank [70], Meza et al. [71], or Kuhn and Lavielle [72, 73]. Details of these algorithms for estimating the parameters of the QR-LMM are presented by Galarza [18] and Galarza et al. [21]. "The SAEM algorithm has proven to be more effective for computing the ML estimates in mixed-effects models due to the reusing of simulations from one iteration to the next in the smoothing phase of the algorithm" [18, 71–73]. The SAEM algorithm is implemented in the R package qrLMM.
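To show how the three SAEM steps fit together, here is a toy sketch (not the QR-LMM itself, and with illustrative data): estimating the mean \(\theta\) of a unit-variance Gaussian when some responses are missing. The fixed point of the update is the observed-data ML estimate, i.e., the mean of the observed responses:

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy incomplete-data problem: y_i ~ N(theta, 1), some responses missing.
y_obs = rng.normal(2.0, 1.0, size=60)
n_mis = 30                                  # number of missing responses

def delta(k, c, W):
    """Smoothing sequence: no memory for k <= c*W, then decreasing steps."""
    return 1.0 if k <= c * W else 1.0 / (k - c * W)

c, W, m = 0.3, 500, 10                      # cut point, iterations, simulations
theta = 0.0                                 # starting value
S = 0.0                                     # SA estimate of the complete-data mean
for k in range(1, W + 1):
    # Simulation (E-step): draw the missing data given the current theta
    sims = rng.normal(theta, 1.0, size=(m, n_mis))
    # Stochastic approximation: move S toward the simulated complete-data mean
    mean_complete = (y_obs.sum() + sims.sum(axis=1).mean()) / (len(y_obs) + n_mis)
    S = S + delta(k, c, W) * (mean_complete - S)
    # M-step: the complete-data ML estimate of theta is the complete-data mean
    theta = S

print(theta, y_obs.mean())
```

Because the missing values are simulated from the current estimate, the iteration contracts toward the observed-data mean, and the decreasing steps \(\delta_k\) average out the simulation noise, mirroring the smoothing phase described above.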