Introduction
Bayesian data analysis
Bayesian inference
Hierarchical models
Markov chain Monte Carlo (MCMC)
Bayesian hierarchical regression on clearance rates
The bhrcr package
The main function of the package is clearanceEstimatorBayes, which will be described thoroughly later on. This function returns the WWARN PCE estimates as well as the estimates from the Bayesian hierarchical model. The calculatePCE function, which provides only the WWARN PCE estimates of the clearance rates, has been incorporated in the package as well. The generic summary, print, and plot functions, as well as the diagnostics function, will also be illustrated by examples in the following subsections.
The Pursat data
- Sex: a factor variable with two levels, F and M
- agegroup: 21+ (21 years of age or older) or 21− (younger than 21 years)
- vvkv: whether or not an individual was from Veal Veng or Kravanh
- HbE: the number of alleles of the Haemoglobin E variant
- athal: the number of alleles of the thalassaemia variant
- g6pd: the number of alleles of the G6PD deficient variant
- lnPf0: log initial parasite density
- year2010: TRUE if 2010, FALSE if 2009
- group: 1 if parasite group 1, 0 if parasite group 2
Run data("pursat") and data("pursat_covariates") to access the data sets.
The clearanceEstimatorBayes function
The clearanceEstimatorBayes function is the principal function in the bhrcr package. It analyzes the input data set in the Bayesian framework presented in the previous section and provides the posterior distributions of the parameters, along with point estimates and credible intervals. The arguments of the clearanceEstimatorBayes function, their default values, and the major components of the function output are explained below:
- data: a data frame, with no missing values, containing clearance profiles of patients. This data frame must contain id, time, and count columns, in that order. The first column represents the IDs of patients (not necessarily integers). The second and third columns contain time and recorded parasitaemia (per microlitre) for each of the measurements, respectively. data is allowed to have the predicted WWARN PCE estimates stored in another column named Predicted. If data does not have the Predicted column, clearanceEstimatorBayes will automatically calculate and provide the WWARN PCE rates. In this case it is strongly recommended to set outlier.detect = TRUE. Otherwise, the WWARN PCE outlier detection would not be executed by the program and the provided WWARN PCE rates would be inconsistent with the estimates generated by the online tool.
- covariates: a data frame (with no missing values), ordered according to the order of patients in data, containing individual-level covariates. This argument may be NULL, in which case estimation of clearance rates is of primary interest.
- seed: an optional user-specified number used to initialize a pseudorandom number generator, with a default value of 1234. The seed argument helps users to reproduce their results.
- detect.limit: detection limit of the parasite density in blood (parasites per microlitre). The default value is 40.
- outlier.detect: indicator of whether or not to use the WWARN PCE outlier detection method [8]. The default value is TRUE and, for the reasons stated before, it is recommended to set outlier.detect = TRUE if data is missing the Predicted column.
- conf.level: required confidence level for reporting the credible intervals of the estimates, with a default value of 0.95.
- niteration: total number of simulations after the burn-in period, with a default value of 100,000.
- burnin: length of the burn-in period. The default value is 500.
- thin: step size of the thinning process. The default value is 50.
- filename: the name of the csv file used to store some output elements. This csv file, which is named “output.csv” by default, contains id, clearance.mean, lag.median, and tail.median.
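As a hedged illustration of how the arguments listed above fit together, the following sketch calls clearanceEstimatorBayes on the built-in data sets with every argument spelled out. It assumes that the bhrcr package is installed and skips the fit otherwise. The short chain (200 iterations, burn-in of 50, thinning step of 10) and the names n_iter, n_burn, n_thin and fit are illustrative choices made here, not recommended settings.

```r
# Illustrative chain settings only: far shorter than the defaults
# (niteration = 100000, burnin = 500, thin = 50) and too short for inference.
n_iter <- 200
n_burn <- 50
n_thin <- 10

if (requireNamespace("bhrcr", quietly = TRUE)) {
  library(bhrcr)
  data("pursat")
  data("pursat_covariates")

  fit <- clearanceEstimatorBayes(
    data = pursat,                   # id, time, count columns
    covariates = pursat_covariates,  # individual-level covariates
    seed = 1234,                     # default seed, for reproducibility
    detect.limit = 40,               # parasites per microlitre
    outlier.detect = TRUE,           # use the WWARN PCE outlier detection
    conf.level = 0.95,               # level of the credible intervals
    niteration = n_iter,
    burnin = n_burn,
    thin = n_thin,
    filename = "output.csv"          # csv file written by the function
  )
  print(names(fit))                  # names of the returned components
} else {
  message("bhrcr is not installed; skipping the model fit.")
}
```

Keeping the chain settings in named variables makes it easy to rerun the same call with longer chains once the diagnostics suggest that more iterations are needed.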
The major components of the function output are:
- clearance.post: a matrix of posterior samples for clearance rates \(\{ \beta_i \}\).
- clearance.mean: mean values of the posterior distributions of the clearance rates.
- clearance.median: median values of the posterior distributions of the clearance rates.
- gamma.post: a matrix of posterior samples for each element in \(\gamma\).
- gamma.mean: mean values of the posterior distributions of the elements of \(\gamma\).
- gamma.median: median values of the posterior distributions of the elements of \(\gamma\).
- gamma.CI: credible intervals for each element in \(\gamma\).
- halflifeslope.post: a matrix of posterior samples for the effect of covariates on log half-lives. The half-life value is calculated as \(\log(2)/\text{(clearance rate)}\). Thus, even though the method originally regressed log clearance rates rather than log half-lives on the covariates, one can obtain the slopes for a regression of the log half-lives by using \(\log(\text{half-life}) = \log(\log 2) - \log(\text{clearance rate})\).
- halflifeslope.mean: mean values of the posterior distribution for the effect of covariates on log half-lives.
- halflifeslope.median: median values of the posterior distribution for the effect of covariates on log half-lives.
- halflifeslope.CI: credible intervals for the effect of covariates on log half-lives.
- changelag.post: posterior samples of changepoints between lag and decay phases, \(\{ \delta_i^{\ell} \}\).
- lag.median: median values of the posterior distributions of \(\{ \delta_i^{\ell} \}\).
- changetail.post: posterior samples of changepoints between decay and tail phases, \(\{ \delta_i^{\tau} \}\).
- tail.median: median values of the posterior distributions of \(\{ \delta_i^{\tau} \}\).
- predicted.pce: WWARN PCE estimates of the parasite clearance rates.
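The relation between slopes on log clearance rates and slopes on log half-lives can be checked numerically. The sketch below uses simulated clearance rates and ordinary least squares (lm) instead of the package's Bayesian model, purely to illustrate the identity; all variable names and simulated values are hypothetical.

```r
set.seed(1)

# Simulated data (hypothetical): one binary covariate and
# clearance rates that are log-linear in that covariate.
n <- 200
x <- rbinom(n, size = 1, prob = 0.5)
clearance <- exp(-2 + 0.3 * x + rnorm(n, sd = 0.2))

# Half-life = log(2) / clearance rate.
halflife <- log(2) / clearance

fit_rate <- lm(log(clearance) ~ x)
fit_half <- lm(log(halflife) ~ x)

slope_rate <- unname(coef(fit_rate)["x"])
slope_half <- unname(coef(fit_half)["x"])
print(c(rate = slope_rate, halflife = slope_half))

# Because log(half-life) = log(log(2)) - log(clearance rate),
# the slopes have equal magnitude and opposite sign, and the
# intercept is shifted by log(log(2)) after the sign flip.
stopifnot(isTRUE(all.equal(slope_half, -slope_rate)))
stopifnot(isTRUE(all.equal(
  unname(coef(fit_half)["(Intercept)"]),
  log(log(2)) - unname(coef(fit_rate)["(Intercept)"])
)))
```

The same sign flip explains why a covariate associated with slower clearance shows a negative slope on the log clearance rate and a positive slope on the log half-life.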
See the clearanceEstimatorBayes man page in the bhrcr package for the full list.
The summary and print functions
The summary function produces a comprehensive yet condensed report based on the results from the main function, clearanceEstimatorBayes. To illustrate, the built-in data sets of the bhrcr package, pursat and pursat_covariates, are used to provide a fast example. The code may take significant time to run, depending on the computer's hardware. Here a small number of iterations is used for tutorial purposes. To obtain stationary results from the simulation, a larger number of iterations should be considered. Details are explained later in the section on the diagnostics function.
The seed argument is set to 1234. The output given by summary includes a table containing the posterior mean and median of the regression coefficients, which represent the impact of covariates on log parasite clearance rates and on the corresponding log half-life values, along with the 95% credible intervals. If the input data set does not contain WWARN PCE estimates, the clearanceEstimatorBayes function will automatically generate a folder called “PceEstimates” under the current working directory to store the calculated WWARN PCE estimates for each individual.
From the output of the summary function, one can perform an analysis of the covariates of interest. As discussed in Section 4 of [10], one point of interest was whether or not there is evidence of resistance to artemisinins developing over time. Thus the indicator variable year2010TRUE for the year of data collection was included. According to the summary output, the parasite clearance half-life increased over time (positive mean and median); however, this effect is not significant, since its 95% credible interval contains zero.
The other covariates include male sex (SexM), age 21 or greater (agegroup21+), and living in the Kravanh or Veal Veng districts (vvkvTRUE), which are close to forested regions (see [9]). Notice the slope of 0.1648 on the indicator variable SexM for males, which means that the parasite clearance half-life is estimated to be longer in male patients than in female patients, other factors in the model being held equal, by a factor of \(e^{0.1648}\approx 1.179\). This interpretation calls for care, however, because this is an observational study and there may be unmeasured confounders. The causal interpretation of each covariate is not straightforward and is more or less speculative. The reader can refer to [16] for some details.
The print function is essentially the same, except that it only displays the posterior mean of the effect of covariates on both log clearance rates and log half-lives. Therefore, for a quick and straightforward summary of the estimated impact of covariates, the print function is recommended.
The diagnostics function
The diagnostics function provides diagnostic analyses, such as trace plots, ACF (autocorrelation function) plots and PACF (partial autocorrelation function) plots, for some important parameters in the MCMC process of the Gibbs sampling. These diagnostic plots help to assess whether it is plausible that the MCMC process has reached stationarity and has been thinned sufficiently (see [17, 18]). The plots are saved under “./mcmcDiagnostics”.
- [...] outlier.detect = FALSE when they are running the main function clearanceEstimatorBayes. If the outliers are determined to be likely due to transcription errors, then the outlying data points should be deleted;
- Run several chains (with clearanceEstimatorBayes) of various lengths and observe the trace plots and ACF plots (explained later), which helps to determine a suitable burn-in period. Make sure the final sample is collected after the Markov chain reaches stationarity, i.e. the distribution of the values just after the burn-in ends should be similar to that of the values at the middle and end of the chain. The current version of bhrcr does not support parallelization, so users have to run one chain at a time;
- [...] slowExample), which has been saved into a dataset called posterior.rda and incorporated into the bhrcr package. To see the results, run the slow example in the demo.
The plot function and posterior analysis
The plot function visualizes the results returned by the clearanceEstimatorBayes function. All plots will be saved under “./plots”. The previous example is used as follows.
[...] “1”, “3”, “14”, “35”. Here the ID numbers are stored as character strings rather than as integers. This allows specific patients to be extracted by general identifiers, such as names or bar code sequences.
For the patient with id = 1 (see Fig. 6a), the posterior mean clearance rate was 0.1076 and the median was 0.1080, with a 95% credible interval of [0.1005, 0.1167]. For this patient, one can check the posterior distribution of the time of the changepoint between the lag and decay phases.
[...] plot function: the mean (coefficient) curve, the median (coefficient) curve, the posterior median curve and the point-wise 95% credible intervals of the posterior samples. In Fig. 6a, b, [...] plot function to give users more flexibility to choose what they prefer.
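The per-patient summaries quoted in this section (posterior mean, median and 95% credible interval) are simple functions of the posterior draws. The sketch below computes them from a simulated vector that stands in for one patient's draws from clearance.post. The simulated values, the helper name summarise_draws and the choice of an equal-tailed interval are assumptions made for illustration; the package may compute its intervals differently.

```r
set.seed(1234)

# Simulated stand-in for one patient's posterior draws of the clearance rate.
draws <- rnorm(2000, mean = 0.108, sd = 0.004)

# Hypothetical helper: mean, median and an equal-tailed credible interval.
summarise_draws <- function(x, conf.level = 0.95) {
  alpha <- 1 - conf.level
  ci <- quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(mean = mean(x), median = median(x), lower = ci[1], upper = ci[2])
}

res <- summarise_draws(draws)
print(round(res, 4))

# Half-life draws follow from the same samples: log(2) / clearance rate.
halflife_draws <- log(2) / draws
print(round(summarise_draws(halflife_draws), 3))
```

Transforming the draws first and summarising afterwards, as done for the half-life here, keeps the interval consistent with the posterior of the transformed quantity.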