Background
[…] joineRML. A simulation analysis and real-world data example are used to demonstrate the accuracy of the algorithm and the software, respectively.
Implementation
Model
Estimation
Likelihood
MCEM algorithm
Initial values
[…] lme() and coxph() from the R packages nlme [26] and survival [27], respectively. From the fitted models, the best linear unbiased predictions (BLUPs) of the separate model random effects are used to estimate each \(W_{1i}^{(k)}(t)\) function. These estimates are then included as time-varying covariates in a Cox regression model, alongside any other fixed effect covariates, which can be straightforwardly fitted using standard software. If the data are not balanced, i.e. when \(t_{ijk} \neq t_{ij}\ \forall k\), then we fit a standard Cox proportional hazards regression model to estimate \(\gamma_{v}\) and set \(\gamma_{yk} = 0\ \forall k\).
Convergence and stopping rules
The R package JM [28] implements (7) (in combination with another rule based on relative change in the likelihood), whereas the R package joineR [29] implements (8). The relative difference might be unstable for parameters near zero that are subject to MC error. Therefore, the convergence criterion for each parameter might be chosen separately at each EM iteration based on whether its absolute magnitude is below or above some threshold. A similar approach is adopted in the EM algorithms employed by the software package SAS [30, p. 330].
Standard error estimation
Software
[…] joineRML, which is available on the Comprehensive R Archive Network (CRAN) (https://CRAN.R-project.org/package=joineRML). The principal function in joineRML is mjoint(). The primary arguments for implementing mjoint() are summarised in Table 1. To achieve computational efficiency, parts of the MCEM algorithm in joineRML are coded in C++ using the Armadillo linear algebra library and integrated using the R package RcppArmadillo [37].
Table 1: Primary arguments of the mjoint() function in the R package joineRML
Argument | Description |
---|---|
formLongFixed | a list of formulae for the fixed effects component of each longitudinal outcome. The left-hand side defines the response, and the right-hand side specifies the fixed effect terms. |
formLongRandom | a list of one-sided formulae specifying the model for the random effects of each longitudinal outcome. |
formSurv | a formula specifying the proportional hazards regression model (not including the latent association structure). |
data | a list of data.frame objects, one for each longitudinal outcome, in which to interpret the variables named in formLongFixed and formLongRandom. The list structure enables one to include multiple longitudinal outcomes with different measurement protocols. If the multiple longitudinal outcomes are measured at the same time points for each patient (i.e. \(t_{ijk} = t_{ij}\ \forall k\)), then a single data.frame object can be given instead of a list. It is assumed that each data frame is in long format. |
survData | (optional) a data.frame in which to interpret the variables named in formSurv. If survData is not given, then mjoint() looks for the time-to-event data in data. |
timeVar | a character string indicating the time variable in the linear mixed effects model. |
inits | (optional) a list of initial values for some or all of the parameters estimated in the model. |
control | (optional) a list of control parameters. These allow for the control of ε0, ε1, and ε2 in (7) and (8); the choice of N, δ, and convergence criteria; the maximum number of MCEM iterations; and the minimum number of MCEM iterations during burn-in. Additionally, the control argument gammaOpt can be used to specify whether a one-step Newton-Raphson (= "NR") or Gauss-Newton-like (= "GN") update should be used for the M-step update of γ. |
The mjoint() function returns an object of class mjoint. By default, approximate SE estimates are calculated using the empirical information matrix. If one wishes to use bootstrap standard error estimates, then the user can pass the model object to the bootSE() function. Several generic functions (or rather, S3 methods) can also be applied to mjoint objects, as described in Table 2. These generic functions include common methods, for example coef(), which extracts the model coefficients; ranef(), which extracts the BLUPs (and optional standard errors); and resid(), which extracts the residuals from the linear mixed sub-model. The intention of these functions is to have a common syntax with standard R packages for linear mixed models [26] and survival analysis [27]. Additionally, plotting capabilities are included in joineRML. These include trace plots for assessment of convergence of the MCEM algorithm, and caterpillar plots for subject-specific random effects (Table 2).
Table 2: Generic functions (S3 methods) that can be applied to objects of class mjointᵃ
Function(s) | Returns |
---|---|
logLik, AIC, BIC | the log-likelihood, Akaike information criterion and Bayesian information criterion statistics, respectively |
coef, fixef | the fixed effects parameter estimates |
ranef | the BLUPs (and optional standard errors) |
printᵃ, summaryᶜ | short and long model summary outputs, respectively |
fitted, resid | the fitted values and raw residuals from the multivariate LMM sub-model, respectively |
plotᵇ | the MCEM algorithm convergence trace plots |
sigma | the residual standard errors from the LMM sub-model |
vcov | the variance-covariance matrix of the main parameters of the fitted model (except the baseline hazard) |
getVarCov | the random effects variance-covariance matrix |
confint | the confidence intervals based on asymptotic normality |
update | specific parts of a fitted model can be updated, e.g. by adding or removing terms from a sub-model, and then re-fitted |
sampleData | sample data (with or without replacement) from a joint model |
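
To make Tables 1 and 2 concrete, the following is a minimal sketch of fitting a bivariate joint model and inspecting the result. It is an illustration only: it assumes the heart.valve dataset bundled with joineRML and its variables (log.grad, log.lvmi, time, sex, hs, num, fuyrs, status, age), which are not discussed in this paper, and the model itself is chosen for brevity rather than on substantive grounds.

```r
library(joineRML)
library(survival)  # provides Surv(), used in formSurv

# Assumption: the 'heart.valve' dataset shipped with joineRML.
data(heart.valve)
# Keep only rows where both longitudinal outcomes were recorded,
# so that a single data.frame can be passed instead of a list.
hvd <- heart.valve[!is.na(heart.valve$log.grad) &
                     !is.na(heart.valve$log.lvmi), ]

set.seed(1)  # the MCEM algorithm relies on Monte Carlo draws
fit <- mjoint(
  formLongFixed  = list("grad" = log.grad ~ time + sex + hs,
                        "lvmi" = log.lvmi ~ time + sex),
  formLongRandom = list("grad" = ~ 1 | num,      # random intercept
                        "lvmi" = ~ time | num),  # random intercept and slope
  formSurv = Surv(fuyrs, status) ~ age,
  data     = hvd,
  timeVar  = "time",
  control  = list(gammaOpt = "NR")  # one-step Newton-Raphson update of gamma
)

summary(fit)       # long model summary
coef(fit)          # model coefficients
head(ranef(fit))   # BLUPs of the random effects
confint(fit)       # intervals based on asymptotic normality
plot(fit)          # MCEM convergence trace plots

# Bootstrap standard errors are far more expensive, so they are left
# commented out here:
# fit.boot <- bootSE(fit, nboot = 100)
# summary(fit, bootSE = fit.boot)
```

The random effects structure differs between the two outcomes, which is why formLongRandom is a list with one formula per outcome.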
[…] simData() that allows for simulation of data from joint models with multiple longitudinal outcomes. joineRML can also fit univariate joint models; however, in this case we would currently recommend using the R packages joineR [29], JM [28], or frailtypack [38], which are optimized for the univariate case and exploit Gaussian quadrature. In addition, these packages allow for extensions to more complex cases; for example, competing risks [28, 29] and recurrent events [38].
Results
Simulation analysis
[…] joineRML package by means of the simData() function. The true parameter values and results from 500 simulations are shown in Table 3. In particular, we display the mean estimate; the bias; the empirical SE (the standard deviation of the parameter estimates); the mean SE (the mean of the SEs calculated for each fitted model); the mean square error (MSE); and the coverage. The results confirm that the model fitting algorithm generally performs well.
Parameter | True value | Mean estimated value | Empirical SE | Mean SE | Bias | MSE | Coverage |
---|---|---|---|---|---|---|---|
\(D_{11}\) | 0.2500 | 0.2411 | 0.0435 | — | −0.0089 | 0.0020 | — |
\(D_{21}\) | 0.0000 | 0.0010 | 0.0136 | — | 0.0010 | 0.0002 | — |
\(D_{31}\) | −0.1250 | −0.1212 | 0.0295 | — | 0.0038 | 0.0009 | — |
\(D_{41}\) | 0.0000 | −0.0006 | 0.0127 | — | −0.0006 | 0.0002 | — |
\(D_{22}\) | 0.0400 | 0.0396 | 0.0072 | — | −0.0004 | 0.0001 | — |
\(D_{32}\) | 0.0000 | −0.0002 | 0.0138 | — | −0.0002 | 0.0002 | — |
\(D_{42}\) | 0.0000 | −0.0001 | 0.0055 | — | −0.0001 | 0.0000 | — |
\(D_{33}\) | 0.2500 | 0.2420 | 0.0400 | — | −0.0080 | 0.0017 | — |
\(D_{43}\) | 0.0000 | 0.0007 | 0.0134 | — | 0.0007 | 0.0002 | — |
\(D_{44}\) | 0.0400 | 0.0399 | 0.0075 | — | −0.0001 | 0.0001 | — |
\(\beta_{0,1}\) | 0.0000 | 0.0028 | 0.0612 | 0.0660 | 0.0028 | 0.0038 | 0.9660 |
\(\beta_{1,1}\) | 1.0000 | 1.0012 | 0.0218 | 0.0229 | 0.0012 | 0.0005 | 0.9500 |
\(\beta_{2,1}\) | 1.0000 | 1.0010 | 0.0449 | 0.0470 | 0.0010 | 0.0020 | 0.9540 |
\(\beta_{3,1}\) | 1.0000 | 0.9932 | 0.0897 | 0.0925 | −0.0068 | 0.0081 | 0.9440 |
\(\sigma_{1}^{2}\) | 0.2500 | 0.2506 | 0.0165 | 0.0171 | 0.0006 | 0.0003 | 0.9560 |
\(\beta_{0,2}\) | 0.0000 | −0.0026 | 0.0637 | 0.0655 | −0.0026 | 0.0041 | 0.9660 |
\(\beta_{1,2}\) | −1.0000 | −1.0011 | 0.0229 | 0.0223 | −0.0011 | 0.0005 | 0.9480 |
\(\beta_{2,2}\) | 0.0000 | 0.0008 | 0.0399 | 0.0472 | 0.0008 | 0.0016 | 0.9700 |
\(\beta_{3,2}\) | 0.5000 | 0.5061 | 0.0894 | 0.0923 | 0.0061 | 0.0080 | 0.9540 |
\(\sigma_{2}^{2}\) | 0.2500 | 0.2501 | 0.0162 | 0.0171 | 0.0001 | 0.0003 | 0.9540 |
\(\gamma_{v1}\) | 0.0000 | 0.0011 | 0.1243 | 0.1392 | 0.0011 | 0.0155 | 0.9720 |
\(\gamma_{v2}\) | 1.0000 | 1.0487 | 0.2837 | 0.2750 | 0.0487 | 0.0829 | 0.9340 |
\(\gamma_{y1}\) | −0.5000 | −0.5121 | 0.1936 | 0.2084 | −0.0121 | 0.0376 | 0.9560 |
\(\gamma_{y2}\) | 1.0000 | 1.0311 | 0.2220 | 0.2145 | 0.0311 | 0.0502 | 0.9400 |
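
The summary measures reported in Table 3 can be computed from the per-simulation estimates and standard errors along the following lines. This is a sketch in base R, not the code used for the paper: the function name sim_summary is hypothetical, coverage is assumed to be based on Wald-type intervals, and the inputs in the usage example are simulated placeholders rather than output from joineRML.

```r
# Summarise simulation replicates for a single parameter.
#   est   : vector of parameter estimates, one per fitted model
#   se    : vector of estimated standard errors, one per fitted model
#   truth : the true parameter value used to simulate the data
sim_summary <- function(est, se, truth, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  # Assumption: coverage is assessed with Wald-type intervals.
  covered <- (est - z * se <= truth) & (truth <= est + z * se)
  c(mean_est = mean(est),             # mean estimated value
    emp_se   = sd(est),               # empirical SE
    mean_se  = mean(se),              # mean SE
    bias     = mean(est) - truth,     # bias
    mse      = mean((est - truth)^2), # mean square error
    coverage = mean(covered))         # proportion of intervals covering truth
}

# Usage with placeholder values (not results from the paper).
set.seed(42)
truth <- 1
est <- rnorm(500, mean = truth, sd = 0.02)
se  <- rep(0.02, 500)
round(sim_summary(est, se, truth), 4)
```

With this definition the MSE is approximately the squared bias plus the squared empirical SE, which agrees with the rows of Table 3 (e.g. for \(D_{11}\), \(0.0089^2 + 0.0435^2 \approx 0.0020\)).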
Example
[…] serum bilirubin (serBilir in the model and data; measured in units of mg/dl), serum albumin (albumin; mg/dl), and prothrombin time (prothrombin; seconds). Patients had a mean of 6.3 (SD = 3.7) visits (including baseline). The data can be accessed from the joineRML package via the command data(pbc2). Profile plots for each biomarker are shown in Fig. 1, indicating distinct differences in trajectories between those who died during follow-up and those who did not (right-censored cases). A Kaplan-Meier curve for overall survival is shown in Fig. 2. There were a total of 69 (44.8%) deaths during follow-up in the placebo subset.
[…] lme() function from the R package nlme. Albumin did not require transformation. Residuals were grossly non-normal for prothrombin time using both untransformed and log-transformed outcomes. Therefore, a Box-Cox transformation was applied, which suggested that an inverse-quartic transform might be suitable; this was confirmed by inspection of a Q-Q plot. The pairwise correlations for baseline measurements between the three transformed markers were 0.19 (prothrombin time vs. albumin) and −0.30 (bilirubin vs. both prothrombin time and albumin). The model is fit using the joineRML R package (version 0.2.0) using the following code.
[…] mjoint(). Additionally, the burn-in phase was increased to 400 iterations after inspection of convergence trace plots. The model fitted in 3.1 min on a MacBook Air 1.6 GHz Intel Core i5 with 8 GB of RAM running R version 3.3.0, having completed 423 MCEM iterations (not including the EM algorithm iterations performed for determining the initial values of the separate multivariate linear mixed sub-model) with a final MC size of M = 3528. The fitted model results are shown in Table 4.
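
The code listing referred to above is not reproduced in this copy of the text. The following is a sketch of what such a call could look like, based on the description given here and on the arguments in Table 1. Several details are assumptions rather than facts stated in the surviving text: the log transformation of bilirubin, the rescaling of prothrombin time by 0.1 before taking the inverse-quartic power, the use of age as the single baseline covariate in the survival sub-model, the random intercept and slope for each marker, the variable names years and status2, and the control value tol0 = 0.001 standing in for the tolerance setting. Only the placebo subset, the linear trajectories, the inverse-quartic transform, and the 400 burn-in iterations come from the text.

```r
library(joineRML)
library(survival)  # provides Surv(), used in formSurv

data(pbc2, package = "joineRML")
# The analysis described in the text uses the placebo arm only.
placebo <- subset(pbc2, drug == "placebo")

set.seed(12345)  # arbitrary seed; MCEM relies on Monte Carlo draws
fit.pbc <- mjoint(
  formLongFixed = list(
    "bil" = log(serBilir) ~ year,            # assumed log transform
    "alb" = albumin ~ year,                  # untransformed
    "pro" = (0.1 * prothrombin)^-4 ~ year),  # inverse-quartic; 0.1 scaling assumed
  formLongRandom = list(
    "bil" = ~ year | id,
    "alb" = ~ year | id,
    "pro" = ~ year | id),
  formSurv = Surv(years, status2) ~ age,     # assumed baseline covariate
  data     = placebo,
  timeVar  = "year",
  control  = list(tol0 = 0.001,              # assumed tolerance value
                  burnin = 400))             # burn-in length stated in the text

summary(fit.pbc)
```

Because the fit relies on Monte Carlo integration, estimates from a re-run will generally differ slightly from those in Table 4, and the run time depends on the machine and package version.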
Parameter | joineRML (NR) Estimate | joineRML (NR) SE | joineRML (NR) 95% CIᵈ | joineRML (GN) Estimate | joineRML (GN) SE | joineRML (GN) 95% CIᵈ | Bootstrap SE | Bootstrap 95% CIᵉ | joineR Estimate | joineR SE | joineR 95% CIᵉ |
---|---|---|---|---|---|---|---|---|---|---|---|
\(\beta_{0,1}\) | 0.5541 | 0.0858 | (0.3859, 0.7223) | 0.5549 | 0.0846 | (0.3892, 0.7207) | 0.0800 | (0.4264, 0.7435) | 0.5545 | 0.0838 | (0.3802, 0.7031) |
\(\beta_{1,1}\) | 0.2009 | 0.0201 | (0.1616, 0.2402) | 0.2008 | 0.0201 | (0.1614, 0.2402) | 0.0204 | (0.1669, 0.2468) | 0.1808 | 0.0209 | (0.1430, 0.2324) |
\(\beta_{0,2}\) | 3.5549 | 0.0356 | (3.4850, 3.6248) | 3.5546 | 0.0357 | (3.4846, 3.6245) | 0.0255 | (3.4972, 3.5904) | 3.5437 | 0.0333 | (3.4418, 3.6095) |
\(\beta_{1,2}\) | −0.1245 | 0.0101 | (−0.1444, −0.1047) | −0.1246 | 0.0101 | (−0.1444, −0.1047) | 0.0120 | (−0.1489, −0.1063) | −0.0997 | 0.0113 | (−0.1256, −0.0773) |
\(\beta_{0,3}\) | 0.8304 | 0.0212 | (0.7888, 0.8719) | 0.8301 | 0.0210 | (0.7888, 0.8713) | 0.0196 | (0.7953, 0.8638) | 0.8233 | 0.0220 | (0.7818, 0.8677) |
\(\beta_{1,3}\) | −0.0577 | 0.0062 | (−0.0699, −0.0456) | −0.0577 | 0.0062 | (−0.0698, −0.0455) | 0.0057 | (−0.0698, −0.0486) | −0.0447 | 0.0052 | (−0.0555, −0.0362) |
\(\gamma_{v}\) | 0.0462 | 0.0151 | (0.0166, 0.0759) | 0.0462 | 0.0152 | (0.0165, 0.0759) | 0.0173 | (0.0198, 0.0880) | 0.0575ᵃ | 0.0123ᵃ | (0.0314, 0.0760)ᵃ |
\(\gamma_{v}\) | | | | | | | | | 0.0413ᵇ | 0.0150ᵇ | (0.0113, 0.0714)ᵇ |
\(\gamma_{v}\) | | | | | | | | | 0.0424ᶜ | 0.0157ᶜ | (0.0146, 0.0724)ᶜ |
\(\gamma_{\mathrm{bil}}\) | 0.8181 | 0.2046 | (0.4171, 1.2191) | 0.8187 | 0.2036 | (0.4197, 1.2177) | 0.2153 | (0.5172, 1.4021) | 1.2182 | 0.1654 | (0.9800, 1.5331) |
\(\gamma_{\mathrm{alb}}\) | −1.7060 | 0.6181 | (−2.9173, −0.4946) | −1.6973 | 0.6163 | (−2.9053, −0.4893) | 0.7562 | (−3.3862, −0.5188) | −3.0770 | 0.6052 | (−4.7133, −2.1987) |
\(\gamma_{\mathrm{pro}}\) | −2.2085 | 1.6070 | (−5.3582, 0.9412) | −2.2148 | 1.6133 | (−5.3768, 0.9472) | 1.6094 | (−5.3050, 0.6723) | −7.2078 | 1.2640 | (−10.5247, −5.2616) |
[…] joineR (version 1.2.0) owing to its optimization for such models. The LMM parameter estimates were similar, although the absolute magnitude of the slopes was smaller for the separate univariate models. Since three separate models were fitted, three estimates of \(\gamma_{v}\) were obtained, with the average comparable to the multivariate model estimate. The multivariate model estimates of \(\gamma_{y} = (\gamma_{\mathrm{bil}}, \gamma_{\mathrm{alb}}, \gamma_{\mathrm{pro}})^{\top}\) were substantially attenuated relative to the separate model estimates, although the directions remained consistent. It is also interesting to note that \(\gamma_{\mathrm{pro}}\) was statistically significant in the univariate model. However, the univariate models do not account for the correlation between different outcomes, whereas the multivariate joint model does.
Discussion
[…] joineRML that can fit the models described in this paper. This was demonstrated on a real-world dataset. Although in the fitted model we assumed linear trajectories for the biomarkers, splines could be straightforwardly employed, as have been used in other multivariate joint model applications [15], albeit at the cost of additional computational time. Despite a growing availability of software for univariate joint models, Hickey et al. [19] noted that there were very few options for fitting joint models involving multivariate longitudinal data. To the best of our knowledge, options are limited to the R packages JMbayes [49] and rstanarm [50], and the Stata package stjm [47]. Moreover, none of these incorporates an unspecified baseline hazard. The first two packages use Markov chain Monte Carlo (MCMC) methods to fit the joint models. Bayesian models are potentially very useful for fitting joint models, and in particular for dynamic prediction; however, MCMC is also computationally demanding, especially in the case of multivariate models. Several other publications have made BUGS code available for use with WinBUGS and OpenBUGS (e.g. [51]), but these are not easily modifiable and post-fit computations are cumbersome.
joineRML is a new software package developed to fill a void in the joint modelling field, but is still in its infancy relative to highly developed univariate joint model packages such as the R package JM [28] and Stata package stjm [47]. Future developments of joineRML are intended to address several deficiencies. First, joineRML currently only permits an association structure of the form \(W_{2i}(t) = \sum _{k=1}^{K} \gamma _{yk} W_{1i}^{(k)}(t)\). As has been demonstrated by others, the association might take different forms, including random-slopes and cumulative effects or some combination of multiple structures, and these may also be different for separate longitudinal outcomes [18]. Moreover, it is conceivable that separate longitudinal outcomes may interact in the hazard sub-model. Second, the use of MC integration provides a scalable solution to the issue of increasing dimensionality in the random effects. However, for simpler cases, e.g. bivariate models with random-intercepts and random-slopes (total of 4 random effects), Gaussian quadrature might be computationally superior; this trade-off requires further investigation. Third, joineRML can currently only model a single event time. However, there is a growing interest in competing risks [9] and recurrent events data [11], which, if incorporated into joineRML, would provide a flexible all-round multivariate joint modelling platform. Competing risks [28, 29] and recurrent events [38] have been incorporated into joint modelling R packages already, but are limited to the case of a solitary longitudinal outcome. Of note, the PBC trial dataset analysed in this study includes times to the competing risk of liver transplantation. Fourth, with ever-increasing volumes of data collected during routine clinical visits, the need for software to fit joint models with very many longitudinal outcomes is foreseeable [52]. This would likely require the use of approximate methods for the numerical integration or data reduction methods. Fifth, additional residual diagnostics are necessary for assessing possible violations of model assumptions. The joineRML package has a resid() function for extracting the longitudinal sub-model residuals; however, these are difficult to use for diagnostic purposes because of the informative dropout, hence the development of multiple-imputation based residuals [53].
Conclusions
[…] joineRML that can fit the models described in this paper, which leverages the MCEM algorithm and which should scale well for an increasing number of longitudinal outcomes. This software is timely, as it has previously been highlighted that there is a paucity of software available to fit such models [19]. The software is being regularly updated and improved.
Availability and requirements
Project name: joineRML
Project home page: https://github.com/graemeleehickey/joineRML/
Operating system(s): platform independent
Programming language: R
Other requirements: none
License: GNU GPL-3
Any restrictions to use by non-academics: none
Acknowledgements
Funding
Availability of data and materials
The R package joineRML can be installed directly using install.packages("joineRML") in an R console. The source code is available at https://github.com/graemeleehickey/joineRML. Archived versions are available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/joineRML/. joineRML is platform independent, requiring R version ≥ 3.3.0, and is published under a GNU GPL-3 license. The dataset analysed during the current study is bundled with the R package joineRML, and can be accessed by running the command data(pbc2, package = "joineRML").