INTRODUCTION

Non-linear mixed effects models have been shown to be an effective tool for the analysis of clinical trial data. As a result, pharmacometric analysis based on non-linear mixed effects models, also known as the population approach, has become an essential step in drug development. Pharmacometric analyses usually assume that the parameter and uncertainty estimates of the non-linear model are computed correctly by a numerical method; however, these estimates are unavoidably influenced by the numerical stability of the computational algorithm, since the numerical method is based on finite precision computations.

In this paper, we propose a preconditioning method for non-linear mixed effects models to increase the computational stability of the variance-covariance matrix computation. Preconditioning is a widely used technique for increasing computational stability when numerically solving large sparse systems of linear equations (1,2). Inspired by this technique, we reparameterise the model with a linear combination of the model parameters so that the Hessian of the −2ln(likelihood) (the R-matrix) of the reparameterised model becomes close to an identity matrix. This should reduce the influence of computational issues, reduce the chance of the R-matrix appearing to be non-positive definite, and give an indication of when the R-matrix is fundamentally singular.

To test this preconditioning method, we have conducted numerical experiments using published non-linear mixed effects models (and data) from applications in pharmacometrics. Our hypothesis is that preconditioning will reduce the number of failed variance-covariance matrix computations when the R-matrix is fundamentally positive definite, and correctly indicate a fundamentally singular R-matrix when non-identifiable model parameters exist. In addition, an automated preconditioning routine is made available as a part of the software package Perl-speaks-NONMEM (PsN) version 4.4 and above (3).

BACKGROUND

Computational instability related to the R-matrix can influence pharmacometric analyses in two ways. First, positive definiteness of the R-matrix is a necessary condition for the estimated parameters to be at the maximum likelihood (cf. Appendix A.3); however, even if we are at the maximum likelihood, the R-matrix may appear to be non-positive definite due to computational error. For example, when using parameter estimation software for non-linear mixed effects models such as NONMEM, a user may encounter a message like “R-matrix Algorithmically Singular and Non-Positive Semi Definite” and the computation of the variance-covariance matrix will fail (even when the model is, in fact, identifiable and the parameters are at the maximum likelihood). In this case, preconditioning reduces the risk of misjudging the nature of the R-matrix by making the R-matrix of the reparameterised model as close to an identity matrix as possible. Second, the R-matrix may in theory be singular (e.g. if a model parameter is not estimable from the data), yet appear to be positive definite due to computational error in handling a singular matrix. In this case, computational instability can lead to the misleading conclusion that the model parameter is estimable, as standard errors may still be calculated from the R-matrix. Here, preconditioning can help by exposing the singularity of the matrix.

Variance-Covariance Matrix

The variance-covariance matrix (M) of a non-linear mixed effects model is used to asymptotically quantify the parameter estimation uncertainty and correlations. The square root of the ith diagonal element of M (i.e. \( \sqrt{m_{ii}} \)) is an estimate of the standard error (SE) of the ith parameter, while the relative standard error (RSE) can be computed as \( \sqrt{m_{ii}}/\left|{\theta}_i\right| \), where \( {\theta}_i \) is the value of the ith parameter. The correlation between the ith and jth parameters is computed as \( {m}_{ij}/\sqrt{m_{ii}{m}_{jj}} \). M can be estimated as follows (4,5):

$$ M:={R}^{-1}S{R}^{-1} $$
(1)

where matrix R is the Hessian of the −2ln(likelihood) with respect to the model parameters and matrix S is the sum of the cross products of the gradient vectors of the −2ln(individual likelihood). Hence, to obtain the variance-covariance matrix M, we need the inverse of R. In addition, if R is not positive semi-definite, then the estimated parameters are not at a minimum of the −2ln(likelihood) (see Appendix A.3 for a detailed discussion).
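For illustration, the following minimal sketch (our own, with hypothetical values for M and θ; not part of any estimation software) computes SEs, RSEs and correlations from a given variance-covariance matrix in Python/NumPy:

import numpy as np

# Hypothetical variance-covariance matrix M and parameter estimates theta
M = np.array([[0.04, 0.03],
              [0.03, 0.09]])
theta = np.array([2.0, 10.0])

se = np.sqrt(np.diag(M))       # SE_i = sqrt(m_ii)
rse = se / np.abs(theta)       # RSE_i = sqrt(m_ii) / |theta_i|
corr = M / np.outer(se, se)    # corr_ij = m_ij / sqrt(m_ii * m_jj)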

Preconditioning aims to avoid the R-matrix being computationally non-positive definite by estimating the parameters and calculating the R-matrix in a linearly reparameterised parameter space. It uses the R-matrix from a previous parameter estimation and reparameterises the model such that the R-matrix of the preconditioned model is close to an identity matrix. Note that the R-matrix itself is generally easy to obtain after parameter estimation; it is the inverse of R that may be difficult to compute if R is close to singular. If the R-matrix of the preconditioned model is close to an identity matrix, the chance of the matrix appearing to be non-positive semi-definite due to rounding error is reduced. In addition, a linear transformation of a singular matrix cannot make it non-singular; hence, if the R-matrix is fundamentally singular, the R-matrix of the preconditioned model will remain singular, giving a stronger indication that a model parameter is non-identifiable.

METHOD

In this section, we describe the technical details of how we construct the preconditioned model and justify why this method of preconditioning should increase the computational stability.

Preconditioning the Model

The goal of preconditioning is to linearly reparameterise the model so that the R-matrix of the preconditioned model is as close to an identity matrix as possible. In other words, preconditioning will scale and rotate the parameter space so that the curvature of the likelihood surface becomes the same in all directions.

Let f be a map from the parameter vector θ to the −2ln(likelihood). A number of approximation methods for the likelihood of a non-linear mixed effects model can be found in (6), any of which can be used to obtain this map. We assume f to be a twice continuously differentiable real function. The Hessian matrix R of f(θ) is defined as follows:

$$ {r}_{ij}=\frac{\partial^2f\left(\boldsymbol{\theta} \right)}{\partial {\theta}_i\partial {\theta}_j}, $$
(2)

where \( {r}_{ij} \) is the element of matrix R in the ith row and jth column. By the assumed smoothness of f, matrix R is symmetric.

We write the linear reparameterisation of the model parameter vector θ as

$$ \boldsymbol{\theta} =P\kern.2em \widehat{\boldsymbol{\theta}}, $$
(3)

where P is a preconditioning matrix and \( \widehat{\boldsymbol{\theta}} \) is a new parameter vector in a preconditioned model. We choose P to be an invertible matrix so that the estimated parameters and variance-covariance matrix of the preconditioned model can be transformed back to the original parameterisation of the model. We can then write each element of the Hessian matrix of the preconditioned model, \( \widehat{R} \), as follows:

$$ {\widehat{r}}_{ij}=\frac{\partial^2f\left(P\kern.2em \widehat{\boldsymbol{\theta}}\right)}{\partial {\widehat{\theta}}_i\partial {\widehat{\theta}}_j}. $$
(4)

Our goal is to find a preconditioning matrix P so that \( \widehat{R} \) becomes close to an identity matrix. To do this, we first rewrite \( \widehat{R} \) in terms of P and R, writing \( {\boldsymbol{p}}_{\cdot ,j} \) for the jth column of P:

$$ {\widehat{r}}_{ij}=\frac{\partial^2f}{\partial {\widehat{\theta}}_i\partial {\widehat{\theta}}_j}, $$
(5)
$$ ={\sum}_k\frac{\partial }{\partial {\theta}_k}\left(\frac{\partial f}{\partial {\widehat{\theta}}_j}\right)\frac{\partial {\theta}_k}{\partial {\widehat{\theta}}_i}, $$
(6)
$$ ={\sum}_k\frac{\partial }{\partial {\theta}_k}\left({\sum}_l\frac{\partial f}{\partial {\theta}_l}\frac{\partial {\theta}_l}{\partial {\widehat{\theta}}_j}\right)\frac{\partial {\theta}_k}{\partial {\widehat{\theta}}_i}, $$
(7)
$$ ={\sum}_k\frac{\partial }{\partial {\theta}_k}\left(\nabla f\cdot {\boldsymbol{p}}_{\cdot ,j}\right)\frac{\partial {\theta}_k}{\partial {\widehat{\theta}}_i}, $$
(8)
$$ =\left({\sum}_k{p}_{k,i}\begin{bmatrix}\frac{\partial^2}{\partial {\theta}_k\partial {\theta}_1}\\ \frac{\partial^2}{\partial {\theta}_k\partial {\theta}_2}\\ \vdots \\ \frac{\partial^2}{\partial {\theta}_k\partial {\theta}_n}\end{bmatrix}f\right)\cdot {\boldsymbol{p}}_{\cdot ,j}, $$
(9)
$$ =\left(R{\boldsymbol{p}}_{\cdot ,i}\right)\cdot {\boldsymbol{p}}_{\cdot ,j}, $$
(10)
$$ ={\left(R{\boldsymbol{p}}_{\cdot ,i}\right)}^{\mathrm{T}}{\boldsymbol{p}}_{\cdot ,j}, $$
(11)
$$ ={\boldsymbol{p}}_{\cdot ,i}^{\mathrm{T}}R\,{\boldsymbol{p}}_{\cdot ,j}; $$
(12)

hence, we have

$$ \widehat{R}={P}^{\mathrm{T}}RP\ . $$
(13)

Given that R is a symmetric matrix, we can obtain an eigendecomposition of R so that

$$ R=V\Lambda {V}^{\mathrm{T}}, $$
(14)

where V is the matrix constructed from the normalised eigenvectors, i.e. \( V=\left[{\boldsymbol{v}}_1,{\boldsymbol{v}}_2,\cdots, {\boldsymbol{v}}_n\right] \), and Λ is the diagonal matrix of corresponding eigenvalues, i.e. \( \Lambda =\mathrm{diag}\left({\lambda}_1,{\lambda}_2,\cdots, {\lambda}_n\right) \). Note that V is a unitary matrix, that is to say \( {V}^{\mathrm{T}}V=I \). We now choose the preconditioning matrix as follows:

$$ P=V{\Lambda}^{-1/2}, $$
(15)

where \( {\Lambda}^{-1/2} \) is a diagonal matrix with the reciprocals of the square roots of the absolute values of the eigenvalues, i.e. \( \mathrm{diag}\left(1/\sqrt{\left|{\lambda}_1\right|},1/\sqrt{\left|{\lambda}_2\right|},\cdots, 1/\sqrt{\left|{\lambda}_n\right|}\right) \). Substituting Eq. (15) into Eq. (13) reveals that

$$ \widehat{R}={\left(V{\Lambda}^{-1/2}\right)}^{\mathrm{T}}RV{\Lambda}^{-1/2}, $$
(16)
$$ ={\Lambda}^{-1/2}{V}^{\mathrm{T}}RV{\Lambda}^{-1/2}, $$
(17)
$$ ={\Lambda}^{-1/2}{V}^{\mathrm{T}}\left(V\Lambda {V}^{\mathrm{T}}\right)V{\Lambda}^{-1/2}\quad \text{(by the eigendecomposition of }R\text{)}, $$
(18)
$$ ={\Lambda}^{-1/2}\Lambda {\Lambda}^{-1/2}\quad \text{(since }V\text{ is a unitary matrix)}, $$
(19)
$$ =I. $$
(20)

In other words, if the model is preconditioned using the preconditioning matrix defined in Eq. (15), constructed from the Hessian R of the model, then the Hessian \( \widehat{R} \) of the preconditioned model will be an identity matrix (provided all eigenvalues of R are positive, as is the case at a minimum of the −2ln(likelihood)).
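As a minimal numerical sketch of this construction (illustrative only; the 2×2 R-matrix below is hypothetical rather than produced by an actual model fit):

import numpy as np

# Hypothetical symmetric R-matrix (positive definite at a true minimum)
R = np.array([[4.0, 1.0],
              [1.0, 3.0]])

lam, V = np.linalg.eigh(R)                   # eigendecomposition, Eq. (14)
P = V @ np.diag(1.0 / np.sqrt(np.abs(lam)))  # P = V |Lambda|^(-1/2), Eq. (15)

R_hat = P.T @ R @ P                          # Hessian of the preconditioned model, Eq. (13)
print(np.allclose(R_hat, np.eye(2)))         # True: R_hat is the identity matrix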

Transformation of the Estimated Parameters and Variance-Covariance Matrix of the Preconditioned Model Back to the Original Model Parameterisation

Once maximum likelihood estimation of the preconditioned model has been performed, one can transform the parameters to the original parameterisation using Eq. (3). Similarly, the variance-covariance matrix M of the original model parameterisation can be calculated from the variance-covariance matrix of the preconditioned model \( \widehat{M} \) as follows:

$$ M=P\widehat{M}{P}^{\mathrm{T}}. $$
(21)
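Continuing the sketch above, the back-transformation amounts to two matrix products (theta_hat and M_hat denote hypothetical estimates from the preconditioned model):

theta_hat = np.array([1.0, -0.5])   # hypothetical estimates, preconditioned model
M_hat = np.eye(2)                   # hypothetical covariance, preconditioned model

theta = P @ theta_hat               # Eq. (3): original parameterisation
M = P @ M_hat @ P.T                 # Eq. (21): original variance-covariance matrix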

Effect of the Inaccuracy of the R-matrix Used for Preconditioning

In reality, we do not have the exact R-matrix of the model (e.g. due to rounding error, the Hessian not being evaluated at the exact maximum of the likelihood, or numerical error in the model evaluation), so we now consider preconditioning using a perturbed Hessian (an inaccurate R-matrix). We denote the perturbed Hessian as \( \tilde{R} \), and assuming it is symmetric, we can rewrite it with an eigendecomposition as follows:

$$ \tilde{R}=\left(V+\Delta V\right)\left(\Lambda +\Delta \Lambda \right){\left(V+\Delta V\right)}^{\mathrm{T}}, $$
(22)

where (Λ + ΔΛ) is a diagonal matrix with the eigenvalues of \( \tilde{R} \), (V + ΔV) is a matrix containing the eigenvectors of \( \tilde{R} \), and V and Λ are as defined in Eq. (14). We now have the preconditioning matrix as follows:

$$ \tilde{P}=\left(V+\Delta V\right){\left|\Lambda +\Delta \Lambda \right|}^{-1/2}. $$
(23)

The difference between the identity matrix and the R-matrix of the preconditioned model with the preconditioning matrix \( \tilde{P} \) can be bounded as follows:

$$ {\left\Vert I-{\tilde{P}}^{\mathrm{T}}R\tilde{P}\right\Vert}_{\mathrm{F}}\le \left({\sum}_{i=1}^n\frac{1}{\left|{\lambda}_i+\Delta {\lambda}_i\right|}\right)\left(\sqrt{{\sum}_{i=1}^n\Delta {\lambda}_i^2}+2{\left\Vert \Delta V\right\Vert}_{\mathrm{F}}\sqrt{{\sum}_{i=1}^n{\lambda}_i^2}\right). $$
(24)

Roughly speaking, this shows that if we use a matrix \( \tilde{R} \) that is close to the true Hessian R of the original model (i.e. ΔV and ΔΛ are small), then the Hessian of the preconditioned model will be close to the identity matrix, provided \( {\lambda}_i\ne 0 \). We can also observe that choosing a matrix \( \tilde{R} \) whose eigenvalues are of similar orders of magnitude to those of the true Hessian of the original model is desirable to make the Hessian of the preconditioned model close to an identity matrix.
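This robustness can be illustrated numerically (again with hypothetical matrices; only the left-hand side of Eq. (24) is computed here):

import numpy as np

R = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # "true" Hessian (hypothetical)
E = 1e-3 * np.array([[0.5, -0.2],
                     [-0.2, 0.8]])    # small symmetric perturbation
R_tilde = R + E                       # perturbed Hessian, Eq. (22)

lam_t, V_t = np.linalg.eigh(R_tilde)
P_tilde = V_t @ np.diag(1.0 / np.sqrt(np.abs(lam_t)))   # Eq. (23)

# Deviation of the preconditioned Hessian from the identity (LHS of Eq. (24))
dev = np.linalg.norm(np.eye(2) - P_tilde.T @ R @ P_tilde, ord='fro')
print(dev)   # small, of the order of the perturbation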

Iterated Preconditioning

Since the R-matrix obtained from the original model can be significantly different from the true R-matrix, preconditioning just once may not resolve the computational instability. In that case, we can construct the preconditioning matrix based on a previously run preconditioned model by approximating the R-matrix of the original model as follows:

$$ R={\left({P}^{\mathrm{T}}\right)}^{-1}\overline{R}\,{P}^{-1}, $$
(25)

where R is the estimate of the Hessian of the original model, P is the preconditioning matrix used to construct the preconditioned model and \( \overline{R} \) is the Hessian of the preconditioned model. We can use this estimated R-matrix of the original model to construct the preconditioning matrix as in Eq. (15) and precondition iteratively.
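One iteration can be sketched in the same NumPy notation as above (P is the previous preconditioning matrix and R_bar the hypothetical Hessian obtained after fitting the preconditioned model):

import numpy as np

def next_preconditioner(P, R_bar):
    # Estimate the Hessian of the original model from the Hessian
    # of the preconditioned model, Eq. (25); note (P^T)^(-1) = (P^(-1))^T
    P_inv = np.linalg.inv(P)
    R_est = P_inv.T @ R_bar @ P_inv
    # Build the next preconditioning matrix from this estimate, Eq. (15)
    lam, V = np.linalg.eigh(R_est)
    return V @ np.diag(1.0 / np.sqrt(np.abs(lam)))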

SOFTWARE IMPLEMENTATION IN PsN

The preconditioning technique proposed in this paper is implemented in the software PsN (versions ≥4.4.0) (7). For detailed instructions on the use of this Perl script, we refer users to the documentation enclosed in the PsN distribution. As PsN is a collection of Perl scripts that augment the functionality of NONMEM (8), in this section we assume basic knowledge of how the parameters of a non-linear mixed effects model can be estimated using NONMEM.

Overall Workflow

This section describes the overall workflow when the following PsN command is used:

precond <model_name>.mod
1. Read in the model file <model_name>.mod (we refer to it as the “original model”).

2. Estimate the model parameters and variance-covariance matrix of the original model using NONMEM and obtain the R-matrix (note again that the R-matrix itself is relatively problem-free to calculate; it is the inversion step that may cause problems). The estimated parameters and their standard errors (if the variance-covariance matrix computation is successful) are saved as a text file. If the variance-covariance matrix computation of the original model is successful, the script terminates and does not proceed to the following steps.

3. Create the preconditioning matrix P as defined in Eq. (15).

4. Fixed effect parameters are reparameterised as \( \boldsymbol{\theta} =P\kern.2em \widehat{\boldsymbol{\theta}} \), where θ is the vector of fixed effect parameters of the original model and \( \widehat{\boldsymbol{\theta}} \) is the vector of fixed effect parameters of the preconditioned model. For example, for an original model with two fixed effect parameters, this reparameterisation appears in the preconditioned model as follows (p11, p12, p21 and p22 denote the numerical values of the corresponding elements of P):

$PK
IF (NEWIND == 0) THEN
THE_1 = p11 * THETA(1) + p12 * THETA(2)
THE_2 = p21 * THETA(1) + p22 * THETA(2)
END IF

In this case, the vectors θ and \( \widehat{\boldsymbol{\theta}} \) are

\( \boldsymbol{\theta} ={\left[\mathrm{THE\_1},\ \mathrm{THE\_2}\right]}^{\mathrm{T}} \)

\( \widehat{\boldsymbol{\theta}} ={\left[\mathrm{THETA}(1),\ \mathrm{THETA}(2)\right]}^{\mathrm{T}} \)

For models with many THETAs, it is necessary to increase the bounds on the number of variables and constants used by NONMEM. These are set using $SIZES for DIMTMP, DIMCNS and DIMNEW. To save computational time, the THE_x are recalculated only as often as the THETAs are updated (using the “IF (NEWIND == 0) THEN … END IF” structure).

5. THETA(x) in the original model file is replaced with THE_x in all relevant code blocks (currently pk, pred, error, des, aes, aesinitial and infn). For example, CL = THETA(1) in the original model is replaced with CL = THE_1 in the preconditioned model.

6. All bounds for thetas in the preconditioned model are removed and the initial estimates of the parameters are updated to \( {\widehat{\boldsymbol{\theta}}}_{init} \), where

$$ {\widehat{\boldsymbol{\theta}}}_{init}={P}^{-1}{\boldsymbol{\theta}}_{init}. $$

The preconditioned model can be found in m1/<modelname>_repara.mod in the precond run directory.

7. Estimate the parameters \( \widehat{\boldsymbol{\theta}} \) and the variance-covariance matrix \( \widehat{M} \) of the preconditioned model using NONMEM.

8. If \( \widehat{\boldsymbol{\theta}} \) was obtained, the estimated parameter vector θ of the original model is calculated using Eq. (3) and saved as a text file.

9. If \( \widehat{M} \) was obtained, the variance-covariance matrix M of the original model is calculated using Eq. (21) and saved as a text file. Standard errors of θ are computed from M and saved to the file created in the previous step.

Options

Here, we describe two options to the precond tool in PsN that are used for the numerical experiments presented in “NUMERICAL EXPERIMENTS”. We refer the readers to the documentation of PsN for the complete list of options.

always: If the -always option is specified, the script will proceed to step 3 of the workflow regardless of the success of the variance-covariance step. This option can be used to verify that the estimated parameters of the original model are consistent with the ones obtained through preconditioning. If the estimated parameters are not in agreement while the likelihoods are identical, the parameters are most likely not identifiable from the data (see “Experiment 2: Reveal Non-identifiability” for an example). Also, if the computed variance-covariance matrix will be used for further analysis, we strongly recommend using this option, as the variance-covariance matrix obtained through preconditioning is more accurate (less influenced by rounding error and less dependent on the computational environment) than the one obtained from the original model.

pre=precond_dir1: With this option, we can conduct multiple, iterative preconditioning steps as presented in “Iterated Preconditioning”. Step 2 of the workflow is skipped, and in step 3 the preconditioning matrix is created as in Eq. (25) with the R-matrix obtained from the previous preconditioning step. Preconditioning can be repeated indefinitely (i.e. by using the options -pre=precond_dir1 -pre=precond_dir2 -pre=precond_dir3 …) as long as the R-matrix of the previous preconditioned model was obtained.
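For example, a hypothetical sequence of runs for a model file run1.mod (file name ours; see the PsN documentation for the authoritative syntax) could look as follows:

precond run1.mod -always
precond run1.mod -pre=precond_dir1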

Limitations

The current implementation of preconditioning in PsN only preconditions the fixed effects portion of the model. If the user wishes to precondition both the fixed effects and random effects portions of the model, then a reparameterisation of the model can be done as presented in Appendix C.1.1. Also, the current implementation neglects constraints on the parameter search space (i.e. boundaries of the fixed effect parameters set in the $THETA record). If the sign of the parameter is crucial in order for it to be estimated appropriately, negativity or positivity of the parameter can be imposed through simple reparameterisation as presented in Appendix C.1.2. The current implementation cannot precondition mixture models (i.e. cannot reparameterise a model with a $MIX record).

NUMERICAL EXPERIMENTS

In order to illustrate that preconditioning can be a useful tool for pharmacometric analyses, we have conducted numerical experiments using NONMEM version 7.3 and PsN version 4.4.0 (version 4.5.6 for the SIR analysis) on a Linux cluster (2.4 GHz Intel Xeon E5645, Scientific Linux release 6.5, GCC 4.4.7, Perl 5.10.1). We have used four published models as described in Appendices C.2.1–C.2.4 and one realistic simulation study described in Appendix C.2.5.

Experiment 1: Recover Failed Variance-Covariance Matrix Computation

In theory, if the R-matrix is not positive semi-definite and the gradient of the likelihood is zero, then the estimated parameters are located at a saddle point of the likelihood surface; hence, they are not at the maximum of the likelihood. In practice, however, instability in the computation of the variance-covariance matrix can make the R-matrix appear to be non-positive semi-definite even at the maximum. In this case, NONMEM will terminate the variance-covariance matrix computation. Preconditioning can rectify this situation and allow the variance-covariance matrix to be computed.

This computational instability of the R-matrix can be caused by computational inaccuracy in the parameter estimation, in the model evaluation (especially when the model evaluation includes numerical approximations, e.g. ODE solvers or the matrix exponential) and in the finite difference scheme used for computing the second derivatives.

Through this numerical experiment, we show that preconditioning can improve the accuracy of the computation so that an R-matrix evaluated at the (local) maximum of the likelihood is computed as positive semi-definite (i.e. when the R-matrix is theoretically positive semi-definite); hence, the variance-covariance matrix can be obtained.

For models 1, 2 and 3, we have first estimated the model parameters from the data and then, using the estimated parameters and the model, simulated 100 data sets with the same data structure as the original data and reestimated the parameters and variance-covariance matrix for each simulated data set. This process was automated using the SSE script of PsN. For model 4, as it is a time-to-event model, and to avoid the creation of a simulation data set, we have used a case-sampling bootstrap method to create 100 different data sets and reestimated the parameters and variance-covariance matrix for each bootstrapped data set; for this, we used the bootstrap script of PsN. To maintain the reproducibility of the results, we have fixed the random seed. We observe that the computation of the variance-covariance matrix for these simulations and reestimations failed for 36, 31, 20 and 13% of the data sets for models 1, 2, 3 and 4, respectively.
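In PsN terms, these experiments correspond to invocations of roughly the following form (model file names and seed are hypothetical; consult the PsN documentation for the exact options):

sse model1.mod -samples=100 -seed=12345
bootstrap model4.mod -samples=100 -seed=12345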

For the data sets for which the computation of the variance-covariance matrix was not successful, we have iteratively repeated preconditioning until the computation succeeded. As can be seen in Table I, most of the variance-covariance matrices could be computed once the computation was stabilised by preconditioning (variance-covariance matrix computation failure rates of 1, 1, 6 and 0% for models 1, 2, 3 and 4, respectively).

Table I Number of Successful Variance-Covariance Matrix Computations

For model 1, we have repeated the same numerical experiment using the Laplace approximation of the likelihood. We observe that the computation of the variance-covariance matrix failed 72% of the time and that after preconditioning the variance-covariance matrix calculation failure rate decreased to 40%.

Experiment 2: Reveal Non-identifiability

Through this numerical experiment, we show that numerical instability can potentially hide non-identifiability of the model parameters and that preconditioning can help discover it. We show this using model 5 (see Appendix C.2.5 for the details of the model), first conducting a standard analysis of the parameter estimation uncertainty and then conducting an analysis using preconditioning. The standard analysis suggests that the parameters can be estimated accurately; however, preconditioning contradicts this by demonstrating the existence of two different sets of model parameters with the same maximised likelihood.

Standard Analysis

As proven in Appendix C.2.5, model 5 is not structurally identifiable; hence, the Hessian matrix R is, in theory, a singular matrix (i.e. the inverse of this matrix does not exist). Thus, the variance-covariance matrix as defined in Eq. (1) does not exist. Despite this, the R-matrix was incorrectly calculated to be non-singular and the variance-covariance matrix was obtained using NONMEM in our computational environment. The minimisation was reported as “successful” by NONMEM. As an alternative to the variance-covariance matrix, a case-resampling bootstrap method or a sampling importance resampling (SIR) method (9,10) could be used to quantify the parameter estimation uncertainty. We have applied both methods to the original model using PsN, obtained the parameter estimation uncertainty and tabulated the standard errors in Table II. The variance-covariance matrix, the bootstrap method and the SIR method all suggest reasonable relative standard errors, so one would conclude that the parameters can be estimated accurately from this data set (despite the model not being structurally identifiable). Note that we have used the (in this case incorrectly computed) variance-covariance matrix to create the proposal distribution (the default setting of SIR as implemented in PsN 4.5.6). The bootstrap method relies on parameter estimation in which the parameters are searched locally near the user-specified initial estimates to fit the bootstrap data. Thus, as can be seen in Table II, when the bootstrap method is run on the non-estimable model, the estimated bootstrap parameters remain close to the initial estimates.

Table II Estimated Parameters and Relative Standard Error (RSE) of Model 5

Analysis Using Preconditioning

After preconditioning, the variance-covariance matrix was still obtainable; however, as can be seen in Table III, the SEs of the non-identifiable parameters are large, clearly indicating that these parameters cannot be estimated. The eigenvalues of the R-matrix of the original model and after preconditioning are tabulated in Table IV. An almost zero eigenvalue and an increase in the condition number after preconditioning suggest that the R-matrix is fundamentally singular; hence, the parameters are not locally practically identifiable (non-estimable) from the data. In fact, as can be seen in Table III, the parameters estimated from the original model and those obtained through the preconditioned model are significantly different despite almost identical likelihoods. Thus, the parameters of this model with the given data cannot be estimated uniquely in the maximum likelihood sense. To cross-check this result, we have taken the parameters obtained through the preconditioned model, as given in Table III, used them as the initial estimates in the original model and reestimated the parameters. The reestimated parameters were identical to those in Table III even for the original model after moving the initial estimates. Hence, the parameters we found through preconditioning were not an artefact of preconditioning but a consequence of the model parameters not being estimable from the given data. In addition, this result is consistent with our analytic proof in Appendix C.2.5 showing that only the ratios between the parameters V1, Q, V2 and CL are identifiable.

Table III Estimated Parameters and Relative Standard Errors (RSEs) After Preconditioning of Model 5
Table IV Eigenvalues of the Hessian of the −2ln(likelihood) with Respect to Fixed Effect Parameters of Model 5 With or Without Preconditioning

Additionally, we have conducted an SIR analysis using the variance-covariance matrix obtained from preconditioning to construct the proposal distribution. As can be seen in Table III, SIR clearly shows some of the model parameters to be non-estimable when the proposal distribution is properly constructed.

Simulation Study

In order to further validate that preconditioning is likely to reveal non-identifiability of the model, we have conducted a simulation study using this model. We have simulated 100 data sets from the model and reestimated the parameters and variance-covariance matrix using the SSE tool in PsN. For 90% of the data sets, the variance-covariance matrix was obtainable, and in 47% of the cases, the maximum RSE was below 100%. We have then preconditioned the model using the -always option; after preconditioning, in 62% of the cases the covariance calculation was correctly terminated with a correct detection of the singular R-matrix. For the remaining 38% of cases, where the variance-covariance matrices were still obtainable, all the RSEs of the structurally non-identifiable parameters calculated from these matrices were above 100%, clearly indicating the estimability issues.

DISCUSSION

We have introduced a preconditioning technique for non-linear mixed effects models to reduce the computational instability that can potentially influence the results of pharmacometric analyses. Through numerical experiments, we have shown that preconditioning can recover failed variance-covariance matrix computations and reveal non-identifiability of the parameters.

Preconditioning can be thought of as an extension of a normalisation of model parameters. Alternatively, normalisation can be thought of as “preconditioning” with a diagonal matrix. As normalisation of model parameters is implemented in most of the parameter estimation algorithms including the ones implemented in NONMEM, we believe the utilisation of the ideas presented in this paper can be easily translated to many parameter estimation and variance-covariance matrix computation algorithms for non-linear mixed effects models.
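For instance, normalising each parameter by its initial estimate, one common transformation, corresponds to the diagonal choice

$$ P=\mathrm{diag}\left({\theta}_{1,\mathrm{init}},{\theta}_{2,\mathrm{init}},\cdots, {\theta}_{n,\mathrm{init}}\right), $$

whereas the preconditioning matrix of Eq. (15) is, in general, a full matrix that also rotates the parameter space.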

Transformation of model parameters, including normalisation, is usually done so that the orders of magnitude of the parameters are similar during the parameter estimation computations. The essential difference of preconditioning compared to the usual parameter transformations is that preconditioning scales the parameters to make the second derivatives of the −2ln(likelihood) with respect to the parameters (i.e. the diagonal elements of the R-matrix) as close to each other as possible. At the same time, preconditioning aims to reduce the parameter-parameter correlations (i.e. to make the off-diagonal elements of the R-matrix as close to zero as possible).

As shown in experiment 1, preconditioning can be used for various types of models (e.g. for a model given as analytic functions, a system of ODEs where the solution is approximated through numerical integration or using matrix exponential) and various non-stochastic estimation methods (e.g. FOCE and Laplace).

As shown in experiment 2, computational issues in variance-covariance matrix calculations can produce potentially misleading results. As such, we recommend the use of preconditioning whenever the variance-covariance matrix is to be used for a subsequent analysis (e.g. calculating and reporting standard errors, or constructing the proposal density for an SIR analysis). Repeating preconditioning until two variance-covariance matrices with similar standard errors are obtained should further guarantee the accuracy of the variance-covariance matrix. We also remind the reader that, as demonstrated in experiment 2, if the parameters are not estimable from the data (e.g. the model is over-parameterised), the R-matrix is, in theory, singular and, hence, the variance-covariance matrix does not exist; the correct behaviour of the computational algorithm is then to indicate that the matrix is not obtainable. Similarly, if the estimated parameters are at a saddle point, the R-matrix is, in theory, non-positive definite and, again, the correct behaviour of the computational algorithm is to indicate that the matrix is not obtainable.

There are various more sophisticated methods for quantifying parameter estimation uncertainty. As these more advanced methods usually require considerably more computation than a single variance-covariance matrix computation, we recommend obtaining an accurately computed variance-covariance matrix first and then proceeding to a further analysis of the uncertainty. In particular, as demonstrated in experiment 2, the variance-covariance matrix and the R-matrix, if computed correctly, can give a strong indication of the non-estimability of the model parameters, while other methods may not be able to do so.

During our investigation, we have observed that although preconditioning remedies most of the computational issues related to the R-matrix, there are many other sources of computational instability, for example the S-matrix computation and the parameter estimation itself. Thus, we believe further investigation into the computational stability of the numerical algorithms used in pharmacometrics would be beneficial to the field.