Motivating example
An artificial hip includes three major components: a stem that is inserted into the femur, a head (a ball) attached to the top of the stem, and a cup, also called the acetabular component, that is implanted into the pelvis. Hip resurfacing, which is typically used in younger patients and can delay the need for a total hip replacement, replaces the socket with an artificial cup and resurfaces the head of the femur instead of removing it. In 2010, the NJR recorded 123 brands of acetabular cups, 13 brands of resurfacing cups and 146 brands of femoral stems used in primary and revision procedures [14].
Given the wide variety of prosthesis component types and brands available for hip replacement surgery, monitoring implant quality is the main objective of the NJR implant scrutiny group, which was established in 2009. According to the current NJR methodology [15], an implant is considered a Level 1 outlier when its Patient Time Incident Rate (PTIR) is twice the PTIR of the implant group, where the group rate is weighted by the relevant implant types. From 2009 to 2014, three hip stems, three hip acetabular components and 17 hip stem/cup combinations were reported as Level 1 outliers [13].
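The Level 1 rule can be illustrated with a minimal Python sketch. It rests on assumptions that are ours, not the NJR's documented computation: PTIR is taken as revisions per 100 prosthesis-years, the group PTIR is a weighted average of type-specific rates with user-supplied weights, and "twice" is read as "at least twice". All function names are hypothetical.

```python
def ptir(revisions: int, prosthesis_years: float, per: float = 100.0) -> float:
    """Assumed PTIR definition: revisions per `per` prosthesis-years."""
    if prosthesis_years <= 0:
        raise ValueError("prosthesis_years must be positive")
    return per * revisions / prosthesis_years


def weighted_group_ptir(type_rates: dict, weights: dict) -> float:
    """Assumed group rate: weighted average of type-specific PTIRs."""
    total_w = sum(weights[t] for t in type_rates)
    return sum(type_rates[t] * weights[t] for t in type_rates) / total_w


def is_level1_outlier(implant_rate: float, group_rate: float) -> bool:
    """Flag an implant whose PTIR is at least twice the group PTIR."""
    return implant_rate >= 2.0 * group_rate
```

For example, 12 revisions in 1000 prosthesis-years gives a PTIR of 1.2, which would be flagged against a group rate of 0.45 but not against one of 0.7.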
To test our analytical approach on real-world data, our analysis focuses on two of these outlier components: (i) the DePuy ASR Resurfacing Cup (first identified as part of an outlier head/cup combination in April 2010 and last implanted in July 2010) and (ii) the Biomet M2A-38 acetabular cup (first identified by the NJR as an outlier in 2014 and last implanted in June 2011).
A standard CUSUM chart usually has a learning period in which the parameters of the relevant null distribution are estimated and the deviation from the null that is of clinical concern is chosen to calibrate the control limits. The chart is then run with these control limits. An example of this approach is the study by Hardoon et al. [7] (2007), who monitored a constant target revision rate over a time interval. However, failure rates differ by implant type, patient age and other case-mix characteristics, and they may also vary by the site at which operations take place (the operating unit). Therefore, we consider a risk-adjusted CUSUM in which the target rates are estimated for popular implants (the top 80%) and experienced units (more than one operation per week, on average). This requires introducing into our survival models shared frailty terms, which describe similarity within units and heterogeneity between them, and adjusting the control limits accordingly.
Description of the NJR data
The NJR data were made available after a formal request to the NJR Research Committee. The dataset is related to the data cut used in the 10th NJR Annual Report [16]. The data were anonymised with respect to patient, surgeon and operating unit identifying details. Approval was obtained from the Computing Subcommittee of the University of East Anglia Ethics Committee, reference number CMP/1718/F/10A. The NJR dataset provides the following four groups of variables, which were used in the time-to-failure analysis of the hip replacements to risk-adjust the CUSUM boundaries:
Information on procedures, such as date of operation or revision, and side;
Institution and staff involved, such as unit and consultant IDs (anonymised), and surgeon grade;
Hip prosthesis characteristics, such as fixation type (cemented, uncemented, hybrid, resurfacing), its components (head, cup, stem, and liner brands), head size, bearing surfaces (metal, polyethylene, ceramic);
Patient characteristics, such as age, sex, ASA physical status classification [17] at 5 levels from healthy (1) to near death (5), Body Mass Index (BMI), index of multiple deprivation (IMD) [18] (a higher IMD indicates a higher proportion of people in the area classed as deprived), and date of death.
Since about half of the records had missing BMI values, BMI was excluded from further consideration. ASA scores were grouped into two categories for further analysis: ASA 1-2 (normal healthy patients and patients with mild systemic disease) and ASA 3-5 (patients with serious, non-incapacitating systemic disease, patients with life-threatening incapacitating systemic disease, and patients who are near death).
Data selection in SQL (elimination of duplicates and of second and subsequent revisions) resulted in 504,024 records with the fields listed above. Further cleaning additionally excluded the following records:
Patients with bilateral operations;
Records with missing or misreported side;
Records with time to revision equal to 0;
Records with date of operation after 31 December 2012;
Patients younger than 50 years on the day of operation;
Records with missing values of IMD.
This process resulted in 281,265 records. Finally, for the in-control dataset we excluded all records of patients operated on in units with fewer than 52 operations per year (i.e. less than once per week, on average), all records with implanted cup/head brands in the bottom 20% of popularity in that year, and all records with the cup/head brands “DePuy” and “Biomet”, resulting in 113,772 records in total. To test the performance of our CUSUM procedure, we also selected two test datasets including only the records with the cup brands “DePuy ASR Resurfacing Cup” (1734 records) and “Biomet M2A 38” (764 records), respectively. Prostheses revised within three months of implantation were censored at the time of revision to exclude failures that might be directly attributable to surgical technique or postoperative complications. The three datasets are described in Table 1. We present the analysis of these data, performed in R [19], in the “Results” section.
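The in-control selection rules can be sketched in Python as follows. This is a minimal illustration, not the SQL or R code used for the analysis: the record fields (`unit`, `year`, `brand`, `days_to_event`, `revised`) are hypothetical names, "three months" is assumed to be 91 days, unit volume is assumed to be counted per calendar year, and "top 80% in popularity" is interpreted as the most used brands that together cover 80% of that year's operations.

```python
from collections import Counter, defaultdict

EARLY_REVISION_DAYS = 91  # assumption: "three months" taken as 91 days


def high_volume_unit_years(records, min_per_year=52):
    """(unit, year) pairs with at least `min_per_year` operations (about one per week)."""
    counts = Counter((r["unit"], r["year"]) for r in records)
    return {key for key, n in counts.items() if n >= min_per_year}


def popular_brands(records, share=0.80):
    """For each year, the most used brands that jointly cover `share` of operations."""
    by_year = defaultdict(Counter)
    for r in records:
        by_year[r["year"]][r["brand"]] += 1
    keep = {}
    for year, counts in by_year.items():
        total, covered, chosen = sum(counts.values()), 0, set()
        for brand, n in counts.most_common():
            if covered >= share * total:
                break
            chosen.add(brand)
            covered += n
        keep[year] = chosen
    return keep


def censor_early_revision(record):
    """Treat a revision within three months as censoring at the revision time."""
    r = dict(record)
    if r["revised"] and r["days_to_event"] < EARLY_REVISION_DAYS:
        r["revised"] = False
    return r


def in_control(records, excluded_prefixes=("DePuy", "Biomet")):
    """Apply the unit-volume, brand-popularity and manufacturer exclusions."""
    units = high_volume_unit_years(records)
    brands = popular_brands(records)
    return [
        censor_early_revision(r)
        for r in records
        if (r["unit"], r["year"]) in units
        and r["brand"] in brands[r["year"]]
        and not r["brand"].startswith(excluded_prefixes)
    ]
```

The early-revision rule keeps the time to revision but flips the event indicator, which is what "censored at the time of revision" amounts to.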
Table 1 Description of the datasets
| Variable | Statistic | In-control, male | In-control, female | In-control, all | DePuy ASR, male | DePuy ASR, female | DePuy ASR, all | Biomet M2A 38, male | Biomet M2A 38, female | Biomet M2A 38, all |
|---|---|---|---|---|---|---|---|---|---|---|
| Sample size | Number | 44,468 | 69,304 | 113,772 | 1093 | 641 | 1734 | 315 | 449 | 764 |
| | % by sex | 39.1 | 60.9 | | 63.0 | 37.0 | | 41.2 | 58.8 | |
| Revisions | Number | 596 | 740 | 1336 | 132 | 169 | 301 | 15 | 36 | 51 |
| | % by sex | 44.6 | 55.4 | | 43.9 | 56.1 | | 29.4 | 70.6 | |
| Deaths | Number | 4074 | 5512 | 9586 | 56 | 31 | 87 | 40 | 37 | 77 |
| | % by sex | 42.5 | 57.5 | | 64.4 | 35.6 | | 51.9 | 48.1 | |
| Censored | Number | 39,798 | 63,052 | 102,850 | 905 | 441 | 1346 | 260 | 376 | 636 |
| | % by sex | 38.7 | 61.3 | | 67.2 | 32.8 | | 40.9 | 59.1 | |
| Age (years) | Mean | 69.4 | 71.5 | 70.7 | 59.9 | 61.5 | 60.5 | 67.8 | 67.8 | 67.8 |
| | SD | 9.1 | 9.3 | 9.2 | 6.9 | 8.5 | 7.6 | 7.3 | 7.6 | 7.5 |
| IMD | Mean | 19 | 19 | 19 | 18.6 | 17.3 | 18.1 | 11.6 | 12.3 | 12 |
| | SD | 9.2 | 9.2 | 9.2 | 10.8 | 10.4 | 10.7 | 5 | 5.2 | 5.2 |
| Head size (mm) | Mean | 32.6 | 30.2 | 31.2 | 49.4 | 45.1 | 47.8 | 38 | 38 | 38 |
| | SD | 6.5 | 3.8 | 5.2 | 3 | 2.6 | 3.5 | 0 | 0 | 0 |
| Fixation | | | | | | | | | | |
| Cemented | Number | 18,787 | 36,150 | 54,937 | 71 | 35 | 106 | 8 | 7 | 15 |
| | % | 42.2 | 52.2 | 48.3 | 6.5 | 5.5 | 6.1 | 2.5 | 1.6 | 2 |
| Uncemented | Number | 13,522 | 17,679 | 31,201 | 503 | 403 | 906 | 297 | 434 | 731 |
| | % | 30.4 | 25.5 | 27.4 | 46 | 62.9 | 52.2 | 94.3 | 96.7 | 95.7 |
| Hybrid | Number | 9029 | 14,260 | 23,289 | 49 | 25 | 74 | 10 | 8 | 18 |
| | % | 20.3 | 20.6 | 20.5 | 4.5 | 3.9 | 4.3 | 3.2 | 1.8 | 2.4 |
| Resurfacing | Number | 3130 | 1215 | 4345 | 470 | 178 | 648 | 0 | 0 | 0 |
| | % | 7 | 1.8 | 3.8 | 43 | 27.8 | 37.4 | 0 | 0 | 0 |
| ASA grade | | | | | | | | | | |
| ASA 1-2 | Number | 36,598 | 57,355 | 93,953 | 1012 | 587 | 1599 | 306 | 438 | 744 |
| | % | 82.3 | 82.8 | 82.6 | 92.6 | 91.6 | 92.2 | 97.1 | 97.6 | 97.4 |
| ASA 3-5 | Number | 7870 | 11,949 | 19,819 | 81 | 54 | 135 | 9 | 11 | 20 |
| | % | 17.7 | 17.2 | 17.4 | 7.4 | 8.4 | 7.8 | 2.9 | 2.4 | 2.6 |
| Cup/Head bearing surfaces | | | | | | | | | | |
| Ceramic/Ceramic | Number | 6584 | 8161 | 14,745 | 0 | 0 | 0 | 0 | 0 | 0 |
| | % | 14.8 | 11.8 | 13 | 0 | 0 | 0 | 0 | 0 | 0 |
| Metal/Metal | Number | 165 | 129 | 294 | 0 | 0 | 0 | 315 | 449 | 764 |
| | % | 0.4 | 0.2 | 0.3 | 0 | 0 | 0 | 100 | 100 | 100 |
| Polyethylene/Ceramic | Number | 4863 | 7070 | 11,933 | 0 | 0 | 0 | 0 | 0 | 0 |
| | % | 10.9 | 10.2 | 10.5 | 0 | 0 | 0 | 0 | 0 | 0 |
| Polyethylene/Metal | Number | 29,088 | 52,436 | 81,524 | 0 | 0 | 0 | 0 | 0 | 0 |
| | % | 65.4 | 75.7 | 71.7 | 0 | 0 | 0 | 0 | 0 | 0 |
| Resurfacing/Metal | Number | 318 | 233 | 551 | 534 | 447 | 981 | 0 | 0 | 0 |
| | % | 0.7 | 0.3 | 0.5 | 48.9 | 69.7 | 56.6 | 0 | 0 | 0 |
| Resurfacing/Resurfacing | Number | 3450 | 1275 | 4725 | 559 | 194 | 753 | 0 | 0 | 0 |
| | % | 7.8 | 1.8 | 4.2 | 51.1 | 30.3 | 43.4 | 0 | 0 | 0 |
Basics of the CUSUM method for time-to-event data
The CUSUM method is a sequential analysis technique based on the calculation of the series \(W_i\), \(i=0,1,2,\dots\), defined by the simple recurrence
$$\begin{array}{@{}rcl@{}} \begin{aligned} W_{0} & = 0, \\ W_{i+1} & = \max \{0,\; W_{i}+X_{i}\}, \end{aligned} \end{array} $$
where the index \(i\) stands for a single observation or for a group of observations, and \(X_i\) is the weight, or score, assigned to index \(i\). The CUSUM signals an alarm when \(W_i\) crosses a control limit, which is usually chosen to guarantee a long average run length (ARL) when the process is in control, or to provide a low false alarm probability [20]. In applications to survival data, and assuming independent competing risks of revision and death, the score \(X_i\) for an individual \(i\) with time-to-revision \(t_i\) and vector of covariates \(\mathbf u_i\) can be defined as the logarithm of the revision-specific factor of the likelihood ratio
$$\begin{array}{@{}rcl@{}} \begin{aligned} X_{i} = \log \left (\frac {f^{1}_{i}(t_{i}|\mathbf{u}_{i})^{\delta_{i}}S^{1}_{i}(t_{i}|\mathbf{u}_{i})^{1-\delta_{i}}}{f^{0}_{i}(t_{i}|\mathbf{u}_{i})^{\delta_{i}}S^{0}_{i}(t_{i}|\mathbf{u}_{i})^{1-\delta_{i}}}\right), \end{aligned} \end{array} $$
where \(\delta_i\) is the censoring indicator (\(\delta_i=1\) if the revision is observed and \(\delta_i=0\) if the observation is censored), \(S^{j}_{i}(\cdot)\) and \(f^{j}_{i}(\cdot)\) are the survival and density functions, respectively, and the index \(j=0,1\) stands for the null hypothesis \(H_0\) (the process is in control) and the alternative hypothesis \(H_1\) (the failure rate is higher than expected by a certain margin). Under the assumption of independent competing risks, the revision-specific factor of the likelihood coincides with the likelihood function that would be obtained by treating failures from any other cause as censored observations.
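The recurrence and the alarm rule can be sketched in a few lines of Python. The function name `cusum` is hypothetical, and the alarm is taken to fire the first time \(W_i\) strictly exceeds the control limit.

```python
def cusum(scores, limit):
    """Run W_0 = 0, W_{i+1} = max(0, W_i + X_i); return the path and the first alarm index."""
    w, path, alarm = 0.0, [0.0], None
    for i, x in enumerate(scores):
        w = max(0.0, w + x)
        path.append(w)
        if alarm is None and w > limit:
            alarm = i + 1  # index of the first W that exceeds the limit
    return path, alarm
```

For the scores 1, -3, 2, 2 and a limit of 3.5, the path is 0, 1, 0, 2, 4: the negative score resets the sum to zero and the alarm is raised at \(W_4\).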
For a set \(I\) of independent individuals, the score \(X_I\) can be calculated as the sum of the individual scores \(X_i\), \(i\in I\):
$$\begin{array}{@{}rcl@{}} \begin{aligned} X_{I} = \sum_{i\in I} X_{i}. \end{aligned} \end{array} $$
Assuming a proportional hazards model with a Weibull baseline distribution under hypotheses \(H_j\), \(j=0,1\), the hazard functions \(h_j(t|\mathbf u)=\mu_j(t)\chi(\mathbf u)\) are proportional to the Weibull baseline hazards \(\mu_j(t)\) and a regressor function \(\chi(\mathbf u)\). The regressor function is usually specified as \(\chi(\mathbf u)=\exp(\beta^*\mathbf u)\) (the Cox regression term), where \(\beta^*\) is the transpose of the column vector \(\beta\) of unknown parameters. Under \(H_0\), the baseline hazard is the Weibull hazard \(\mu_0(t)=(k/\lambda)(t/\lambda)^{k-1}\) with shape parameter \(k\) and scale parameter \(\lambda\); under the alternative hypothesis \(H_1\), the baseline hazard is proportional to \(\mu_0\), \(\mu_1(t)=\text{HR}\,\mu_0(t)\). The hazard ratio HR represents the departure from the target survival that we want to detect.
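Under this model \(f=hS\), \(h_1=\text{HR}\,h_0\) and \(S_1=S_0^{\text{HR}}\), so the individual log-likelihood ratio simplifies to \(X_i=\delta_i\log\text{HR}-(\text{HR}-1)H_0(t_i|\mathbf u_i)\) with \(H_0(t|\mathbf u)=e^{\beta^*\mathbf u}(t/\lambda)^k\). A minimal Python sketch of this score follows; the function names are hypothetical, and `lin_pred` stands for \(\beta^*\mathbf u_i\).

```python
import math


def weibull_cum_hazard(t, k, lam, lin_pred=0.0):
    """H_0(t|u) = exp(beta*u) * (t / lam)**k under the Weibull PH model."""
    return math.exp(lin_pred) * (t / lam) ** k


def individual_score(t, delta, hr, k, lam, lin_pred=0.0):
    """X_i = delta*log(HR) - (HR - 1) * H_0(t_i|u_i), the log-likelihood ratio."""
    return delta * math.log(hr) - (hr - 1.0) * weibull_cum_hazard(t, k, lam, lin_pred)
```

A revision contributes a jump of \(\log\text{HR}\), while every unit of accumulated cumulative hazard pulls the score down by \(\text{HR}-1\).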
For consecutive time intervals \(T\), consider the subset \(I=I_T\) of \(N_I\) individuals observed (prostheses in use) over the time interval \(T\). In this case, the scores \(X_I\) can be calculated as [7]
$$\begin{array}{@{}rcl@{}} \begin{aligned} X_{I}=O_{I}\log(\text{HR})-(\text{HR}-1)E_{I}, \end{aligned} \end{array} $$
where \(O_I\) is the observed number of failures (revisions) occurring during the interval \(T\) and \(E_I\) is the number of failures that would be expected in the same interval under hypothesis \(H_0\).
Denote by \((t_{1i},t_{2i})\) the intersection of the interval \(T\) with the lifetime of prosthesis \(i\) implanted at \(t_{0i}\). Then \(t_{1i}\) is the maximum of the lower bound of \(T\) and \(t_{0i}\), and \(t_{2i}\) is the minimum of the upper bound of \(T\), the time of revision of prosthesis \(i\) and the time of censoring of the patient with prosthesis \(i\). Hence \(t_{2i}-t_{1i}\) is the length of time during which prosthesis \(i\) is in use within the interval \(T\). The values of \(E_I\) can be computed as
$$\begin{array}{@{}rcl@{}} \begin{aligned} E_{I}=\sum_{i=1}^{N_{I}} e^{\beta^{*}\mathbf u_{i}}\lambda^{-k}\left ((t_{2i}-t_{0i})^{k}-(t_{1i}-t_{0i})^{k}\right). \end{aligned} \end{array} $$
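A minimal Python sketch of the interval bookkeeping: each prosthesis contributes the Weibull cumulative hazard accumulated between \(t_{1i}\) and \(t_{2i}\), scaled by the risk-adjustment factor \(e^{\beta^*\mathbf u_i}\) (passing `lin_pred=0` gives the unadjusted sum). The dictionary keys and function names are hypothetical, and a revision is counted in \(T=(a,b]\) when its time falls in that half-open interval, which is our convention.

```python
import math


def interval_cum_hazard(t0, t_event, interval, k, lam, lin_pred=0.0):
    """Expected revisions of one prosthesis during T = (a, b]; zero if not in use."""
    a, b = interval
    t1, t2 = max(a, t0), min(b, t_event)  # intersection of T with the lifetime
    if t2 <= t1:
        return 0.0
    return math.exp(lin_pred) * lam ** (-k) * ((t2 - t0) ** k - (t1 - t0) ** k)


def interval_counts(prostheses, interval, k, lam):
    """Return (O_I, E_I) for one interval."""
    a, b = interval
    observed = sum(1 for p in prostheses if p["revised"] and a < p["t_event"] <= b)
    expected = sum(
        interval_cum_hazard(p["t0"], p["t_event"], interval, k, lam, p.get("lin_pred", 0.0))
        for p in prostheses
    )
    return observed, expected


def interval_score(observed, expected, hr):
    """X_I = O_I log(HR) - (HR - 1) E_I."""
    return observed * math.log(hr) - (hr - 1.0) * expected
```

Because the contributions telescope, summing `interval_cum_hazard` over consecutive intervals recovers the prosthesis' total cumulative hazard, so the interval scores add up to the sum of the individual scores.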
CUSUM scores for a shared frailty competing risks model
Under the proportional hazards model with frailty, the hazard function \(h(t|\mathbf u,Z)\) for an observed vector of covariates \(\mathbf u\) and an unobserved non-negative random frailty component \(Z\) is proportional to the baseline hazard \(\mu(t)\), the frailty term \(Z\) and the regressor function \(\chi(\mathbf u)=\exp(\beta^*\mathbf u)\). The conditional survival function is given by
$$ {\begin{aligned} S(t|\mathbf u, Z)\,=\,\exp(-\int_{0}^{t}h (x|\mathbf{u},Z)dx)=\exp(-Z\chi (\mathbf{u})\int_{0}^{t} \mu(x)dx). \end{aligned}} $$
The marginal survival function is defined by
$$\begin{array}{@{}rcl@{}} \begin{aligned} S(t|\mathbf u)=\mathbb {E}S(t|\mathbf u, Z). \end{aligned} \end{array} $$
We use the index \(f\), \(f=r,d\), to denote the type of failure (revision of the implant, or death of the patient without implant failure, respectively), considered as competing risks. For mathematical convenience, the frailty \(Z_f\) is frequently assumed to be gamma distributed with mean 1 and unknown variance \(\sigma_f^2\). The assumption of gamma distributed frailty is not too restrictive, as a number of authors have demonstrated that gamma-based shared frailty models are robust for a wide class of frailty distributions [21, 22]. The frailty variance \(\sigma_f^2\) characterizes heterogeneity in the population.
We also assume that the baseline hazard functions are \(\mu_{0,r}(t)=(k_r/\lambda_r)(t/\lambda_r)^{k_r-1}\) and \(\mu_{0,d}(t)=\lambda_d\exp(k_d t)\), with shape parameter \(k_f\) and scale parameter \(\lambda_f\), \(f=r,d\), for the Weibull and Gompertz distributions, respectively. In this case, the type-of-failure specific marginal survival function is given by
$$\begin{array}{@{}rcl@{}} \begin{aligned} S_{f}(t|\mathbf u_{f})=(1+\sigma_{f}^{2}e^{\beta^{*}\mathbf u_{f}}H_{f}(t))^{-1/\sigma_{f}^{2}} \end{aligned} \end{array} $$
with the type-of-failure specific baseline cumulative hazards \(H_{r}(t)=(t/\lambda _{r})^{k_{r}}\) and \(H_{d}(t)=(\lambda _{d}/k_{d})(\exp (k_{d}t)-1)\).
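The closed form above is the Laplace transform of a gamma frailty with mean 1 and variance \(\sigma_f^2\). The following Python sketch evaluates it and checks it against a Monte Carlo average of \(\exp(-Z\chi(\mathbf u)H_f(t))\); the function names are hypothetical and \(\sigma_f^2>0\) is assumed.

```python
import math
import random


def weibull_H(t, k, lam):
    """Weibull baseline cumulative hazard H_r(t) = (t / lam)**k."""
    return (t / lam) ** k


def gompertz_H(t, k, lam):
    """Gompertz baseline cumulative hazard H_d(t) = (lam / k) * (exp(k t) - 1)."""
    return (lam / k) * math.expm1(k * t)


def marginal_survival(cum_hazard, lin_pred, sigma2):
    """Closed form (1 + sigma2 * exp(lin_pred) * H)^(-1/sigma2)."""
    x = math.exp(lin_pred) * cum_hazard
    return math.exp(-math.log1p(sigma2 * x) / sigma2)


def marginal_survival_mc(cum_hazard, lin_pred, sigma2, n=100_000, seed=1):
    """Monte Carlo average of exp(-Z * exp(lin_pred) * H) with Z ~ Gamma(mean 1, var sigma2)."""
    rng = random.Random(seed)
    x = math.exp(lin_pred) * cum_hazard
    shape, scale = 1.0 / sigma2, sigma2  # mean shape*scale = 1, variance shape*scale**2 = sigma2
    return sum(math.exp(-rng.gammavariate(shape, scale) * x) for _ in range(n)) / n
```

As \(\sigma_f^2\to 0\) the closed form tends to \(\exp(-\chi(\mathbf u)H_f(t))\), the survival function without frailty.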
Correlated frailty terms for revision and death can be constructed as
$$\begin{array}{@{}rcl@{}} \begin{aligned} Z_{r}= &Y_{0}+Y_{r}, \\ Z_{d}= &\frac {m_{r}}{m_{d}}Y_{0}+Y_{d} \end{aligned} \end{array} $$
(1)
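for independent gamma distributed random variables \(Y_0\sim G(l_0,m_r)\) and \(Y_f\sim G(l_f,m_f)\), where \(G(l,m)\) denotes the gamma distribution with shape \(l\) and rate \(m\), \(l_{f}=1/\sigma _{f}^{2}-l_{0}\), \(m_{f}=1/\sigma _{f}^{2}\), \(f=r,d\), and \(l_0=\rho/(\sigma_r\sigma_d)\) with \(0\le\rho\le\min(\sigma_r/\sigma_d,\sigma_d/\sigma_r)\); this bound on \(\rho\) ensures that \(l_r,l_d\ge 0\). The result of this construction is that the frailties are gamma distributed with \(\mathbb {E}Z_{f}=1\), \(\text {Var}Z_{f}=\sigma _{f}^{2}\) and \(\text{Corr}(Z_r,Z_d)=\rho\). Given the frailties \((Z_r,Z_d)\) and the covariates \((\mathbf u_r,\mathbf u_d)\), the type-of-failure specific instantaneous risks are assumed to be conditionally independent at any time \(t\).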
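The construction (1) can be checked by simulation. The Python sketch below assumes that \(G(l,m)\) has shape \(l\) and rate \(m\) and that \(l_0=\rho/(\sigma_r\sigma_d)\); under these assumptions \(\text{Cov}(Z_r,Z_d)=(m_r/m_d)\text{Var}\,Y_0=l_0\sigma_r^2\sigma_d^2\), so the correlation is \(l_0\sigma_r\sigma_d=\rho\). The function name is hypothetical.

```python
import math
import random


def correlated_frailties(sigma_r, sigma_d, rho, n, seed=0):
    """Simulate n pairs (Z_r, Z_d) with Z_r = Y0 + Yr and Z_d = (m_r/m_d) Y0 + Yd."""
    if not 0 <= rho <= min(sigma_r / sigma_d, sigma_d / sigma_r):
        raise ValueError("rho outside the admissible range")
    m_r, m_d = 1 / sigma_r**2, 1 / sigma_d**2
    l0 = rho / (sigma_r * sigma_d)
    l_r, l_d = m_r - l0, m_d - l0
    rng = random.Random(seed)

    def gamma(shape, rate):
        # random.gammavariate takes (shape, scale); a zero shape is a point mass at 0
        return rng.gammavariate(shape, 1 / rate) if shape > 0 else 0.0

    pairs = []
    for _ in range(n):
        y0 = gamma(l0, m_r)
        pairs.append((y0 + gamma(l_r, m_r), (m_r / m_d) * y0 + gamma(l_d, m_d)))
    return pairs
```

With \(\rho=0\) the shared component vanishes and the two frailties are independent gamma variables.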
The bivariate marginal survival function for the type-of-failure specific latent times \((t_r,t_d)\) is given by the formula
$$\begin{array}{@{}rcl@{}} {\begin{aligned} S(t_{r},t_{d}|\mathbf u_{r},\mathbf u_{d})= &\mathbb {E}S(t_{r},t_{d}|\mathbf u_{r},\mathbf u_{d},Z_{r},Z_{d}) \\ =&\mathbb {E}\exp (-Z_{r}\chi (\mathbf u_{r})H_{r}(t_{r})-Z_{d}\chi(\mathbf u_{d})H_{d} (t_{d})) \\ = & \frac {\left(1+\sigma_{r}^{2}\chi (\mathbf u_{r})H_{r}(t_{r})\right)^{-l_{r}}\left(1+\sigma_{d}^{2}\chi (\mathbf u_{d})H_{d}(t_{d})\right)^{-l_{d}}}{\left (1+\sigma_{r}^{2}\chi (\mathbf u_{r})H_{r}(t_{r})+\sigma_{d}^{2}\chi (\mathbf u_{d})H_{d}(t_{d})\right)^{l_{0}}} \end{aligned}} \end{array} $$
[23]. If left truncation is present at ages \((t_{0r},t_{0d})\), we calculate the conditional survival function by dividing the bivariate survival function by \(S(t_{0r},t_{0d}|\mathbf u_r,\mathbf u_d)\).
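A minimal Python sketch of the closed-form bivariate survival function and its left-truncated version follows. It takes \(l_0=\rho/(\sigma_r\sigma_d)\), which is what \(\text{Corr}(Z_r,Z_d)=\rho\) requires under construction (1) with shape-rate gamma variables; the function names and argument layout are hypothetical, with `chi_r` and `chi_d` standing for \(\chi(\mathbf u_r)\) and \(\chi(\mathbf u_d)\).

```python
def bivariate_survival(Hr, Hd, chi_r, chi_d, sigma_r, sigma_d, rho):
    """S(t_r, t_d | u_r, u_d) for the correlated gamma frailty model.

    Hr, Hd: baseline cumulative hazards H_r(t_r), H_d(t_d).
    """
    s2r, s2d = sigma_r**2, sigma_d**2
    l0 = rho / (sigma_r * sigma_d)
    lr, ld = 1 / s2r - l0, 1 / s2d - l0
    ar, ad = s2r * chi_r * Hr, s2d * chi_d * Hd
    return (1 + ar) ** (-lr) * (1 + ad) ** (-ld) * (1 + ar + ad) ** (-l0)


def truncated_survival(Hr, Hd, Hr0, Hd0, chi_r, chi_d, sigma_r, sigma_d, rho):
    """Conditional survival given survival to the truncation ages (t_0r, t_0d)."""
    num = bivariate_survival(Hr, Hd, chi_r, chi_d, sigma_r, sigma_d, rho)
    den = bivariate_survival(Hr0, Hd0, chi_r, chi_d, sigma_r, sigma_d, rho)
    return num / den
```

Two sanity checks follow from the formula: with \(\rho=0\) it factorises into the two univariate marginals, and with \(H_d=0\) it reduces to the revision marginal \((1+\sigma_r^2\chi(\mathbf u_r)H_r)^{-1/\sigma_r^2}\).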
In the context of hip replacement, the shared frailty terms arise from the assumption that the \(n_j\) patients who have undergone surgery in the same unit \(j\), \(j=1,\dots,J\), share the same, possibly correlated, unobserved risks of revision and death. This means that the full likelihood function for our model has the form \({\mathcal L}=\prod _{j=1}^{J}{\mathcal L}_{j}(\bar t_{jr},\bar t_{jd}|\bar {\mathbf {u}}_{jr},\bar {\mathbf {u}}_{jd})\), where
$$\begin{array}{@{}rcl@{}} {\begin{aligned} &{\mathcal L}_{j}(\bar t_{jr},\bar t_{jd}| \bar{\mathbf{u}}_{jr}, \bar{\mathbf{u}}_{jd})=\prod_{i=1}^{n_{j}}\left (-\frac{\partial }{\partial t_{jir}}\right)^{\delta_{jir}}\left (-\frac{\partial }{\partial t_{jid}}\right)^{\delta_{jid}}S_{j}(\bar{t}_{jr},\bar t_{jd}| \bar{\mathbf{u}}_{jr},\bar{\mathbf{u}}_{jd}), \end{aligned}} \end{array} $$
(2)
where \(\delta_{jif}=0,1\), \(f=r,d\), is the censoring indicator, with \(\delta_{jif}=0\) indicating right censoring, \(\bar t_{jf}\) and \(\bar{\mathbf u}_{jf}\) are the vectors of cause-specific latent times and of covariates, respectively, for the patients from unit \(j\), and
$$\begin{array}{@{}rcl@{}} {\begin{aligned} &S_{j}(\bar t_{jr},\bar t_{jd}| \bar{\mathbf{u}}_{jr}, \bar{\mathbf{u}}_{jd}) \\ &=\frac {\left (1+\sigma_{r}^{2}\sum_{i=1}^{n_{j}}\chi (\mathbf u_{jir})H_{r}(t_{jir})\right)^{-l_{r}}\left (1+\sigma_{d}^{2}\sum_{i=1}^{n_{j}}\chi (\mathbf u_{jid})H_{d}(t_{jid})\right)^{-l_{d}}}{\left (1+\sigma_{r}^{2}\sum_{i=1}^{n_{j}}\chi (\mathbf u_{jir})H_{r}(t_{jir})+\sigma_{d}^{2}\sum_{i=1}^{n_{j}}\chi (\mathbf u_{jid})H_{d}(t_{jid})\right)^{l_{0}}}, \end{aligned}} \end{array} $$
where the subscript \(i\), \(i=1,\dots,n_j\), indexes the patients from unit \(j\). This likelihood can be used for parameter estimation.
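For the revision component alone (for example when \(\rho=0\) and death is treated as non-informative censoring), the derivatives in (2) have the standard shared gamma frailty closed form: with \(O_j\) revisions in unit \(j\) and \(s_j=\sum_i\chi(\mathbf u_{ji})H(t_{ji})\), the unit's log-likelihood is \(\sum_i\delta_{ji}\log(\mu(t_{ji})\chi(\mathbf u_{ji}))+\log\Gamma(\sigma^{-2}+O_j)-\log\Gamma(\sigma^{-2})+O_j\log\sigma^2-(\sigma^{-2}+O_j)\log(1+\sigma^2 s_j)\). The Python sketch below evaluates it; left truncation is ignored, and the input format and function name are hypothetical.

```python
import math


def unit_loglik(events, sigma2):
    """Log-likelihood of one unit under a shared gamma frailty (mean 1, variance sigma2).

    events: list of (delta, log_hazard, cum_hazard) per patient, with
    log_hazard = log(mu(t_i) * chi(u_i)) and cum_hazard = chi(u_i) * H(t_i).
    """
    d = sum(delta for delta, _, _ in events)  # O_j, observed revisions in the unit
    s = sum(cum for _, _, cum in events)      # sum of cumulative hazards
    a = 1.0 / sigma2
    return (
        sum(delta * lh for delta, lh, _ in events)
        + math.lgamma(a + d) - math.lgamma(a)
        + d * math.log(sigma2)
        - (a + d) * math.log1p(sigma2 * s)
    )
```

Summing `unit_loglik` over units gives a function that could be maximised numerically; as \(\sigma^2\to 0\) it approaches the ordinary Weibull log-likelihood \(\sum_i\delta_i\log h_i-\sum_i\chi_iH_i\).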
The proposed CUSUM scores for the competing risks model with shared frailty are based on the ratio of likelihoods of the form \({\mathcal L}\) under \(H_1\) and \(H_0\). For a time interval \(T\), let \(I_j(T)\) be the set of individuals from unit \(j\) whose implants are in use during the period \(T\), and let \(I=I(T)=\bigcup_j I_{j}(T)\). The scores \(X_I(T)\) for the time interval \(T\) are defined as
$$\begin{array}{@{}rcl@{}} {\begin{aligned} X_{I}(T) = \sum_{j =1}^{J}\log \left (\frac {\mathbb {E}\prod_{i \in I_{j}(T)}{\mathcal L}^{1}(t_{jir},t_{jid}|\mathbf{u}_{jir},\mathbf{u}_{jid},Z_{jr},Z_{jd})}{\mathbb {E}\prod_{i \in I_{j}(T)}{\mathcal L}^{0}(t_{jir},t_{jid}|\mathbf{u}_{jir},\mathbf{u}_{jid},Z_{jr},Z_{jd})}\right), \end{aligned}} \end{array} $$
(3)
where \(Z_{jr}\) and \(Z_{jd}\) are the shared frailty terms for unit \(j\), the superscript \(h\), \(h=0,1\), denotes the hypothesis, and
$$\begin{array}{@{}rcl@{}} \begin{aligned} &{\mathcal L}^{h}(t_{jir},t_{jid}|\mathbf{u}_{jir},\mathbf{u}_{jid},Z_{jr},Z_{jd}) \\ &=\left (-\frac{\partial }{\partial t_{jir}}\right)^{\delta_{jir}}\left (-\frac{\partial }{\partial t_{jid}}\right)^{\delta_{jid}}S^{h}(t_{jir},t_{jid}|\mathbf u_{jir},\mathbf u_{jid},Z_{jr},Z_{jd}). \end{aligned} \end{array} $$
In general, the expression for \(X_I(T)\) does not have a simple closed form. In the special case \(\rho=0\), the competing risks of revision and death are independent, and the score \(X_I(T)\) is the sum of the respective component scores for revision and death (see Appendix). If interest lies in the risk of revision only, death can be treated as non-informative censoring, and we concentrate on the CUSUM analysis of revision scores for the rest of this section.
For the baseline Weibull hazard function, under the proportional alternatives \(\mu_1(t)=\text{HR}\,\mu_0(t)\), we can rewrite the revision component of the score (3) as
$$\begin{array}{@{}rcl@{}} {\begin{aligned} &X_{I}^{r}(T) = O_{I}\log (\text{HR})-\sum_{j =1}^{J}({\sigma_{r}^{-2}}+O_{j})\\ &\times\log \left(\frac {1+\sigma_{r}^{2}\text{HR}\sum_{i \in I_{j}(T)}e^{\beta^{*}\mathbf u_{i}}\lambda^{-k}((t_{2i}-t_{0i})^{k}-(t_{1i}-t_{0i})^{k})}{1+\sigma_{r}^{2}\sum_{i \in I_{j}(T)}e^{\beta^{*}\mathbf u_{i}}\lambda^{-k}((t_{2i}-t_{0i})^{k}-(t_{1i}-t_{0i})^{k})}\right), \end{aligned}} \end{array} $$
(4)
where \(O_j\) is the number of revisions in unit \(j\) during the period \(T\), so that \(O_{I}=\sum _{j}O_{j}\) (see Additional file 1 for the proof).
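Score (4) can be evaluated per unit from two numbers: the count \(O_j\) and the sum \(E_j\) of the risk-adjusted Weibull increments \(e^{\beta^*\mathbf u_i}\lambda^{-k}((t_{2i}-t_{0i})^k-(t_{1i}-t_{0i})^k)\) over patients in \(I_j(T)\). A minimal Python sketch, with a hypothetical function name and input format:

```python
import math


def frailty_revision_score(units, hr, sigma2):
    """Score (4): sum_j [O_j log HR - (1/sigma2 + O_j) log((1 + sigma2 HR E_j)/(1 + sigma2 E_j))].

    units: iterable of (O_j, E_j) pairs for the interval T.
    """
    total = 0.0
    for o_j, e_j in units:
        log_ratio = math.log1p(sigma2 * hr * e_j) - math.log1p(sigma2 * e_j)
        total += o_j * math.log(hr) - (1.0 / sigma2 + o_j) * log_ratio
    return total
```

As \(\sigma_r^2\to 0\), \(\sigma_r^{-2}\log\frac{1+\sigma_r^2\text{HR}E_j}{1+\sigma_r^2E_j}\to(\text{HR}-1)E_j\), so the score reduces to the frailty-free form \(O_I\log\text{HR}-(\text{HR}-1)E_I\).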
Often, the proportional hazards assumption is too strong: different groups of patients and prostheses do not necessarily have proportional hazard functions for hip revision times and/or for death. We weaken this assumption by allowing the shape parameters \(k_f(\mathbf u)\) of the baseline Weibull and Gompertz hazard functions to depend on covariates through additional Cox-regression multipliers, \(k_{f}({\mathbf u})=\exp (\beta ^{*}_{k} \mathbf u)k_{f}\). The CUSUM scores for revision are then calculated as
$$\begin{array}{@{}rcl@{}} {\begin{aligned} &X_{I}^{r}(T) = O_{I} \log(\text{HR})-\sum_{j=1}^{J}(\sigma_{r}^{-2}+O_{j}) \\ &\times\log \left (\frac {1\,+\,\sigma_{r}^{2}\text{HR}\sum_{i \in I_{j}(T)}e^{\beta^{*}\mathbf u_{ji}}\lambda^{\,-\,k_{r}({\mathbf u_{ji}})}((t_{j2i}\,-\,t_{j0i})^{k_{r}({\mathbf u_{ji}})}\,-\,(t_{j1i}\,-\,t_{j0i})^{k_{r}({\mathbf u_{ji}})})}{1\,+\,\sigma_{r}^{2}\sum_{i \in I_{j}(T)}e^{\beta^{*}\mathbf u_{ji}}\lambda^{\,-\,k_{r}({\mathbf u_{ji}})}((t_{j2i}\,-\,t_{j0i})^{k_{r}({\mathbf u_{ji}})}\,-\,(t_{j1i}-t_{j0i})^{k_{r}({\mathbf u_{ji}})})}\right). \\ \end{aligned}} \end{array} $$
(5)
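The only change from (4) to (5) is the per-patient increment, which now uses the covariate-dependent shape \(k_r(\mathbf u)=\exp(\beta_k^*\mathbf u)k_r\). A minimal Python sketch of that increment (hypothetical names; `lin_pred` is \(\beta^*\mathbf u\) and `lin_pred_k` is \(\beta_k^*\mathbf u\)); summing it over the patients of unit \(j\) in use during \(T\) gives the per-unit quantity that enters the logarithm in (5).

```python
import math


def shape_for(k_base, lin_pred_k):
    """k_r(u) = exp(beta_k * u) * k_r."""
    return math.exp(lin_pred_k) * k_base


def increment(t0, t1, t2, lam, k_base, lin_pred, lin_pred_k):
    """exp(beta*u) * lam**(-k(u)) * ((t2 - t0)**k(u) - (t1 - t0)**k(u)), as in (5)."""
    k = shape_for(k_base, lin_pred_k)
    return math.exp(lin_pred) * lam ** (-k) * ((t2 - t0) ** k - (t1 - t0) ** k)
```

Setting \(\beta_k=0\) recovers the constant-shape increment of (4).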
CUSUM chart control limits for the shared frailty model for revision
The unknown parameters of the time-to-revision model under the null hypothesis \(H_0\) are estimated from the in-control (learning) dataset. These are the Cox-regression parameters \(\beta\) and \(\beta_k\), the parameters \(k\) and \(\lambda\) of the Weibull baseline distribution, and the variance \(\sigma^2\) of the frailty term. The vector of unknown parameters \(\xi=(\ln k,\ln\lambda,\ln\sigma^2,\beta,\beta_k)\) is estimated by maximum likelihood to obtain the estimate \(\hat \xi \). The time-to-failure distribution with these estimated parameters is then used to compute the CUSUM scores for the two test datasets and to estimate the control limits for the CUSUM chart (see Additional file 1 for details of the calculation of the CUSUM score). Let \(P=P(\xi)\) be the true distribution function of revision times, and let \(\tau=\tau_c(P;\xi)\) be the time at which the chart alerts, i.e. first exceeds a threshold \(c\). The false alarm probability within \(T\) time units is \(hit(P;\xi) = \mathbb P(\tau _{c}(P;\xi) \leq T)\) for some finite \(T>0\). The threshold \(c_{hit}(P;\xi)=\inf\{c>0:\,hit(P;\xi)\le\alpha\}\), for some \(0<\alpha<1\), is needed to restrict the false alarm probability to \(\alpha\). However, only \(\hat P\) and \(\hat \xi =\xi (\hat P)\) are known.
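Because the chart has alerted by time \(T\) exactly when the running maximum of \(W\) over the window exceeds \(c\), \(c_{hit}\) can be estimated from simulated in-control score paths as an empirical upper quantile of their maxima. A minimal Python sketch, assuming the score paths have already been simulated from the fitted model (function names hypothetical):

```python
import math


def path_max(scores):
    """max over the window of W_i, with W_0 = 0 and W_{i+1} = max(0, W_i + X_i)."""
    w = best = 0.0
    for x in scores:
        w = max(0.0, w + x)
        best = max(best, w)
    return best


def c_hit(simulated_score_paths, alpha):
    """Smallest c such that at most a fraction alpha of simulated paths exceed c."""
    maxima = sorted(path_max(p) for p in simulated_score_paths)
    n = len(maxima)
    allowed = math.floor(alpha * n + 1e-9)  # number of paths allowed to exceed c
    return maxima[n - 1 - allowed]
```

With ten one-step paths whose maxima are 1, ..., 10 and \(\alpha=0.1\), the estimate is 9: only one path (10%) exceeds it, while any smaller threshold would be exceeded by at least two.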
A parametric version of the bootstrap algorithm proposed by Gandy and Kvaløy [12] is used to estimate control limits that guarantee, with high probability \(1-\gamma\), that the false alarm rate of a CUSUM chart with in-control distribution \(P\), conditional on \(\hat \xi \), is below the nominal level \(\alpha\).
Define \(\tau _{c}(P|\hat \xi)\) as the first time at which the CUSUM chart conditional on \(\hat \xi \) exceeds the given value \(c\). We are interested in the boundary \(c_{hit}(P|\hat \xi)\) defined by \(c_{hit}(P|\hat \xi)=\inf \{c>0:\; \mathbb P(\tau _{c} (P|\hat \xi)\leq T)\leq \alpha \}\) for some \(0<\alpha<1\). Since \(P\) is unknown, \(c_{hit}(P|\hat \xi)\) is unknown too, and the estimate \(c_{hit}(\hat P|\hat \xi)\) is usually used instead. However, such an estimate does not guarantee the false alarm rate of the chart. Following [12], we estimate the \(1-\gamma\) quantile of the threshold \(c_{hit}(P|\hat \xi)\), for some \(0<\gamma<1\), using the following algorithm.
Algorithm.
Let \(N\) be the number of records (patients) in the control dataset, \(N_{Sim}\) the number of simulations needed to estimate \(c_{hit}(\hat P|\hat \xi)\), \(N_{Boot}\) the number of bootstrap replicates, and \(T=[T_{min},T_{max}]\) the observation period.
1. Calculate the maximum likelihood estimate (MLE) \(\hat \xi \) of the vector of unknown parameters \(\xi\), as well as the estimate \(\widehat {\text {Cov}}\) of its covariance matrix (the inverse Hessian), using the control dataset and the survival model with Weibull hazard described above;
2. Generate a random vector \(\xi_{cur}\) from the multivariate normal distribution with mean \(\hat \xi \) and covariance matrix \(\widehat {\text {Cov}}\);
3. Keeping the covariates in all three datasets fixed, generate new times-to-revision \(t_{rev}\) for all patients from the survival model with Weibull hazard described above and the vector \(\xi_{cur}\). Update the censoring indicator using the rule \(\delta=1\) if \(t_{rev}\le\min\{t_{death},T_{max}\}\) and \(\delta=0\) otherwise, and for \(\delta=0\) replace \(t_{rev}\) by \(\min\{t_{death},T_{max}\}\). Repeat \(N_{Sim}\) times and calculate, for test dataset \(j\), \(j=1,2\), the values of \(c_{\text {hit}}^{j}(\hat P_{cur}|\hat \xi _{cur})\) and \(c_{\text {hit}}^{j}(\hat P|\hat \xi _{cur})\);
4. To take multiple testing into account, set \(c_{\text {hit}}(\hat P_{cur}|\hat \xi _{cur})=\underset {j=1,2}\max \{c_{\text {hit}}^{j}(\hat P_{cur}|\hat \xi _{cur})\}\) and \(c_{\text {hit}}(\hat P|\hat \xi _{cur})=\underset {j=1,2}\max \{c_{\text {hit}}^{j}(\hat P|\hat \xi _{cur})\}\). Calculate \(p_{cur}=c_{\text {hit}}(\hat P_{cur}|\hat \xi _{cur})-c_{\text {hit}}(\hat P|\hat \xi _{cur})\);
5. Repeat steps 2-4 \(N_{Boot}\) times and calculate the \(1-\gamma\) empirical quantile \(p_{\gamma}\) of \(p_{cur}\).
The estimate of the adjusted threshold is \(c_{hit}(\hat P|\hat \xi)-p_{\gamma }\). This threshold guarantees that in approximately \(100(1-\gamma)\%\) of applications the probability of a false alarm will not exceed \(\alpha\). In the “Results” section, we use the values \(N_{Sim}=100\), \(N_{Boot}=100\), \(\alpha=0.1\), \(\gamma=0.1\), \(T_{min}=\)01.01.2005 and \(T_{max}=\)31.12.2012 for the analysis of the NJR data.
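The structure of steps 2-5 can be illustrated with a deliberately small Python sketch. It is a toy stand-in, not the NJR model: revisions per period are Poisson with mean \(e^{\theta}\times\text{exposure}\), the single log-rate \(\theta\) plays the role of \(\xi\), a normal draw around \(\hat\theta\) replaces the multivariate normal of step 2, and there is only one test dataset, so the maximum in step 4 is trivial. The quantile level in step 5 is exposed as the hypothetical parameter `q_level`, defaulting to \(1-\gamma\) as stated above. All names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def max_cusum(rng, theta_data, theta_chart, exposures, hr):
    """One path: counts generated with theta_data, scores O log HR - (HR-1)E using theta_chart."""
    w = best = 0.0
    for e in exposures:
        o = poisson(rng, math.exp(theta_data) * e)
        w = max(0.0, w + o * math.log(hr) - (hr - 1.0) * math.exp(theta_chart) * e)
        best = max(best, w)
    return best


def c_hit(theta_data, theta_chart, exposures, hr, alpha, n_sim, seed):
    """Empirical smallest threshold exceeded by at most a fraction alpha of paths."""
    rng = random.Random(seed)  # equal seeds give common random numbers across calls
    maxima = sorted(max_cusum(rng, theta_data, theta_chart, exposures, hr) for _ in range(n_sim))
    return maxima[n_sim - 1 - math.floor(alpha * n_sim + 1e-9)]


def adjusted_threshold(theta_hat, se, exposures, hr=2.0, alpha=0.1, gamma=0.1,
                       n_sim=100, n_boot=30, q_level=None, seed=0):
    """Toy version of steps 2-5; returns (adjusted threshold, list of p_cur values)."""
    q_level = 1.0 - gamma if q_level is None else q_level
    rng = random.Random(seed)
    base = c_hit(theta_hat, theta_hat, exposures, hr, alpha, n_sim, seed)
    p = []
    for b in range(n_boot):
        theta_cur = rng.gauss(theta_hat, se)  # step 2
        sim_seed = seed + b + 1
        c_cur = c_hit(theta_cur, theta_cur, exposures, hr, alpha, n_sim, sim_seed)  # c_hit(P_cur | xi_cur)
        c_est = c_hit(theta_hat, theta_cur, exposures, hr, alpha, n_sim, sim_seed)  # c_hit(P_hat | xi_cur)
        p.append(c_cur - c_est)  # steps 3-4
    idx = min(n_boot - 1, max(0, math.ceil(q_level * n_boot - 1e-9) - 1))
    return base - sorted(p)[idx], p  # step 5
```

When the parameter uncertainty is zero, every \(p_{cur}\) is exactly zero (the two threshold estimates use the same simulated paths), and the adjusted threshold equals the plug-in estimate \(c_{hit}(\hat P|\hat\xi)\).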
Estimating operating unit performance
Estimating performance across surgical units is also of potential importance in the quality control setting. The posterior frailty distribution obtained from the fitted shared frailty survival model described in the “Methods” section can be used for this purpose. Given the prior gamma distribution with (shape, scale) parameters \((a,b)=(\sigma^{-2},\sigma^2)\), mean \(ab=1\) and variance \(ab^2=\sigma^2\), and the observed data \(D_j\), the posterior frailty distribution for unit \(j\) is the gamma distribution with (shape, scale) parameters \((a_j,b_j)\) equal to
$$\begin{array}{@{}rcl@{}} \begin{aligned} a_{j}&=a+O_{j}, \\ b_{j}&=\frac {b}{1+b\sum_{i \in I_{j}}H(t_{i},\mathbf{u}_{i})}, \end{aligned} \end{array} $$
where \(O_j\) is the number of observed revisions in unit \(j\), \(I_j\) is the set of all patients from unit \(j\), and \(H(t_i,\mathbf u_i)\) is the cumulative hazard for individual \(i\) from unit \(j\) with time to revision (or censoring) \(t_i\) and vector of covariates \(\mathbf u_i\) [24].
The effects of the units (shared frailties) are given by the conditional expectations \(\mathbb {E}(Z_{j}|D_{j})=a_{j}b_{j}\), and the parameters \(a_j\) and \(b_j\) can be estimated by substituting the MLE \(\hat \xi \) of the unknown parameters \(\xi\) [21]. Given the proportional hazards formulation, the shared frailty term can be interpreted as the multiplicative excess hazard of a unit relative to the baseline hazard. Because of this interpretation, we refer to these estimated frailties as unit-level hazard ratios and denote them by \(\text{HR}_j\).
Additionally, we propose a new score characterizing the quality of the hip replacement surgery in a unit as
$$ Q_{j}= \mathbb{P}\{Z_{j}<1\,|\,D_{j}\}, $$
(6)
where \(D_j\) is the data from the control dataset relating to unit \(j\). A large value of \(Q_j\) indicates a decreased hazard of revision in the unit, whereas a small value indicates poor performance. Since \(Q_j\) and \(\text{HR}_j\) depend on the vector of unknown parameters \(\xi\), and only its MLE \(\hat \xi \) is available, we generate a set of \(N_{average}\) estimates \(\hat \xi _{l}\) from the \(\mathrm {N}(\hat \xi,\widehat {cov})\) distribution and average the resulting values of \(Q_j(\hat \xi _{l})\) and \(\text {HR}_{j}(\hat \xi _{l})\) over this set.
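Since the posterior of \(Z_j\) is gamma with shape \(a_j\) and scale \(b_j\), \(Q_j\) is the regularized lower incomplete gamma function \(P(a_j,1/b_j)\). The Python sketch below computes it with the usual series and continued-fraction expansions (the standard library has no incomplete gamma function) and averages \(Q_j\) and \(\text{HR}_j\) over parameter draws. The input format for the draws, a list of \((\sigma^2, O_j, \text{cumulative hazards})\) evaluated at each \(\hat\xi_l\), is a hypothetical choice, as are the function names.

```python
import math


def reg_lower_gamma(a, x, eps=1e-14, max_iter=10_000):
    """Regularized lower incomplete gamma P(a, x) via series or continued fraction."""
    if x <= 0:
        return 0.0
    log_pref = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:  # series expansion
        term = total = 1.0 / a
        n = a
        for _ in range(max_iter):
            n += 1
            term *= x / n
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total * math.exp(log_pref)
    tiny = 1e-300  # continued fraction for Q = 1 - P (modified Lentz)
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return 1.0 - math.exp(log_pref) * h


def quality_score(sigma2, observed, cum_hazards):
    """Q_j = P(Z_j < 1 | D_j) for the gamma posterior (shape a_j, scale b_j)."""
    a_j = 1.0 / sigma2 + observed
    b_j = sigma2 / (1.0 + sigma2 * sum(cum_hazards))
    return reg_lower_gamma(a_j, 1.0 / b_j)


def averaged_unit_scores(draws):
    """Average Q_j and HR_j over draws given as (sigma2, O_j, cumulative hazards)."""
    qs, hrs = [], []
    for sigma2, observed, cum in draws:
        qs.append(quality_score(sigma2, observed, cum))
        hrs.append((1.0 + sigma2 * observed) / (1.0 + sigma2 * sum(cum)))
    return sum(qs) / len(qs), sum(hrs) / len(hrs)
```

For fixed expected revisions, more observed revisions raise the posterior shape and lower \(Q_j\), matching the reading that small \(Q_j\) indicates poor performance.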