Abstract
Developing individualized prediction rules for disease risk and prognosis has played a key role in modern medicine. When new genomic or biological markers become available to assist in risk prediction, it is essential to assess the improvement in clinical usefulness of the new markers over existing routine variables. Net reclassification improvement (NRI) has been proposed to assess improvement in risk reclassification in the context of comparing two risk models, and the concept has been quickly adopted in medical journals (Pencina et al., Stat Med 27:157–172, 2008). We propose both nonparametric and semiparametric procedures for calculating NRI as a function of a future prediction time \(t\) with a censored failure time outcome. The proposed methods accommodate covariate-dependent censoring, therefore providing more robust and sometimes more efficient procedures compared with the existing nonparametric estimators (Pencina et al., Stat Med 30:11–21, 2011; Uno et al., Comparing risk scoring systems beyond the ROC paradigm in survival analysis, 2009). Simulation results indicate that the proposed procedures perform well in finite samples. We illustrate these procedures by evaluating a new risk model for predicting the onset of cardiovascular disease.
References
Andersen P, Gill R (1982) Cox’s regression model for counting processes: a large sample study. Ann Stat 10:1100–1120
Bilias Y, Gu M, Ying Z (1997) Towards a general asymptotic theory for Cox model with staggered entry. Ann Stat 25:662–682
Cai T, Tian L, Uno H, Solomon S, Wei L (2010) Calibrating parametric subject-specific risk estimation. Biometrika 97:389–404
Cook N (2007) Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 115:928
Cook N, Buring J, Ridker P (2006) The effect of including C-reactive protein in cardiovascular risk prediction models for women. Ann Intern Med 145:21
Cui J (2009) Overview of risk prediction models in cardiovascular disease research. Ann Epidemiol 19:711–717
Dabrowska D (1997) Smoothed Cox regression. Ann Stat 25(4):1510–1540
Du Y, Akritas M (2002) Uniform strong representation of the conditional Kaplan–Meier process. Math Methods Stat 11:152–182
Efron B, Tibshirani R (1997) Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 92(438):548–560
Gail M, Brinton L, Byar D, Corle D, Green S, Schairer C, Mulvihill J (1989) Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst (JNCI) 81:1879
Gu W, Pepe M (2009) Measures to summarize and compare the predictive capacity of markers. Int J Biostat 5:27
Hemann B, Bimson W, Taylor A (2007) The Framingham risk score: an appraisal of its benefits and limitations. Am Heart Hosp J 5:91–96
Hjort N (1992) On inference in parametric survival data models. Int Stat Rev 60(3):355–387
Kannel W, Feinleib M, McNamara P, Garrison R, Castelli W (1979) An investigation of coronary heart disease in families. Am J Epidemiol 110:281
Khot U, Khot M, Bajzer C, Sapp S, Ohman E, Brener S, Ellis S, Lincoff A, Topol E (2003) Prevalence of conventional risk factors in patients with coronary heart disease. JAMA 290:898–904
Lin D, Wei L (1989) The robust inference for the Cox proportional hazards model. J Am Stat Assoc 84:1074–1078
Lloyd-Jones D (2010) Cardiovascular risk prediction. Circulation 121:1768–1777
Pencina M, D’Agostino R Sr (2011) Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med 30:11–21
Pencina M, D’Agostino R Sr, D’Agostino R Jr (2008) Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 27:157–172
Pollard D (1990) Empirical processes: theory and applications. Institute of Mathematical Statistics, Hayward
Satten G, Datta S (2001) The Kaplan–Meier estimator as an inverse-probability-of-censoring weighted average. Am Stat 55:207–210
Tian L, Cai T, Goetghebeur E, Wei L (2007) Model evaluation based on the sampling distribution of estimated absolute prediction error. Biometrika 94:297–311
Uno H, Cai T, Tian L, Wei L (2007) Evaluating prediction rules for t-year survivors with censored regression models. J Am Stat Assoc 102:527–537
Uno H, Tian L, Cai T, Kohane I, Wei L (2009) Comparing risk scoring systems beyond the ROC paradigm in survival analysis. Harvard University Biostatistics Working Paper Series, p 107
Wilson P, D’Agostino R, Levy D, Belanger A, Silbershatz H, Kannel W (1998) Prediction of coronary heart disease using risk factor categories. Circulation 97:1837
Zheng Y, Cai T, Pepe M, Levy W (2008) Time-dependent predictive values of prognostic biomarkers with failure time outcome. J Am Stat Assoc 103:362–368
Acknowledgments
The Framingham Heart Study and the Framingham SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University. The Framingham SHARe data used for the analyses described in this manuscript were obtained through dbGaP (access number: phs000007.v3.p2). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI. The work is supported by grants U01-CA86368, P01-CA053996, R01-GM085047, and R01-GM079330 awarded by the National Institutes of Health.
Appendix
Throughout, we assume that the joint density of \((T,C,\mathbf{Y})\) is twice continuously differentiable, \(\mathbf{Y}\) is bounded, and \(1 > P(T> t) >0\), \(1 > P(C> t) >0\). The kernel function \(K\) is a symmetric probability density function with compact support and bounded second derivative. The bandwidth \(h \rightarrow 0\) such that \(nh^4 \rightarrow 0\). In addition, the estimator \({\widehat{\varvec{\theta }}}_k\) converges to \(\varvec{\theta }_{0k}\) for \(k=1,2\) as \(n \rightarrow \infty \) (Hjort 1992), where \(\varvec{\beta }_{0k}\) is the unique maximizer of the expected value of the corresponding partial likelihood and \(\Lambda _{0k}\) is the baseline cumulative hazard for \(k=1,2\). We denote the parameter space for \(\varvec{\theta }_k\) by \(\Omega _k\) and assume that \(\Omega _k\) is a compact set containing \(\varvec{\theta }_{0k}\). Furthermore, we assume that \(\varvec{\beta }_2 \ne 0\) and note that \(Q(\varvec{\theta }_2) = 1-\exp \{ -\Lambda _{02}(t)e^{\varvec{\beta }_2^{\mathsf{\scriptscriptstyle {T}}}\mathbf{Y}_{(2)}} \}\) and \(P(\varvec{\theta }_1) = 1-\exp \{ -\Lambda _{01}(t)e^{\varvec{\beta }_1^{\mathsf{\scriptscriptstyle {T}}}\mathbf{Y}_{(1)}} \}\) are the respective limits of \(Q({\widehat{\varvec{\theta }}}_2)\) and \(P({\widehat{\varvec{\theta }}}_1)\), for any given \(\mathbf{Y}_{(2)}\) and \(\mathbf{Y}_{(1)}\). The convergence in probability of \(Q({\widehat{\varvec{\theta }}}_2)\) to \(Q(\varvec{\theta }_{02})\) and of \(P({\widehat{\varvec{\theta }}}_1)\) to \(P(\varvec{\theta }_{01})\) is uniform in \(\mathbf{Y}_{(2)}\) and \(\mathbf{Y}_{(1)}\), owing to the convergence of \({\widehat{\varvec{\theta }}}\) to \(\varvec{\theta }_0=(\varvec{\theta }_{01}^{\mathsf{\scriptscriptstyle {T}}},\varvec{\theta }_{02}^{\mathsf{\scriptscriptstyle {T}}})^{\mathsf{\scriptscriptstyle {T}}}\).
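As a minimal sketch of the model-based risks referred to above, the following Python snippet evaluates the standard Cox-model risk, one minus the exponential of minus the baseline cumulative hazard times the exponentiated linear predictor. The function name `cox_risk`, the coefficient values, the baseline cumulative hazard of 0.10, and the rule \(\Delta = I(Q > P)\) for an upward reclassification are illustrative assumptions, not quantities taken from the paper.

```python
import math


def cox_risk(cum_baseline_hazard, beta, y):
    """Risk of an event by time t for one subject under a Cox model:
    1 - exp{-Lambda_0(t) * exp(beta' y)}."""
    linear_predictor = sum(b * v for b, v in zip(beta, y))
    return 1.0 - math.exp(-cum_baseline_hazard * math.exp(linear_predictor))


# Hypothetical covariates: first entry is a routine variable, second a new marker.
subjects = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Old model uses only the routine variable; new model uses both (made-up coefficients).
P = [cox_risk(0.10, [0.5], y[:1]) for y in subjects]
Q = [cox_risk(0.10, [0.5, 0.8], y) for y in subjects]

# Assumed reclassification indicator: 1 when the new model assigns a higher risk.
delta = [int(q > p) for p, q in zip(P, Q)]
```

In this toy setting the first subject has a marker value of zero, so both models give the same risk and no upward reclassification occurs for that subject.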
1.1 Asymptotic Properties of \(\widetilde{\text{ NRI}}({\widehat{\varvec{\theta }}}, t)\)
By the same arguments as given in Cai et al. (2010) and Dabrowska (1997), \({\widetilde{H}}^{(\iota )}_{q}(t)\) is uniformly consistent for \(H^{(\iota )}_{q}(t) = P(C \geqslant t \mid Q(\varvec{\theta }_{2}) = q, \Delta (\varvec{\theta }) \in \mho _{\iota })\), where \(\mho _1 = \{1\}\) and \(\mho _{\bullet } = \{0,1\}\), for \(\iota = 1\) and \(\bullet \). It follows, using the uniform law of large numbers (Pollard 1990), that \(\widetilde{\text{ NRI}}(\varvec{\theta }, t)\) converges in probability to \(\text{ NRI}(\varvec{\theta }, t)\), uniformly in \(\varvec{\theta }\).
This along with the convergence of \({\widehat{\varvec{\theta }}}\) to \(\varvec{\theta }_0\) implies that \(\widetilde{\text{ NRI}}({\widehat{\varvec{\theta }}}, t)\) is uniformly consistent for \(\text{ NRI}(\varvec{\theta }_0,t)\).
Throughout, we use the fact that \(E\{ \Delta _i(\varvec{\theta })I(X_{i} \leqslant t)\delta _{i} H^{(1)}_{Q_{i}(\varvec{\theta }_2)}(X_i)^{-1} \mid Q_i(\varvec{\theta }_2) = q \} = P(\Delta _i(\varvec{\theta })=1, T_{i} \leqslant t \mid Q_i(\varvec{\theta }_2) = q) \) if either \(C \perp (T,\mathbf{Y}_{(2)})\), in which case the model may be misspecified, or \(Q(\varvec{\theta }_2) = \text{ Pr}(T \leqslant t \mid \mathbf{Y}_{(2)})\), i.e., the Cox model is correctly specified, in which case censoring need only satisfy \(C \perp T \mid \mathbf{Y}_{(2)}\) (double robustness). We first write the i.i.d. representation of \(\sqrt{n}[\widetilde{\text{ NRI}}(\varvec{\theta }, t) - \text{ NRI}(\varvec{\theta }, t) ]\) for any \(\varvec{\theta }\). Note that \(\sqrt{n} \{\widetilde{\text{ NRI}}(\varvec{\theta }, t)-\text{ NRI}(\varvec{\theta }, t) \} = 2 \sqrt{n} \{ \widetilde{\text{ Pr}}(\Delta (\varvec{\theta }) =1|T\leqslant t ) - \text{ Pr}(\Delta (\varvec{\theta }) =1|T\leqslant t ) \} - 2\sqrt{n} \{ \widetilde{\text{ Pr}}(\Delta (\varvec{\theta })=1|T > t )- \text{ Pr}(\Delta (\varvec{\theta })=1|T > t ) \}\). We begin with the first component,
\[ \sqrt{n} \{ \widetilde{\text{ Pr}}(\Delta (\varvec{\theta }) =1|T\leqslant t ) - \text{ Pr}(\Delta (\varvec{\theta }) =1|T\leqslant t ) \} = \sqrt{n} \left\{ \frac{{\hat{N}}(t,\varvec{\theta }, {\widetilde{H}})}{{\hat{D}}(t,\varvec{\theta }, {\widetilde{H}})} - \text{ Pr}(\Delta (\varvec{\theta }) =1|T\leqslant t ) \right\} , \]
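where \({\hat{N}}(t,\varvec{\theta }, H) = n^{-1} \sum _{i} \Delta _{i}(\varvec{\theta })I(X_{i} \leqslant t)\delta _i / H^{(1)}_{Q_{i}(\varvec{\theta }_2)}(X_i)\) and \({\hat{D}}(t,\varvec{\theta },H) = n^{-1} \sum _{i} I(X_i\leqslant t)\delta _{i} / H^{(\bullet )}_{Q_{i}(\varvec{\theta }_2)}(X_i)\). Let \(N(t, \varvec{\theta }) = \text{ Pr}(\Delta (\varvec{\theta })=1, T\leqslant t )\) and \(D(t)= \text{ Pr}(T\leqslant t )\). Then by the uniform consistency of the IPW weights, we have
\[ \sqrt{n} \left\{ \frac{{\hat{N}}(t,\varvec{\theta }, {\widetilde{H}})}{{\hat{D}}(t,\varvec{\theta }, {\widetilde{H}})} - \frac{N(t,\varvec{\theta })}{D(t)} \right\} \approx D(t)^{-2} \sqrt{n}\{ {\hat{N}}(t,\varvec{\theta }, {\widetilde{H}})D(t)- N(t,\varvec{\theta }){\hat{D}}(t,\varvec{\theta }, {\widetilde{H}}) \} . \]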
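Examining the numerator, \(\sqrt{n}\{ {\hat{N}}(t,\varvec{\theta }, {\widetilde{H}})D(t)- N(t,\varvec{\theta }){\hat{D}}(t,\varvec{\theta }, {\widetilde{H}}) \} = \sqrt{n}\{ (1) + (2) - (3) \}\), where \((1) = {\hat{N}}(t,\varvec{\theta }, H)D(t)- {\hat{D}}(t,\varvec{\theta }, H)N(t,\varvec{\theta })\), \((2) = {\hat{N}}(t,\varvec{\theta }, {\widetilde{H}})D(t) - {\hat{N}}(t,\varvec{\theta }, H)D(t)\), and \((3) = N(t,\varvec{\theta }){\hat{D}}(t,\varvec{\theta }, {\widetilde{H}}) - {\hat{D}}(t,\varvec{\theta }, H)N(t,\varvec{\theta })\). Note that, directly from the definitions of \({\hat{N}}\) and \({\hat{D}}\), \(\sqrt{n}\,(1) = n^{-\frac{1}{2}}\sum _{i=1}^n U_{1i}(t)\) with \(U_{1i}(t) = \Delta _{i}(\varvec{\theta })I(X_{i} \leqslant t)\delta _i D(t) / H^{(1)}_{Q_{i}(\varvec{\theta }_2)}(X_i) - I(X_i\leqslant t)\delta _{i} N(t,\varvec{\theta }) / H^{(\bullet )}_{Q_{i}(\varvec{\theta }_2)}(X_i)\).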
Using a Taylor series expansion, Lemma A.3 of Bilias et al. (1997) and the asymptotic expansion for \(\widehat{\Lambda }_{q}(t)\) given in Du and Akritas (2002),
where
Now by a change of variable, \(\psi = \frac{q-Q_i{(\varvec{\theta }_2)}}{h}\) and \(f(t,q) \equiv \partial ^2 P(\Delta (\varvec{\theta }) = 1, T\leqslant t, Q (\varvec{\theta }_2) \leqslant q) / \partial t \partial q\),
where \(U_{2i}(t) = D(t) \int _0^t a(s, q^*, X_i)\, ds\) and \(a(t,q, X_i) = M_{Cq^*}(t,X_i, \delta _i) f(t, q^*)\). Similar arguments give an asymptotic expansion for the third term, \(-\sqrt{n}\,(3) \approx n^{-\frac{1}{2}}\sum _{i=1}^n U_{3i}(t)\), and therefore the numerator satisfies \(\sqrt{n}\left[{\hat{N}}(t,\varvec{\theta }, {\widetilde{H}})D(t) - N(t,\varvec{\theta }){\hat{D}}(t,\varvec{\theta }, {\widetilde{H}}) \right] \approx n^{-\frac{1}{2}}\sum _{i=1}^n \{U_{1i}(t) + U_{2i}(t) + U_{3i}(t)\}.\) The same arguments can be used to obtain an asymptotic expansion for \(\sqrt{n} \{ \widetilde{\text{ Pr}}(\Delta (\varvec{\theta })=1|T > t )- \text{ Pr}(\Delta (\varvec{\theta })=1|T > t ) \}\) as \(n^{-\frac{1}{2}}\sum _{i=1}^nD(t)_{-}^{-2}\{U_{-1i}(t) + U_{-2i}(t) + U_{-3i}(t) \} \), where \(D(t)_{-}\), \(U_{-1i}(t), U_{-2i}(t),\) and \(U_{-3i}(t)\) are defined similarly to \(D(t)\), \(U_{1i}(t), U_{2i}(t),\) and \(U_{3i}(t)\) with \(T\leqslant t\) replaced by \(T > t\). Therefore, \(\sqrt{n} \{ \widetilde{\text{ NRI}}(\varvec{\theta }, t)-\text{ NRI}(\varvec{\theta }, t) \} \approx n^{-\frac{1}{2}}\sum _{i=1}^n 2[ D(t)^{-2}\{U_{1i}(t) + U_{2i}(t) + U_{3i}(t)\}- D(t)_{-}^{-2}\{U_{-1i}(t) + U_{-2i}(t) + U_{-3i}(t)\}] = n^{-\frac{1}{2}}\sum _{i=1}^n \eta _i(t)\).
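To make the structure of the inverse-probability-weighted (IPW) estimator concrete, here is a rough Python sketch of the ratios \({\hat{N}}/{\hat{D}}\) for the event and event-free groups. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the Epanechnikov kernel, the toy data, the bandwidth, the absence of tied times, and the choice to weight event-free subjects by the censoring survival evaluated at \(t\) are all assumptions. The indicators \(\Delta _i\) are taken as given inputs.

```python
def epanechnikov(u):
    """Symmetric, compactly supported kernel density on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def censoring_survival(s, q, X, d, Q, h, subset):
    """Kernel-weighted Kaplan-Meier estimate of P(C >= s | Q = q),
    using only the subjects in `subset` (assumes no tied times)."""
    w = {j: epanechnikov((Q[j] - q) / h) for j in subset}
    surv = 1.0
    for j in subset:
        # Censoring "events" are the observations with d[j] == 0.
        if d[j] == 0 and X[j] < s and w[j] > 0.0:
            at_risk = sum(w[k] for k in subset if X[k] >= X[j])
            surv *= 1.0 - w[j] / at_risk
    return surv


def ipw_nri(t, X, d, Q, Delta, h):
    """IPW sketch of 2 * {Pr(Delta=1 | T<=t) - Pr(Delta=1 | T>t)}."""
    everyone = list(range(len(X)))
    upward = [i for i in everyone if Delta[i] == 1]
    num_ev = den_ev = num_ne = den_ne = 0.0
    for i in everyone:
        if X[i] <= t and d[i] == 1:
            s, is_event = X[i], True    # observed event by t: weight at X_i
        elif X[i] > t:
            s, is_event = t, False      # known event-free at t: weight at t
        else:
            continue                    # censored before t: status unknown
        # Denominator weight uses all subjects; numerator weight uses Delta = 1 only.
        w_all = 1.0 / censoring_survival(s, Q[i], X, d, Q, h, everyone)
        w_up = 0.0
        if Delta[i] == 1:
            w_up = 1.0 / censoring_survival(s, Q[i], X, d, Q, h, upward)
        if is_event:
            num_ev += w_up
            den_ev += w_all
        else:
            num_ne += w_up
            den_ne += w_all
    return 2.0 * (num_ev / den_ev - num_ne / den_ne)


# Hypothetical toy data: observed times, event indicators, new-model risks, Delta.
X = [0.5, 1.2, 2.0, 2.5, 3.1, 4.0, 4.4, 5.0]
d = [1, 0, 1, 1, 0, 1, 0, 1]
Q = [0.8, 0.3, 0.7, 0.6, 0.2, 0.4, 0.3, 0.1]
Delta = [1, 0, 1, 1, 0, 0, 1, 0]
nri_hat = ipw_nri(3.0, X, d, Q, Delta, h=0.5)
```

When no observation is censored, every weight equals one and the sketch reduces to the difference of empirical proportions, which is a convenient sanity check.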
Note that regardless of correct model specification, \(\sqrt{n}({\widehat{\varvec{\theta }}}- \varvec{\theta }_0) = n^{-1/2} \sum \psi _i + o_p(1)\) where \(\psi _i\) are i.i.d. mean zero random variables by Lin and Wei (1989) and Uno et al. (2009). Using a Taylor series approximation and the i.i.d. representation of \( \sqrt{n}[\widetilde{\text{ NRI}}(\varvec{\theta },t) - \text{ NRI}(\varvec{\theta }, t) ]\) for any \(\varvec{\theta }\), we can write \(\widetilde{\mathcal{W }}(t) = \sqrt{n}[\widetilde{\text{ NRI}}({\widehat{\varvec{\theta }}}, t) - \text{ NRI}(\varvec{\theta }_0, t) ] \) as a sum of i.i.d. terms, \(n^{-1/2} \sum _{i=1}^n \epsilon _i(t)\) defined below.
where \(\epsilon _i(t) = \eta _i(t) + \psi _i^{\mathsf{\scriptscriptstyle {T}}} \frac{\partial \text{ NRI}(\varvec{\theta }, t)}{\partial \varvec{\theta }} |_{\varvec{\theta }_0}\). By a functional central limit theorem of Pollard (1990), the process \(\widetilde{\mathcal{W }}(t)\) converges weakly to a mean zero Gaussian process in \(t\).
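The Taylor step behind this representation can be sketched as follows. This is a hedged reconstruction from the surrounding text, not the authors' display: it assumes that \(\eta _i(t)\) may be evaluated at \(\varvec{\theta }_0\) when \({\widehat{\varvec{\theta }}}\) is plugged in, and the plug-in variance estimator \(\widehat{\sigma }^2(t)\) in the last line is an added assumption about how the i.i.d. representation might be used in practice.

```latex
\begin{align*}
\widetilde{\mathcal{W}}(t)
  &= \sqrt{n}\,\{\widetilde{\mathrm{NRI}}(\widehat{\theta}, t) - \mathrm{NRI}(\widehat{\theta}, t)\}
   + \sqrt{n}\,\{\mathrm{NRI}(\widehat{\theta}, t) - \mathrm{NRI}(\theta_0, t)\} \\
  &\approx n^{-1/2}\sum_{i=1}^{n} \eta_i(t)
   + \left\{ \frac{\partial\, \mathrm{NRI}(\theta, t)}{\partial \theta}\Big|_{\theta_0} \right\}^{\mathsf{T}}
     n^{-1/2}\sum_{i=1}^{n} \psi_i
   \;=\; n^{-1/2}\sum_{i=1}^{n} \epsilon_i(t), \\
\widehat{\sigma}^2(t)
  &= n^{-1}\sum_{i=1}^{n} \widehat{\epsilon}_i(t)^2 .
\end{align*}
```

The first line splits the error into an estimation part at a fixed parameter value and a part due to estimating \(\varvec{\theta }_0\); the second applies the expansion for fixed \(\varvec{\theta }\) and a first-order delta-method step.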
1.2 Asymptotic Properties of \(\widehat{\text{ NRI}}({\widehat{\varvec{\theta }}}, t)\)
Recall that we assume the Cox model is correctly specified and thus \(Q(\varvec{\theta }_2) = Q(\varvec{\theta }_2, t, \mathbf{Y}_{(2)})= \text{ Pr}(T \leqslant t \mid \mathbf{Y}_{(2)}) = 1-\exp \{ -\Lambda _{02}(t)e^{\varvec{\beta }_2 ^{\mathsf{\scriptscriptstyle {T}}}\mathbf{Y}_{(2)}} \}\) and \(S_{Q_i(\varvec{\theta }_2)}(t) = \text{ Pr}(T > t \mid \mathbf{Y}_{(2)}) = \exp \{ -\Lambda _{02}(t)e^{\varvec{\beta }_2^{\mathsf{\scriptscriptstyle {T}}} \mathbf{Y}_{(2)}} \}\). To derive asymptotic properties of \(\widehat{\text{ NRI}}({\widehat{\varvec{\theta }}}, t)\) we assume the same regularity conditions as in Andersen and Gill (1982). The uniform consistency of \(Q({\widehat{\varvec{\theta }}}_2, t, \mathbf{Y}_{(2)})\) for \(Q(\varvec{\theta }_2, t, \mathbf{Y}_{(2)})\) in \(t\) and \(\mathbf{Y}_{(2)}\) follows directly from the uniform consistency of \({\widehat{\Lambda }}_{02}(t)\) and \({\widehat{\varvec{\beta }}}_2\). It follows from the uniform law of large numbers (Pollard 1990) that \(\widehat{\text{ NRI}}({\widehat{\varvec{\theta }}}, t)\) is uniformly consistent for \(\text{ NRI}(\varvec{\theta }_0, t)\). Andersen and Gill (1982) show that \(\sqrt{n}({\widehat{\varvec{\beta }}}_2 - \varvec{\beta }_{02})\) is asymptotically normal and that \(\sqrt{n}({\widehat{\Lambda }}_{02}(t) - \Lambda _{02}(t))\) converges weakly to a Gaussian process. By the functional delta method it can be shown that \(\sqrt{n}\{Q({\widehat{\varvec{\theta }}}_2, t, \mathbf{Y}_{(2)}) - Q(\varvec{\theta }_2, t, \mathbf{Y}_{(2)}) \}\) converges to a zero mean Gaussian process in \(t\) and \(\mathbf{Y}_{(2)}\) (Zheng et al. 2008).
Similar to the derivation for \(\widetilde{\text{ NRI}}({\widehat{\varvec{\theta }}}, t)\), it can be shown that the process \(\widetilde{\mathcal{N}}(t) = \sqrt{n} [ \widehat{\text{ NRI}}({\widehat{\varvec{\theta }}}, t) - \text{ NRI}(\varvec{\theta }_0, t) ]\) is asymptotically equivalent to \(n^{-1/2} \sum _{i=1}^n \zeta _i(t).\) In particular, for a fixed \(\varvec{\theta }\), \(\sqrt{n} \{ \widehat{\text{ NRI}}(\varvec{\theta }, t)-\text{ NRI}(\varvec{\theta }, t) \} \approx n^{-1/2} \sum _{i=1}^n \eta ^*_i(t)\) where \(\eta ^*_i(t) = 2 [D(t)^{-2} \{ \Delta _i(\varvec{\theta })Q_i(\varvec{\theta }_2)- \text{ Pr}(\Delta _i(\varvec{\theta })=1 | T_i \leqslant t) Q_i(\varvec{\theta }_2) \} - D(t)_{-}^{-2} \{ \Delta _i(\varvec{\theta })[1-Q_i(\varvec{\theta }_2)] - \text{ Pr}(\Delta _i(\varvec{\theta }) = 1 | T_i > t) [1-Q_i(\varvec{\theta }_2)]\}] \). Thus, \(\widetilde{\mathcal{N}}(t) \approx n^{-1/2} \sum _{i=1}^n \zeta _i(t)\) where \(\zeta _i(t) = \eta ^*_i(t) + \psi _i^{\mathsf{\scriptscriptstyle {T}}} \frac{\partial \text{ NRI}(\varvec{\theta }, t)}{\partial \varvec{\theta }} |_{\varvec{\theta }_0}\). Once again, using a functional central limit theorem, this implies that \(\widetilde{\mathcal{N}}(t)\) converges weakly to a Gaussian process with mean zero.
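The form of \(\eta ^*_i(t)\) suggests that the semiparametric estimator averages \(\Delta _i\) with weights \(Q_i\) for the event group and \(1-Q_i\) for the event-free group. The Python sketch below illustrates that reading. The function name `model_based_nri`, the risk values, and the rule \(\Delta _i = I(Q_i > P_i)\) are assumptions made for illustration, and the risks are treated as already computed from fitted models.

```python
def model_based_nri(P, Q):
    """Model-based sketch of 2 * {Pr(Delta=1 | T<=t) - Pr(Delta=1 | T>t)}.

    P, Q: risks by time t under the old and new models (lists of floats).
    Each conditional probability is a weighted average of Delta, with
    weights Q_i (event by t) and 1 - Q_i (event-free at t).
    """
    # Assumed indicator of upward reclassification under the new model.
    delta = [1 if q > p else 0 for p, q in zip(P, Q)]
    p_event = sum(dl * q for dl, q in zip(delta, Q)) / sum(Q)
    p_nonevent = (sum(dl * (1.0 - q) for dl, q in zip(delta, Q))
                  / sum(1.0 - q for q in Q))
    return 2.0 * (p_event - p_nonevent)


# Hypothetical risks for three subjects.
P = [0.2, 0.5, 0.4]
Q = [0.6, 0.3, 0.5]
nri_hat = model_based_nri(P, Q)
```

No censoring weights appear here because, under a correctly specified Cox model, the fitted risks themselves carry the information about event status by time \(t\).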
Cite this article
Zheng, Y., Parast, L., Cai, T. et al. Evaluating incremental values from new predictors with net reclassification improvement in survival analysis. Lifetime Data Anal 19, 350–370 (2013). https://doi.org/10.1007/s10985-012-9239-z
DOI: https://doi.org/10.1007/s10985-012-9239-z