Abstract
Background
The purpose of alpha-fetoprotein (AFP) and abdominal ultrasound (US) cannot be discerned in administrative data.
Aim
We developed an algorithm to identify AFP and US used as surveillance tests for hepatocellular carcinoma (HCC).
Methods
We evaluated 300 AFP and 301 US tests from a VA database. Surveillance predictors in the administrative files (diagnoses, labs) were examined in logistic regression models. We calculated model-based probabilities of HCC surveillance status, and developed classification procedures using single and multiple imputation methods.
Results
The predictors of surveillance intent for AFP were absence of alcoholism, abdominal pain, ascites, diabetes and high AST levels. For US, the predictors of surveillance were prior AFP testing and HIV status and absence of abdominal pain, ascites, or drug dependence. For AFP classification, single imputation compared favorably with multiple imputation, both showing robustness in discrimination and calibration. For US, both approaches were less robust in discrimination and calibration, which was more moderate in multiple imputation than in single imputation.
Conclusions
Predictive algorithms in administrative files can be used to identify AFP performed for HCC surveillance; however, the intent of US is more difficult to identify.
Abbreviations
- AFP: Alpha-fetoprotein
- CT: Computed tomography
- HCC: Hepatocellular carcinoma
- VA: Veterans Administration
References
El-Serag HB, Rudolph KL. Hepatocellular carcinoma: epidemiology and molecular carcinogenesis. Gastroenterology. 2007;132:2557–2576.
El-Serag HB, Marrero JA, Rudolph L, Reddy KR. Diagnosis and treatment of hepatocellular carcinoma. Gastroenterology. 2008;134:1752–1763.
Bruix J, Sherman M; Practice Guidelines Committee, American Association for the Study of Liver Diseases. Management of hepatocellular carcinoma. Hepatology. 2005;42:1208–1236.
Bruix J, Sherman M, Llovet JM, et al. Clinical management of hepatocellular carcinoma. Conclusions of the Barcelona-2000 EASL conference. European Association for the study of the liver. J Hepatol. 2001;35:421–430.
Trevisani F, De NS, Rapaccini G, et al. Semiannual and annual surveillance of cirrhotic patients for hepatocellular carcinoma: effects on cancer stage and patient survival (Italian experience). Am J Gastroenterol. 2002;97:734–744.
Singal A, Volk ML, Waljee A, et al. Meta-analysis: surveillance with ultrasound for early-stage hepatocellular carcinoma in patients with cirrhosis. Aliment Pharmacol Ther. 2009;30:37–47.
Backus LI, Gavrilov S, Loomis TP, et al. Clinical case registries: simultaneous local and national disease registries for population quality management. J Am Med Inform Assoc. 2009;16:775–783.
Kramer JR, Davila JA, Miller ED, Richardson P, Giordano TP, El-Serag HB. The validity of viral hepatitis and chronic liver disease diagnoses in Veterans Affairs administrative databases. Aliment Pharmacol Ther. 2008;27:274–282.
Greenland S. Modeling and variable selection in epidemiologic analysis. Am J Public Health. 1989;79:340–349.
Rubin DB. Multiple imputation after 18+ years. J Am Stat Assoc. 1996;91:473–489.
Zhou XH, McClish DK, Obuchowski NA. Statistical Methods in Diagnostic Medicine. New York: Wiley-Interscience; 2002.
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer; 2009.
Meng X, Rubin DB. Performing likelihood ratio tests with multiply-imputed data sets. Biometrika. 1992;79:103–111.
Acknowledgments
Financial support: This work was supported in part by the Houston VA HSR&D Center of Excellence (HFP90-020) and the National Cancer Institute (R01-CA-125487).
Conflicts of interest
None.
Appendices
Appendix A
See Table 6.
Appendix B
Direct Multiple Imputation
In the direct multiple imputation approach, each surveillance count is treated as a random variable whose distribution is induced by the model-predicted surveillance probabilities, and the reported count is its expected value. Expected values of these and other random variables are estimated by averaging the imputed values over the imputation iterations. In particular, parameter estimation in multiple imputation approaches to generalized linear modeling proceeds in the same way [12].
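The averaging step described above can be sketched as follows. This is a minimal illustration, assuming that each test's surveillance status is imputed as an independent Bernoulli draw from its model-predicted probability; the function name, the probability values, and the number of imputation iterations are hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def impute_surveillance_counts(probs, n_imputations=1000, seed=0):
    """Return one imputed surveillance count per imputation iteration.

    probs: model-predicted surveillance probabilities, one per test
    (hypothetical inputs; the study's fitted values are not reproduced here).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_imputations):
        # One imputation: label each test as surveillance with its own probability.
        counts.append(sum(1 for p in probs if rng.random() < p))
    return counts


# Hypothetical predicted probabilities for five tests.
probs = [0.9, 0.8, 0.35, 0.1, 0.6]
counts = impute_surveillance_counts(probs)

estimated_count = mean(counts)  # mean of imputed counts over the iterations
expected_count = sum(probs)     # exact expectation under the same model
```

With enough iterations, `estimated_count` approaches `expected_count`, which is the sum of the predicted probabilities.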
Decision-Theoretic Framework for Dichotomization
The decision-theoretic approach treats the predicted probabilities as a continuous scale on which thresholds are chosen to minimize the total cost assigned to true and false positives and negatives.
Costs are assigned per unit to true and false positives and negatives:
This cost function can be applied directly to the vertices of the ROC curve for the predictive model. The vertices at which the cost is minimal yield the desired dichotomization thresholds. Slopes of level curves of this cost function can assist in this process:
The slopes of the level curves of the total cost function on the ROC plane (with x = (1 − spec) and y = sens) are equal to
When, as in most applications, the costs for true positives and true negatives are zero, this is simply
The cost-minimizing ROC vertices will be tangency points for these slopes.
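The displayed formulas for the cost function and the level-curve slopes are not reproduced in this text. One consistent way to write them is sketched here, under stated assumptions: β and γ denote the unit costs of a false negative and a false positive, as in the text, while α and δ (unit costs of a true positive and a true negative), p (the proportion of tests that are truly surveillance) and N (the total number of tests) are labels chosen for this sketch and may differ from the original notation.

```latex
% Assumed notation: \alpha, \delta, p and N are labels introduced for this sketch;
% \beta (false negative) and \gamma (false positive) follow the text.
\begin{align*}
C &= \alpha\,\#TP + \beta\,\#FN + \gamma\,\#FP + \delta\,\#TN \\
\frac{C}{N} &= p\,\bigl[\alpha\,y + \beta\,(1-y)\bigr]
             + (1-p)\,\bigl[\gamma\,x + \delta\,(1-x)\bigr],
  \qquad x = 1-\mathrm{spec},\; y = \mathrm{sens} \\
\left.\frac{dy}{dx}\right|_{C=\text{const}}
  &= \frac{(1-p)\,(\gamma-\delta)}{p\,(\beta-\alpha)} \\
\alpha=\delta=0:\qquad
\left.\frac{dy}{dx}\right|_{C=\text{const}}
  &= \frac{(1-p)\,\gamma}{p\,\beta}
\end{align*}
```

The third line follows from setting the differential of C/N to zero along a level curve, which gives p(α − β) dy + (1 − p)(γ − δ) dx = 0.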
In particular, when false negatives and false positives are assigned equal unit costs (β = γ), cost minimization coincides with maximization of agreement (#TP + #TN).
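The threshold search can be illustrated with a short sketch. It assumes zero cost for true positives and true negatives, evaluates the total cost at every candidate threshold (each corresponding to a vertex of the empirical ROC curve), and keeps the first threshold with the lowest cost. The function names, probabilities, labels, and cost values are hypothetical and are not the study's data.

```python
def confusion_counts(probs, labels, threshold):
    """Count TP, FP, TN, FN when tests with prob >= threshold are called surveillance."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_threshold(probs, labels, cost_fn=1.0, cost_fp=1.0):
    """Return (threshold, total cost, agreement) at the first cost-minimizing threshold."""
    # Candidates: every distinct predicted probability, plus "classify nothing as positive".
    candidates = sorted(set(probs)) + [float("inf")]
    best = None
    for t in candidates:
        tp, fp, tn, fn = confusion_counts(probs, labels, t)
        cost = cost_fn * fn + cost_fp * fp  # TP and TN are assumed to cost nothing
        if best is None or cost < best[1]:
            best = (t, cost, tp + tn)
    return best


# Hypothetical predicted probabilities and true surveillance status (1 = surveillance).
probs = [0.95, 0.85, 0.7, 0.55, 0.4, 0.3, 0.15, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

equal_costs = best_threshold(probs, labels)                         # beta = gamma
fp_heavy = best_threshold(probs, labels, cost_fn=1.0, cost_fp=5.0)  # false positives cost more
```

On this toy data, equal costs select a threshold at which agreement is also maximal (7 of 8 tests), while weighting false positives more heavily moves the selected threshold from 0.4 up to 0.7.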
About this article
Cite this article
Richardson, P., Henderson, L., Davila, J.A. et al. Surveillance for Hepatocellular Carcinoma: Development and Validation of an Algorithm to Classify Tests in Administrative and Laboratory Data. Dig Dis Sci 55, 3241–3251 (2010). https://doi.org/10.1007/s10620-010-1387-y
DOI: https://doi.org/10.1007/s10620-010-1387-y