
Published in: Quality of Life Research

01.12.2016

Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks

Written by: Yue Zhao

Published in: Quality of Life Research | Issue 3/2017


Abstract

Purpose

In patient-reported outcome research that uses item response theory (IRT), model-data fit evaluations usually focus on detecting misfit with statistical significance tests. However, such evaluations rarely address the consequences of using misfitting items for the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation.

Methods

Using secondary data collected in the wave 1 testing phase of the Patient-Reported Outcomes Measurement Information System (PROMIS), analyses were conducted on the PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. Misfitting items were identified using Orlando and Thissen’s summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated by the agreement between the IRT-derived T-scores, and between the severity classifications, obtained with and without the misfitting items.
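The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's analysis code: it assumes theta estimates are already available from two calibrations (all items versus misfitting items removed) and uses the T-score metric with mean 50 and SD 10. The cut points and labels are hypothetical placeholders, since the abstract does not state the thresholds used, and the function names are invented for this example.

```python
# Hypothetical cut points on the T-score metric; the abstract does not state
# the thresholds used, so these values are placeholders for illustration.
CUTS = (55.0, 65.0, 75.0)
LABELS = ("normal", "mild", "moderate", "severe")


def t_score(theta):
    """Convert an IRT theta estimate to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def classify(t):
    """Assign a severity label using the placeholder cut points."""
    for cut, label in zip(CUTS, LABELS):
        if t < cut:
            return label
    return LABELS[-1]


def agreement(theta_full, theta_reduced):
    """Compare scores estimated with all items against scores estimated
    after dropping the misfitting items (same respondents, same order)."""
    if len(theta_full) != len(theta_reduced) or not theta_full:
        raise ValueError("need two non-empty score lists of equal length")
    t_full = [t_score(th) for th in theta_full]
    t_reduced = [t_score(th) for th in theta_reduced]
    n = len(t_full)
    # Individual-level agreement: average absolute T-score difference.
    mean_abs_diff = sum(abs(a - b) for a, b in zip(t_full, t_reduced)) / n
    # Group-level agreement: difference between the two mean T-scores.
    mean_diff = sum(t_full) / n - sum(t_reduced) / n
    # Classification agreement: share of respondents with the same label.
    same = sum(classify(a) == classify(b) for a, b in zip(t_full, t_reduced))
    return {
        "mean_abs_t_diff": mean_abs_diff,
        "mean_t_diff": mean_diff,
        "class_agreement": same / n,
    }


# Made-up theta estimates for five respondents.
example = agreement([-1.0, 0.2, 0.6, 1.6, 2.6], [-0.9, 0.3, 0.4, 1.6, 2.4])
print(example)
```

A small mean absolute difference together with a classification agreement close to 1 would correspond to the "negligible impact" reading; what counts as small is a judgment the sketch leaves open.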

Results

In the general population sample, item misfit had a negligible impact on the T-score estimates and severity classifications in both the PROMIS depression and pain interference item banks; that is, the misfit was of little practical significance.

Conclusions

The findings indicate that T-score estimates in the two item banks are robust to item misfit at both the group and individual levels, and they add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations are given on approaches for identifying item misfit (statistical significance) and assessing the impact of misfit (practical significance).

The overall alpha level of .05 was adjusted for the total number of items in the respective PROMIS item bank. The adjusted alpha values ranged from .0018 (.05/28) to .05 for the PROMIS-DEP and from .0012 (.05/41) to .05 for the PROMIS-PI.
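The note above does not name the adjustment procedure. A range running from alpha/m up to alpha is consistent with the rank-based thresholds i·alpha/m of the Benjamini-Hochberg step-up procedure, which appears in the reference list, but it would also fit a Holm-type adjustment. The Python sketch below therefore assumes Benjamini-Hochberg; the function names and the p-values in the usage lines are illustrative, not taken from the study.

```python
def step_up_alphas(n_items, alpha=0.05):
    """Rank-based thresholds i * alpha / m for ranks i = 1..m
    (assumed Benjamini-Hochberg form; the source does not name the method)."""
    return [alpha * rank / n_items for rank in range(1, n_items + 1)]


def flag_misfit(p_values, alpha=0.05):
    """Return the indices of items flagged as misfitting under the
    Benjamini-Hochberg step-up rule applied to item-fit p-values."""
    m = len(p_values)
    # Order item indices from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank whose p-value meets its threshold.
        if p_values[idx] <= alpha * rank / m:
            largest_rank = rank
    # Every item ranked at or below that rank is flagged.
    return sorted(order[:largest_rank])


dep = step_up_alphas(28)  # depression bank: 28 items
pi = step_up_alphas(41)   # pain interference bank: 41 items
print(round(dep[0], 4), round(dep[-1], 4))
print(round(pi[0], 4), round(pi[-1], 4))

# Made-up p-values for five items, for illustration only.
print(flag_misfit([0.001, 0.2, 0.012, 0.04, 0.5]))
```

Under this assumption the smallest and largest thresholds reproduce the values quoted in the note for both banks.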

1. Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

2. Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., Hays, R. D., on behalf of the PROMIS Cooperative Group. (2010). Initial item banks and first wave testing of the Patient-Reported Outcomes Measurement Information System (PROMIS) network: 2005–2008. Journal of Clinical Epidemiology, 63, 1179–1194. doi:10.1016/j.jclinepi.2010.04.011.

3. Swaminathan, H., Hambleton, R. K., & Rogers, H. J. (2007). Assessing fit in item response models. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics: Psychometrics. London: Elsevier.

4. Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., et al. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the patient-reported outcomes measurement information system (PROMIS). Medical Care, 45(5), S22–S31. doi:10.1097/01.mlr.0000250483.85507.04.

5. Hambleton, R. K., & Han, N. (2005). Assessing the fit of IRT models to educational and psychological test data: A five step plan and several graphical displays. In W. R. Lenderking & D. Revicki (Eds.), Advances in health outcomes research methods, measurement, statistical analysis, and clinical applications (pp. 57–78). Washington: Degnon Associates.

6. Box, G. E. P., & Draper, N. R. (1987). Empirical model building and response surfaces. New York, NY: Wiley.

7. Sinharay, S., & Haberman, S. J. (2014). How often is the misfit of item response theory models practically significant? Educational Measurement: Issues and Practice, 33(1), 23–35. doi:10.1111/emip.12024.

8. Zhao, Y. (2008). Approaches for addressing the fit of item response theory models to educational test data. Dissertation Abstracts International, 69, 12A. (UMI No. 3337019).

9. Cella, D., Yount, S., Rothrock, N., Gershon, R., Cook, K., Reeve, B., et al. (2007). The Patient-Reported Outcomes Measurement Information System (PROMIS): Progress of an NIH roadmap cooperative group during its first two years. Medical Care, 45(5 Suppl 1), S3–S11. doi:10.1097/01.mlr.0000258615.42478.55.

10. Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., & Cella, D. (2011). Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): Depression, anxiety, and anger. Assessment, 18(3), 263–283. doi:10.1177/1073191111411667.

11. Amtmann, D. A., Cook, K. F., Jensen, M. P., Chen, W.-H., Choi, S. W., Revicki, D., et al. (2010). Development of a PROMIS item bank to measure pain interference. Pain, 150(1), 173–182. doi:10.1016/j.pain.2010.04.025.

12. Liu, H., Cella, D., Gershon, R., Shen, J., Morales, L. S., Riley, W., et al. (2010). Representativeness of the patient-reported outcomes measurement information system internet panel. Journal of Clinical Epidemiology, 63(11), 1169–1178. doi:10.1016/j.jclinepi.2009.11.021.

13. DeWalt, D. A., Rothrock, N., Yount, S., & Stone, A. A. (2007). Evaluation of item candidates: The PROMIS qualitative item review. Medical Care, 45(5 Suppl 1), S12–S21. doi:10.1097/01.mlr.0000254567.79743.e2.

14. Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385–401. doi:10.1177/014662167700100306.

15. Cleeland, C. S., Gonin, R., Hatfield, A. K., Edmonson, J. H., Blum, R. H., Stewart, J. A., et al. (1994). Pain and its treatment in outpatients with metastatic cancer. New England Journal of Medicine, 330(9), 592–596. doi:10.1056/NEJM199403033300902.

16. Cella, D., Choi, S., Garcia, S., Cook, K. F., Rosenbloom, S., Lai, J.-S., et al. (2014). Setting standards for severity of common symptoms in oncology using the PROMIS item banks and expert judgment. Quality of Life Research, 23(10), 2651–2661. doi:10.1007/s11136-014-0732-6.

17. Muthén, L. K., & Muthén, B. O. (2006). Mplus [Computer software]. Los Angeles, CA: Muthén & Muthén.

18. Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88(3), 588. doi:10.1037/0033-2909.88.3.588.

19. Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. Sage Focus Editions, 154, 136. doi:10.1177/0049124192021002005.

20. Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. doi:10.1080/10705519909540118.

21. Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9(2), 202–220. doi:10.1177/1094428105284919.

22. Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24(1), 50–64. doi:10.1177/01466216000241003.

23. Cai, L., Thissen, D., & du Toit, S. (2015). IRTPRO [Computer software]. Lincolnwood, IL: Scientific Software International.

24. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300. doi:10.2307/2346101.

25. Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Chicago: Psychometric Society. doi:10.1002/j.2333-8504.1968.tb00153.x.

26. Thissen, D., Chen, W.-H., & Bock, R. D. (2003). Multilog 7.03 [Computer software]. Lincolnwood, IL: Scientific Software International.

27. Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210. doi:10.1177/014662168300700208.

28. Kim, S., & Kolen, M. J. (2004). STUIRT: A computer program for scale transformation under unidimensional item response theory models (Version 1.0). Iowa Testing Programs, University of Iowa.

29. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

30. Yost, K. J., Eton, D. T., Garcia, S. F., & Cella, D. (2011). Minimally important differences were estimated for six PROMIS-Cancer scales in advanced-stage cancer patients. Journal of Clinical Epidemiology, 64(5), 507–516. doi:10.1016/j.jclinepi.2010.11.018.

31. Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X²: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27(4), 289–298. doi:10.1177/0146621603027004004.

32. Kang, T., & Chen, T. (2011). Performance of the generalized S-X² item fit index for the graded response model. Asia Pacific Education Review, 12(1), 89–96. doi:10.1007/s12564-010-9082-4.

33. Kang, T., & Chen, T. (2008). Performance of the generalized S-X² item fit index for polytomous IRT models. Journal of Educational Measurement, 45(4), 391–406. doi:10.1111/j.1745-3984.2008.00071.x.

34. Smits, N. (2016). On the effect of adding clinical samples to validation studies of patient-reported outcome item banks: A simulation study. Quality of Life Research, 25(7), 1635–1644. doi:10.1007/s11136-015-1199-9.

35. Choi, S. W., Reise, S. P., Pilkonis, P. A., Hays, R. D., & Cella, D. (2010). Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation, 19(1), 125–136. doi:10.1007/s11136-009-9560-5.

Title: Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks
Written by: Yue Zhao
Publication date: 01.12.2016
Publisher: Springer International Publishing
Published in: Quality of Life Research / Issue 3/2017
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI: https://doi.org/10.1007/s11136-016-1467-3

