Published in: Quality of Life Research 1/2015

01.01.2015 | Quantitative Methods Special Section

Quantifying ‘problematic’ DIF within an IRT framework: application to a cancer stigma index

Authors: Maria Orlando Edelen, Brian D. Stucky, Anita Chandra


Abstract

Purpose

Differential item functioning (DIF) detection within an item response theory (IRT) framework is highly powerful and often flags statistically significant DIF that is of little clinical importance. This paper introduces two metrics for IRT DIF evaluation that can distinguish potentially problematic DIF among items flagged with statistically significant DIF.

Methods

Computation of two DIF metrics—(1) a weighted area between the expected score curves (wABC) and (2) a difference in expected a posteriori scores across item response categories (dEAP)—is described. Their use is demonstrated using data from a 27-item cancer stigma index fielded to four adult samples: (1) Arabic (N = 633) and (2) English speakers (N = 324) residing in Jordan and Egypt, and (3) English (N = 500) and (4) Mandarin speakers (N = 500) residing in China. We used IRTPRO’s DIF module to calculate IRT-based Wald chi-square DIF statistics according to language within each region. After standard p value adjustments for multiple comparisons, we further evaluated DIF impact with wABC and dEAP.
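The abstract does not give the wABC formula, so the following is only a minimal sketch of one plausible reading: the absolute difference between two groups' expected score curves, averaged over a normal density for the focal group. The graded response model, the normal weighting, the grid-based integration, the function names, and the item parameters are all assumptions made for illustration and are not taken from the study.

```python
import math

def grm_expected_score(theta, slope, thresholds):
    """Expected item score (categories scored 0..K-1) under a graded response model."""
    # Under the GRM, E[X | theta] equals the sum over k of P(X >= k | theta).
    return sum(1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds)

def wabc(ref, foc, mu_foc=0.0, sd_foc=1.0, n_points=201):
    """Weighted area between two expected score curves.

    ref, foc: (slope, thresholds) tuples for the reference and focal groups.
    Weights come from a normal density for the focal group (an assumption).
    """
    lo = mu_foc - 6.0 * sd_foc
    step = 12.0 * sd_foc / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * ((t - mu_foc) / sd_foc) ** 2) for t in grid]
    total = sum(dens)
    # Normalised weights sum to 1, so the result is on the item score metric.
    return sum(
        (d / total) * abs(grm_expected_score(t, *ref) - grm_expected_score(t, *foc))
        for t, d in zip(grid, dens)
    )

# Hypothetical parameters for one 4-category item (not taken from the study).
reference = (1.5, [-1.0, 0.0, 1.0])
focal = (1.5, [-0.5, 0.5, 1.5])  # same slope, thresholds shifted by 0.5
print(round(wabc(reference, focal), 3))
```

Because the weights sum to one, the value reads as an average expected-score difference in item score points, which is what makes a single cutoff across items meaningful in this sketch.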

Results

There were a total of twenty statistically significant DIF comparisons after p value adjustment. The wABCs for these items ranged from 0.13 to 0.90. Upon inspection of curves, DIF comparisons with wABCs >0.3 were deemed potentially problematic and were considered further for removal. The dEAP metric was also informative regarding impact of DIF on expected scores, but less consistently useful for narrowing down potentially problematic items.
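The two-stage screening described here (adjusted significance first, then an effect size cutoff) can be sketched as follows. The abstract does not name the p value adjustment; this sketch assumes the Benjamini-Hochberg procedure, which the reference list cites [27, 28]. The item names, p values, and wABC values are hypothetical, and the 0.05 significance level is also an assumption; only the 0.3 cutoff comes from the text.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def flag_problematic(items, alpha=0.05, wabc_cutoff=0.3):
    """Keep items that are significant after adjustment AND exceed the wABC cutoff."""
    adj = benjamini_hochberg([it["p"] for it in items])
    return [
        it["name"]
        for it, q in zip(items, adj)
        if q < alpha and it["wabc"] > wabc_cutoff
    ]

# Hypothetical DIF results (illustrative values only).
items = [
    {"name": "item_a", "p": 0.001, "wabc": 0.45},
    {"name": "item_b", "p": 0.004, "wabc": 0.15},
    {"name": "item_c", "p": 0.300, "wabc": 0.50},
    {"name": "item_d", "p": 0.020, "wabc": 0.31},
]
print(flag_problematic(items))
```

In this sketch an item with a large wABC but a non-significant test is not flagged, mirroring the order of operations in the text: the effect size is consulted only for comparisons that survive the adjustment.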

Conclusions

The calculations of wABC and dEAP function as DIF effect size indicators. Use of these metrics can substantially augment IRT DIF evaluation by discerning truly problematic DIF items among those with statistically significant DIF.
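As a rough illustration of how a dEAP-style effect size could be obtained, the sketch below computes, for each response category of a single item, the expected a posteriori (EAP) score implied by that one response under each group's item parameters, and then takes the difference. The graded response model, the shared standard normal prior, the quadrature grid, the function names, and the parameters are assumptions for illustration; the abstract does not specify these details.

```python
import math

def grm_category_probs(theta, slope, thresholds):
    """Category probabilities under a graded response model."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def eap_by_category(slope, thresholds, n_points=121):
    """EAP of theta given each single-item response, with a N(0, 1) prior (assumed)."""
    grid = [-6.0 + 12.0 * i / (n_points - 1) for i in range(n_points)]
    prior = [math.exp(-0.5 * t * t) for t in grid]
    probs = [grm_category_probs(t, slope, thresholds) for t in grid]
    eaps = []
    for k in range(len(thresholds) + 1):
        post = [w * p[k] for w, p in zip(prior, probs)]  # unnormalised posterior
        eaps.append(sum(t * w for t, w in zip(grid, post)) / sum(post))
    return eaps

def deap(ref, foc):
    """Per-category EAP difference, focal minus reference."""
    return [f - r for r, f in zip(eap_by_category(*ref), eap_by_category(*foc))]

# Hypothetical parameters for one 4-category item (not taken from the study).
reference = (1.5, [-1.0, 0.0, 1.0])
focal = (1.5, [-0.5, 0.5, 1.5])
print([round(d, 3) for d in deap(reference, focal)])
```

With a single item, each EAP is pulled strongly toward the prior mean, which is the shrinkage that footnote 2 describes for short tests.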

Footnotes

1. The improved Wald test is believed to improve on Lord’s [23] original Wald chi-square test by estimating the covariance matrix with the supplemented expectation maximization (SEM) algorithm [24, 25].

2. Because they are means of the posterior distribution, the group-specific EAPs used to compute the dEAP are well known to be biased toward the mean of the underlying population distribution (see [36]). In tests containing many items this bias is relatively small; in tests with fewer items it is more apparent.

References
1. Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
2. Edelen, M. O., Thissen, D., Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2006). Identification of differential item functioning using item response theory and the likelihood-based model comparison approach: Application to the Mini-Mental State Examination [Research Support, N.I.H., Extramural Review]. Medical Care, 44(11 Suppl 3), S134–S142. doi:10.1097/01.mlr.0000245251.83359.8c.
3. Kim, J., Chung, H., Amtmann, D., Revicki, D. A., & Cook, K. F. (2013). Measurement invariance of the PROMIS pain interference item bank across community and clinical samples [Research Support, N.I.H., Extramural]. Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation, 22(3), 501–507. doi:10.1007/s11136-012-0191-x.
4. Orlando, M., & Marshall, G. N. (2002). Differential item functioning in a Spanish translation of the PTSD checklist: Detection and evaluation of impact [Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S.]. Psychological Assessment, 14(1), 50–59.
5. Rose, J. S., Lee, C. T., Selya, A. S., & Dierker, L. C. (2012). DSM-IV alcohol abuse and dependence criteria characteristics for recent onset adolescent drinkers [Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S.]. Drug and Alcohol Dependence, 124(1–2), 88–94. doi:10.1016/j.drugalcdep.2011.12.013.
6. Weisscher, N., Glas, C. A., Vermeulen, M., & De Haan, R. J. (2010). The use of an item response theory-based disability item bank across diseases: Accounting for differential item functioning [Multicenter Study]. Journal of Clinical Epidemiology, 63(5), 543–549. doi:10.1016/j.jclinepi.2009.07.016.
7. Carle, A. C., Cella, D., Cai, L., Choi, S. W., Crane, P. K., Curtis, S. M., et al. (2011). Advancing PROMIS’s methodology: Results of the third patient-reported outcomes measurement information system (PROMIS®) Psychometric Summit [Congresses Research Support, N.I.H., Extramural]. Expert Review of Pharmacoeconomics & Outcomes Research, 11(6), 677–684. doi:10.1586/erp.11.74.
8. Cook, K. F., Bamer, A. M., Amtmann, D., Molton, I. R., & Jensen, M. P. (2012). Six patient-reported outcome measurement information system short form measures have negligible age- or diagnosis-related differential item functioning in individuals with disabilities [Comparative Study Research Support, U.S. Gov’t, Non-P.H.S.]. Archives of Physical Medicine and Rehabilitation, 93(7), 1289–1291. doi:10.1016/j.apmr.2011.11.022.
9. DeWitt, E. M., Stucky, B. D., Thissen, D., Irwin, D. E., Langer, M., Varni, J. W., et al. (2011). Construction of the eight-item patient-reported outcomes measurement information system pediatric physical function scales: Built using item response theory [Research Support, N.I.H., Extramural Validation Studies]. Journal of Clinical Epidemiology, 64(7), 794–804. doi:10.1016/j.jclinepi.2010.10.012.
10. Hahn, E. A., Devellis, R. F., Bode, R. K., Garcia, S. F., Castel, L. D., Eisen, S. V., et al. (2010). Measuring social health in the patient-reported outcomes measurement information system (PROMIS): Item bank development and testing [Research Support, N.I.H., Extramural Validation Studies]. Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation, 19(7), 1035–1044. doi:10.1007/s11136-010-9654-0.
11. Petersen, M. A., Giesinger, J. M., Holzner, B., Arraras, J. I., Conroy, T., Gamper, E. M., et al. (2013). Psychometric evaluation of the EORTC computerized adaptive test (CAT) fatigue item pool. Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation. doi:10.1007/s11136-013-0372-2.
12. Smith, A. B., Armes, J., Richardson, A., & Stark, D. P. (2013). Psychological distress in cancer survivors: The further development of an item bank. Psycho-oncology, 22(2), 308–314. doi:10.1002/pon.2090.
13. Varni, J. W., Stucky, B. D., Thissen, D., Dewitt, E. M., Irwin, D. E., Lai, J. S., et al. (2010). PROMIS Pediatric Pain Interference Scale: An item response theory analysis of the pediatric pain item bank [Research Support, N.I.H., Extramural]. The Journal of Pain: Official Journal of the American Pain Society, 11(11), 1109–1119. doi:10.1016/j.jpain.2010.02.005.
14. Yeatts, K. B., Stucky, B., Thissen, D., Irwin, D., Varni, J. W., DeWitt, E. M., et al. (2010). Construction of the Pediatric Asthma Impact Scale (PAIS) for the Patient-Reported Outcomes Measurement Information System (PROMIS). The Journal of Asthma: Official Journal of the Association for the Care of Asthma, 47(3), 295–302. doi:10.3109/02770900903426997.
16. Miller, T. R., & Spray, J. A. (1993). Logistic discriminant function analysis for DIF identification of polytomously scored items. Journal of Educational Measurement, 30(2), 107–122.
17. Swaminathan, H., & Rogers, H. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
18. Meade, A. W., & Wright, N. A. (2012). Solving the measurement invariance anchor item problem in item response theory. The Journal of Applied Psychology, 97(5), 1016–1031. doi:10.1037/a0027934.
20. Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
21. Cai, L., du Toit, S., & Thissen, D. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT modeling [Computer software]. Chicago, IL: Scientific Software International, Inc.
22. Langer, M. M. (2008). A reexamination of Lord’s Wald test for differential item functioning using item response theory and modern error estimation. Chapel Hill: The University of North Carolina.
23. Lord, F. M. (1980). Applications of item response theory to practical testing problems. London: Routledge.
25. Meng, X. L., & Rubin, D. B. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association, 86(416), 899–909. doi:10.2307/2290503.
26. Irwin, D. E., Stucky, B., Langer, M. M., Thissen, D., DeWitt, E. M., Lai, J.-S., et al. (2010). An item response analysis of the pediatric PROMIS anxiety and depressive symptoms scales. Quality of Life Research, 19(4), 595–607.
27. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.
28. Thissen, D., Steinberg, L., & Kuang, D. (2002). Quick and easy implementation of the Benjamini-Hochberg procedure for controlling the false positive rate in multiple comparisons. Journal of Educational and Behavioral Statistics, 27(1), 77–83.
29. Steinberg, L., & Thissen, D. (2006). Using effect sizes for research reporting: Examples using item response theory to analyze differential item functioning. Psychological Methods, 11(4), 402–415.
30. Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19(4), 353–368.
31. Samejima, F. (1997). Graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85–100). New York: Springer.
33. Oshima, T., Kushubar, S., Scott, J., & Raju, N. (2009). DFIT8 for window user’s manual: Differential functioning of items and tests. St. Paul, MN: Assessment Systems Corporation.
34. Chandra, A., Edelen, M., Orr, P., Stucky, B., & Schaefer, J. (2013). Developing a global cancer stigma index. Santa Monica, CA: RAND Corporation.
35. Muthén, L., & Muthén, B. (1998–2010). Mplus User’s Guide. Los Angeles, CA: Muthén & Muthén.
36. Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431–444.

Metadata
Title
Quantifying ‘problematic’ DIF within an IRT framework: application to a cancer stigma index
Authors
Maria Orlando Edelen
Brian D. Stucky
Anita Chandra
Publication date
01.01.2015
Publisher
Springer International Publishing
Published in
Quality of Life Research / Issue 1/2015
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-013-0540-4
