Published in: Quality of Life Research 1/2007

1 August 2007 | Original Paper

Evaluating measurement equivalence using the item response theory log-likelihood ratio (IRTLR) method to assess differential item functioning (DIF): applications (with illustrations) to measures of physical functioning ability and general distress

Authors: Jeanne A. Teresi, Katja Ocepek-Welikson, Marjorie Kleinman, Karon F. Cook, Paul K. Crane, Laura E. Gibbons, Leo S. Morales, Maria Orlando-Edelen, David Cella


Abstract

Background: Methods based on item response theory (IRT) that can be used to examine differential item functioning (DIF) are illustrated. An IRT-based approach to the detection of DIF was applied to physical function and general distress item sets. DIF was examined with respect to gender, age and race. The method used for DIF detection was the item response theory log-likelihood ratio (IRTLR) approach. DIF magnitude was measured using the differences in the expected item scores, expressed as the unsigned probability differences, and calculated using the non-compensatory DIF index (NCDIF). Finally, impact was assessed using expected scale scores, expressed as group differences in the total test (measure) response functions.

Methods: The illustrative data came from a study of 1,714 patients with cancer or HIV/AIDS. The measure contained 23 items measuring physical functioning ability and 15 items addressing general distress, scored in the positive direction.

Results: Substantively, the DIF found was of relatively small magnitude. In total, six physical function items showed relatively larger magnitude DIF (expected item score differences greater than the cutoff) across the three comparisons: “trouble with a long walk” (race), “vigorous activities” (race, age), “bending, kneeling, stooping” (age), “lifting or carrying groceries” (race), “limited in hobbies, leisure” (age), “lack of energy” (race). None of the general distress items showed high-magnitude DIF, although “worrying about dying” showed some DIF with respect to both age and race, after adjustment.

Conclusions: The fact that many physical function items showed DIF with respect to age, even after adjustment for multiple comparisons, indicates that the instrument may be performing differently for these groups. While the magnitude and impact of DIF at the item and scale level were minimal, caution should be exercised in the use of subsets of these items, as might occur with selection for clinical decisions or computerized adaptive testing. The selection of anchor items and the criteria for DIF detection, including the integration of significance and magnitude measures, remain issues requiring investigation. Further research is needed regarding the criteria and guidelines appropriate for DIF detection in the context of health-related items.
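
The magnitude measure named in the abstract can be illustrated with a minimal sketch. It assumes Samejima's graded response model for the item and the commonly used definition of NCDIF as the mean squared difference between the two groups' expected item scores, averaged over the focal group's trait values. The function names, item parameters and trait values below are hypothetical and chosen only for illustration; this is not the authors' software or data.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model (assumed here).

    a          -- item discrimination
    thresholds -- increasing list b_1 < ... < b_(K-1)
    Returns K probabilities for response categories 0..K-1.
    """
    # Cumulative probabilities P(X >= k); P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_item_score(theta, a, thresholds):
    """Expected item score: sum over categories of k * P(X = k | theta)."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


def ncdif(focal_thetas, ref_params, focal_params):
    """Mean squared difference in expected item scores over focal-group thetas.

    ref_params and focal_params are (a, thresholds) tuples estimated
    separately for the reference and focal groups (hypothetical inputs).
    """
    diffs = [
        expected_item_score(t, *focal_params) - expected_item_score(t, *ref_params)
        for t in focal_thetas
    ]
    return sum(d * d for d in diffs) / len(diffs)


if __name__ == "__main__":
    # Invented parameters: same discrimination, focal thresholds shifted by 0.3.
    reference = (1.5, [-1.0, 0.0, 1.0])
    focal = (1.5, [-0.7, 0.3, 1.3])
    # Invented focal-group trait values on an evenly spaced grid.
    thetas = [-2.0 + 0.5 * i for i in range(9)]
    print("NCDIF:", ncdif(thetas, reference, focal))
```

Summing the same expected item scores over all items in a scale gives each group's expected scale score at a given trait level, which is the quantity the abstract uses to describe impact.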
Metadata
Title
Evaluating measurement equivalence using the item response theory log-likelihood ratio (IRTLR) method to assess differential item functioning (DIF): applications (with illustrations) to measures of physical functioning ability and general distress
Authors
Jeanne A. Teresi
Katja Ocepek-Welikson
Marjorie Kleinman
Karon F. Cook
Paul K. Crane
Laura E. Gibbons
Leo S. Morales
Maria Orlando-Edelen
David Cella
Publication date
1 August 2007
Publisher
Springer Netherlands
Published in
Quality of Life Research / Special Issue 1/2007
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-007-9186-4
