Abstract
A real-data simulation of computerized adaptive testing (CAT) is an important step in real-life CAT applications. Such a simulation allows CAT developers to evaluate important features of the CAT system, such as item selection and stopping rules, before live testing. SIMPOLYCAT, an SAS macro program, was created by the authors to conduct real-data CAT simulations based on polytomous item response theory (IRT) models. In SIMPOLYCAT, item responses can be input from an external file or generated internally on the basis of item parameters provided by users. The program allows users to choose among methods of setting initial θ, approaches to item selection, trait estimators, CAT stopping criteria, polytomous IRT models, and other CAT parameters. In addition, CAT simulation results can be saved easily and used for further study. The purpose of this article is to introduce SIMPOLYCAT, briefly describe the program algorithm and parameters, and provide examples of CAT simulations, using generated and real data. Visual comparisons of the results obtained from the CAT simulations are presented.
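SIMPOLYCAT itself is an SAS macro program, but the core loop it simulates (select the most informative unadministered item at the current trait estimate, record a response, update the trait estimate, and stop when a precision criterion is met) can be sketched in any language. The following is a minimal illustration in Python, not SIMPOLYCAT's actual code: it assumes a generalized partial credit model, expected a posteriori (EAP) estimation on a quadrature grid, maximum-information item selection, and a standard-error stopping rule; the item bank and all parameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(7)

def gpcm_probs(theta, a, b):
    """Category probabilities under the generalized partial credit model.
    b holds the step difficulties; category 0 gets a zero exponent."""
    z = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
    z -= z.max()                      # guard against overflow in exp
    p = np.exp(z)
    return p / p.sum()

def item_info(theta, a, b):
    """GPCM item information: a^2 times the variance of the category score."""
    p = gpcm_probs(theta, a, b)
    k = np.arange(len(p))
    return a**2 * (np.sum(k**2 * p) - np.sum(k * p)**2)

# Hypothetical 30-item bank, five response categories per item
n_items, n_steps = 30, 4
bank_a = rng.uniform(0.8, 2.0, n_items)
bank_b = np.sort(rng.normal(0.0, 1.0, (n_items, n_steps)), axis=1)

grid = np.linspace(-4, 4, 81)                 # quadrature points
posterior = np.exp(-0.5 * grid**2)            # standard normal prior
posterior /= posterior.sum()

true_theta, used = 1.0, []
for _ in range(n_items):
    eap = np.sum(grid * posterior)
    # maximum-information selection at the current EAP estimate
    infos = [item_info(eap, bank_a[i], bank_b[i]) if i not in used else -np.inf
             for i in range(n_items)]
    i = int(np.argmax(infos))
    used.append(i)
    # simulate the examinee's response from the (known) generating theta
    resp = rng.choice(n_steps + 1, p=gpcm_probs(true_theta, bank_a[i], bank_b[i]))
    # Bayesian update: multiply the posterior by the category likelihood
    like = np.array([gpcm_probs(t, bank_a[i], bank_b[i])[resp] for t in grid])
    posterior *= like
    posterior /= posterior.sum()
    se = np.sqrt(np.sum(grid**2 * posterior) - np.sum(grid * posterior)**2)
    if se < 0.30:                             # stopping rule: posterior SE
        break

eap = np.sum(grid * posterior)
print(f"administered {len(used)} items, EAP = {eap:.2f}, SE = {se:.2f}")
```

In a real-data simulation the `resp` line would instead look up the examinee's recorded answer to item `i`, which is how a CAT developer can preview test length and score precision from an existing fixed-form administration.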
Additional information
The Patient-Reported Outcomes Measurement Information System (PROMIS) is a National Institutes of Health (NIH) Roadmap initiative to develop a computerized system measuring patient-reported outcomes in respondents with a wide range of chronic diseases and demographic characteristics. PROMIS was funded by cooperative agreements to a statistical coordinating center (Evanston Northwestern Healthcare; PI, David Cella; U01AR52177) and six primary research sites: Duke University (PI, Kevin Weinfurt; U01AR52186), University of North Carolina (PI, Darren DeWalt; U01AR52181), University of Pittsburgh (PI, Paul A. Pilkonis; U01AR52155), Stanford University (PI, James Fries; U01AR52158), Stony Brook University (PI, Arthur Stone; U01AR52170), and University of Washington (PI, Dagmar Amtmann; U01AR52171). NIH Science Officers on this project are Deborah Ader, Susan Czajkowski, Lawrence Fine, Louis Quatrano, Bryce Reeve, William Riley, and Susana Serrate-Sztein. Development of SIMPOLYCAT was supported in part by a grant from the NIH (U01 AR 052177-01; PI, David Cella). The manuscript was reviewed by the PROMIS Publications Subcommittee prior to external peer review. See the Web site at www.nihpromis.org for additional information on the PROMIS cooperative group.
Cite this article
Chen, S.-K., & Cook, K. F. (2009). SIMPOLYCAT: An SAS program for conducting CAT simulation based on polytomous IRT models. Behavior Research Methods, 41, 499–506. https://doi.org/10.3758/BRM.41.2.499