Abstract
Construct-irrelevant variance (CIV), the erroneous inflation or deflation of test scores due to certain types of uncontrolled or systematic measurement error, and construct underrepresentation (CUR), the under-sampling of the achievement domain, are discussed as threats to the meaningful interpretation of scores from objective tests developed for local medical education use. Several sources of CIV and CUR are discussed and remedies are suggested. Test score inflation or deflation, due to the systematic measurement error introduced by CIV, may result from poorly crafted test questions, insecure test questions and other types of test irregularities, testwiseness, guessing, and test item bias. Using indefensible passing standards can interact with test scores to produce CIV. Sources of content underrepresentation are associated with tests that are too short to support legitimate inferences to the domain and that are composed of trivial questions written at low levels of the cognitive domain. "Teaching to the test" is another frequent contributor to CUR in examinations used in medical education. Most sources of CIV and CUR can be controlled or eliminated from the tests used at all levels of medical education, given proper training and support of the faculty who create these important examinations.
Cite this article
Downing, S.M. Threats to the Validity of Locally Developed Multiple-Choice Tests in Medical Education: Construct-Irrelevant Variance and Construct Underrepresentation. Adv Health Sci Educ Theory Pract 7, 235–241 (2002). https://doi.org/10.1023/A:1021112514626