Abstract
Context
In-training assessment (ITA), defined as multiple assessments of performance in the setting of day-to-day practice, is an invaluable tool in assessment programmes which aim to assess professional competence in a comprehensive and valid way. Research on clinical performance ratings, however, consistently shows weaknesses concerning accuracy, reliability and validity. Attempts to improve the psychometric characteristics of ITA by focusing on standardisation and objectivity of measurement have thus far resulted in limited improvement of ITA practices.
Purpose
The aim of the paper is to demonstrate that the psychometric framework may limit more meaningful educational approaches to performance assessment, because it does not take into account key issues in the mechanics of the assessment process. Based on insights from other disciplines, we propose an approach to ITA that takes a constructivist, social-psychological perspective and integrates elements of theories of cognition, motivation and decision making. A central assumption in the proposed framework is that performance assessment is a judgment and decision-making process, in which rating outcomes are influenced by interactions between individuals and the social context in which assessment occurs.
Discussion
The issues raised in the article and the proposed assessment framework bring forward a number of implications for current performance assessment practice. It is argued that focusing on the context of performance assessment may be more effective in improving ITA practices than focusing strictly on raters and rating instruments. Furthermore, the constructivist approach towards assessment has important implications for assessment procedures as well as the evaluation of assessment quality. Finally, it is argued that further research into performance assessment should contribute towards a better understanding of the factors that influence rating outcomes, such as rater motivation, assessment procedures and other contextual variables.
Acknowledgements
The authors would like to thank Mereke Gorsira for critically reading and correcting the English manuscript.
About this article
Cite this article
Govaerts, M.J.B., van der Vleuten, C.P.M., Schuwirth, L.W.T. et al. Broadening Perspectives on Clinical Performance Assessment: Rethinking the Nature of In-training Assessment. Adv Health Sci Educ Theory Pract 12, 239–260 (2007). https://doi.org/10.1007/s10459-006-9043-1
DOI: https://doi.org/10.1007/s10459-006-9043-1