In a script concordance test (SCT), examinees are asked to judge the effect of a new piece of clinical information on a proposed hypothesis. Answers are collected using a Likert-type scale (ranging from −2 to +2, with ‘0’ indicating no effect), and compared with those of a reference panel of ‘experts’. It has been argued, however, that the SCT may be susceptible to gaming and guesswork. This study aims to address the mounting concern over the response process validity of SCT scores.
Using published datasets from three independent SCTs, we investigated examinee response patterns and computed the score a hypothetical examinee would obtain on each test by (1) guessing answers at random and (2) deliberately answering ‘0’ on all test items.
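The two simulated strategies can be illustrated with a minimal sketch. It assumes the commonly used aggregate scoring rule, in which the credit for an answer is the number of panelists who chose it divided by the number who chose the modal answer; the scoring rule actually applied to each dataset is not stated here. The function names and the toy panel data are hypothetical and serve only as an illustration.

```python
from collections import Counter
from fractions import Fraction

SCALE = (-2, -1, 0, 1, 2)  # Likert-type response options

def item_credits(panel_answers):
    """Credit per option under the ASSUMED aggregate scoring rule:
    votes for the option divided by votes for the modal option."""
    counts = Counter(panel_answers)
    modal_votes = max(counts.values())
    return {option: Fraction(counts.get(option, 0), modal_votes) for option in SCALE}

def expected_random_score(panel):
    """Expected total score of an examinee who picks each of the
    five options with equal probability on every item."""
    return sum(sum(item_credits(answers).values()) / len(SCALE) for answers in panel)

def all_zero_score(panel):
    """Total score of an examinee who answers '0' on every item."""
    return sum(item_credits(answers)[0] for answers in panel)

# Hypothetical panel: one list of five panelist answers per item.
toy_panel = [
    [0, 0, 0, 1, -1],
    [1, 1, 2, 0, 1],
    [-2, -1, -1, -1, 0],
    [0, 0, 1, 0, 0],
]

random_score = expected_random_score(toy_panel)
zero_score = all_zero_score(toy_panel)
print(float(random_score), float(zero_score))
```

On this toy panel the ‘all-0’ total exceeds the expected random-guessing total, because ‘0’ earns full credit on the two items where it is the modal panel answer and partial credit elsewhere.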
A simulated random guessing strategy led to scores more than 2 SDs below the mean scores of actual respondents (Z-scores −3.6 to −2.1). A simulated ‘all-0’ strategy led to scores at least 1 SD above those obtained by random guessing (Z-scores −2.2 to −0.7). In one dataset, stepwise exclusion of items with a modal panel response of ‘0’, until such items made up fewer than 10% of the total number of test items, yielded hypothetical scores 2 SDs below the mean scores of actual respondents.
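The Z-scores and the stepwise exclusion reported above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the sample standard deviation is assumed, the order in which modal-‘0’ items are dropped is arbitrary (last first), and the function names and data are hypothetical.

```python
from statistics import mean, stdev

def z_score(simulated_score, actual_scores):
    """Position of a simulated score relative to actual respondents,
    in SD units (sample SD is an assumption)."""
    return (simulated_score - mean(actual_scores)) / stdev(actual_scores)

def exclude_modal_zero_items(modal_responses, max_fraction=0.10):
    """Drop items whose modal panel response is 0, one at a time,
    until they make up fewer than max_fraction of the remaining items.
    Returns the indices of the items that are kept."""
    kept = list(range(len(modal_responses)))
    zero_items = [i for i in kept if modal_responses[i] == 0]
    while zero_items and len(zero_items) / len(kept) >= max_fraction:
        kept.remove(zero_items.pop())  # assumed order: last modal-0 item first
    return kept

# Hypothetical data: actual respondent scores and modal panel responses.
actual_scores = [60, 70, 80]
z = z_score(50, actual_scores)

modal_responses = [0, 1, -1, 0, 2, 1, -2, 1, -1, 0,
                   1, 2, -1, 1, -2, 2, 1, -1, 1, 2, -1]
kept_items = exclude_modal_zero_items(modal_responses)
print(z, len(kept_items))
```

In this toy example two of the three modal-‘0’ items are removed, leaving one such item among 19, which is below the 10% threshold.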
Random guessing was not an advantageous response strategy. An ‘all-0’ response strategy, however, demonstrated evidence of artificial score inflation. Our findings pose a significant threat to the SCT’s validity argument. ‘Testwiseness’ is a potential hazard to all testing formats, and appropriate countermeasures must be established. We propose an approach that might be used to mitigate a potentially real and troubling phenomenon in script concordance testing. The impact of this approach on the content validity of SCTs merits further discussion.