Strength in Numbers or Quality over Quantity? Examining the Importance of Criterion Measure Selection to Define Validity Groups in Performance Validity Test (PVT) Research

Psychological Injury and Law

Abstract

Mirroring clinical guidelines, recent Performance Validity Test (PVT) research emphasizes using ≥ 2 criterion PVTs to optimally identify validity groups when validating/cross-validating PVTs; however, even with multiple measures, the effect of which specific PVTs are used as criterion measures remains incompletely explored. This study investigated the accuracy of varying two-PVT combinations for establishing validity status and how adding a third PVT or applying more liberal failure cut-scores affects overall false-positive (FP) and false-negative (FN) rates. Clinically referred veterans (N = 114; 30% clinically identified as invalid) who completed a six-PVT protocol during their evaluation were included. Concordance rates were calculated across all possible two- and three-PVT combinations at conservative and liberal cutoffs. Two-PVT combinations classified 72–91% of valid (0–4% FPs) and 17–74% of invalid (0–40% FNs) cases, and three-PVT combinations classified 67–86% of valid (0–6% FPs) and 57–97% of invalid (0–24% FNs) cases at conservative cutoffs. Liberal cutoffs classified 53–86% of valid (0–15% FPs) and 39–82% of invalid (0–30% FNs) cases for two-PVT combinations and 46–75% of valid (3–27% FPs) and 60–97% of invalid (0–17% FNs) cases for three-PVT combinations. Irrespective of whether a two- or three-PVT combination or conservative/liberal cutoffs were used, many valid and invalid cases failed only one PVT (3–68%). Two-PVT combinations produced high FN rates and were less accurate than three-PVT combinations for detecting invalid cases, although accuracy varied within both types of combinations depending on the specific PVTs included. Thus, both PVT quantity and quality are important for accurate validity classification in research studies to ensure reliability and replicability of findings. Applying more liberal cutoffs increased sensitivity, but generally at the cost of higher FP rates and problematic specificity, particularly for three-PVT combinations.
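
A minimal sketch of the combination-wise tallying the abstract describes, under stated assumptions: the pass/fail data are toy values (not the study's data), and the classification rule (0 failures = valid, ≥ 2 failures = invalid, exactly 1 failure = equivocal) is an illustrative reading of the design rather than the authors' code.

```python
from itertools import combinations

# Toy cases: clinically determined validity status plus pass (0) / fail (1)
# flags for six criterion PVTs at a given cutoff (hypothetical, not study data).
cases = [
    {"invalid": False, "fails": [0, 0, 0, 0, 0, 1]},
    {"invalid": False, "fails": [0, 0, 0, 0, 0, 0]},
    {"invalid": True,  "fails": [1, 1, 0, 1, 0, 1]},
    {"invalid": True,  "fails": [1, 0, 0, 1, 0, 0]},
]

def rates(pvt_idx, cases):
    """FP/FN/equivocal rates for one PVT combination, assuming the rule:
    0 failures -> valid, >= 2 failures -> invalid, exactly 1 -> equivocal."""
    fp = fn = equivocal = 0
    n_valid = sum(not c["invalid"] for c in cases)
    n_invalid = len(cases) - n_valid
    for c in cases:
        n_fail = sum(c["fails"][i] for i in pvt_idx)
        if n_fail == 1:
            equivocal += 1                       # single isolated failure
        elif n_fail >= 2 and not c["invalid"]:
            fp += 1                              # classified invalid, clinically valid
        elif n_fail == 0 and c["invalid"]:
            fn += 1                              # classified valid, clinically invalid
    return fp / n_valid, fn / n_invalid, equivocal / len(cases)

# Enumerate two- and three-PVT combinations drawn from the six-PVT protocol.
for k in (2, 3):
    for combo in combinations(range(6), k):
        fp_r, fn_r, eq_r = rates(combo, cases)
        print(combo, f"FP={fp_r:.0%}  FN={fn_r:.0%}  equivocal={eq_r:.0%}")
```

Under this rule, cases with exactly one failure fall into the equivocal band the abstract highlights (3–68% of cases, depending on the combination and cutoff).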


Notes

  1. One reviewer remarked that a single PVT failure using liberal cutoffs is not equivocal, but rather should be considered valid. While this position is tenable in a context in which multiple PVTs are administered, we maintain that when only two PVTs are given, one failure is, by definition, equivocal in that other extra-test data would ultimately need to be considered to establish validity status. Our data demonstrated that, even when liberal cutoffs were applied to two-PVT combinations, mean rates of failing 1/2 PVTs were 35% for invalid cases and 28% for valid cases (Tables 4 and 5). Thus, nearly 30% of invalid cases would be misclassified if cases with one failure were automatically classified as valid. Additionally, in response to the significant increase in false positives for the three-PVT combinations using liberal cutoffs (i.e., > 10% on 7/19 combinations), the reviewer suggested that the solution for improving specificity (i.e., ≥ 90%; Boone, 2012) was to raise the invalidity threshold to 3/3 PVT failures. However, this approach is inconsistent with current practice standards, in which ≥ 2 failures is the generally accepted benchmark for identifying probable invalidity (Larrabee, 2014; Meyers & Volbrecht, 2003), and it would result in an unacceptable decrease in overall mean sensitivity from 79% to 35% for identifying invalid cases. By contrast, using conservative cutoffs while retaining the well-established ≥ 2 failures benchmark yielded 72% mean sensitivity, 97% mean specificity, and 0/19 combinations with a false-positive rate above 6% (see Tables 6 and 7).
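
A small numerical sketch of the sensitivity/specificity tradeoff described in this note. The failure counts are hypothetical stand-ins (not the study's data), chosen only to show how raising the invalidity threshold from ≥ 2/3 to 3/3 PVT failures sharply lowers sensitivity while raising specificity:

```python
def sens_spec(fails_invalid, fails_valid, threshold):
    """Sensitivity/specificity when >= `threshold` failures (of 3 PVTs) => invalid."""
    sens = sum(n >= threshold for n in fails_invalid) / len(fails_invalid)
    spec = sum(n < threshold for n in fails_valid) / len(fails_valid)
    return sens, spec

# Hypothetical numbers of PVT failures (0-3) within a three-PVT combination.
fails_invalid = [3, 2, 2, 1, 3, 2, 0, 2, 3, 2]   # clinically invalid cases
fails_valid   = [0, 0, 1, 0, 2, 0, 1, 0, 0, 0]   # clinically valid cases

for t in (2, 3):
    s, p = sens_spec(fails_invalid, fails_valid, t)
    print(f">= {t}/3 failures: sensitivity = {s:.0%}, specificity = {p:.0%}")
```

With these toy counts, the ≥ 2/3 rule yields 80% sensitivity and 90% specificity, while the 3/3 rule drops sensitivity to 30%, mirroring the 79% to 35% decline discussed above.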

References

  • Alverson, W. A., O’Rourke, J. J. F., & Soble, J. R. (2019). The Word Memory Test genuine memory impairment profile discriminates genuine memory impairment from invalid performance in a mixed clinical sample with cognitive impairment. The Clinical Neuropsychologist.

  • American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.; DSM-5). Washington, DC: American Psychiatric Publishing.

  • An, K. Y., Kaploun, K., Erdodi, L. A., & Abeare, C. A. (2017). Performance validity in undergraduate research participants: a comparison of failure rates across tests and cutoffs. The Clinical Neuropsychologist, 31, 193–206.

  • Armistead-Jehle, P., Cooper, D. B., & Vanderploeg, R. D. (2016). The role of performance validity tests in the assessment of cognitive functioning after military concussion: a replication and extension. Applied Neuropsychology: Adult, 23(4), 264–273.

  • Armistead-Jehle, P., Soble, J. R., Cooper, D. B., & Belanger, H. G. (2017). Unique aspects of TBI in military populations [Special issue]. Physical Medicine & Rehabilitation Clinics of North America, 28, 323–337.

  • Bailey, K. C., Soble, J. R., Bain, K. M., & Fullen, C. (2018a). Embedded performance validity tests in the Hopkins Verbal Learning Test–Revised and the Brief Visuospatial Memory Test–Revised: a replication study. Archives of Clinical Neuropsychology, 33, 895–900.

  • Bailey, K. C., Soble, J. R., & O’Rourke, J. J. F. (2018b). Clinical utility of the Rey 15-Item Test, recognition trial, and error scores for detecting noncredible neuropsychological performance validity in a mixed-clinical sample of veterans. The Clinical Neuropsychologist, 32, 119–131.

  • Bain, K. M., & Soble, J. R. (2019). Validation of the Advanced Clinical Solutions Word Choice Test (WCT) in a mixed clinical sample: establishing classification accuracy, sensitivity/specificity, and cutoff scores. Assessment, 26, 1320–1328.

  • Bain, K. M., Soble, J. R., Webber, T. A., Messerly, J. M., Bailey, K. C., Kirton, J. W., & McCoy, K. J. M. (2019). Cross-validation of three Advanced Clinical Solutions performance validity tests: examining combinations of measures to maximize classification of invalid performance. Applied Neuropsychology: Adult.

  • Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23, 729–741.

  • Boone, K. B. (2012). Clinical practice of forensic neuropsychology. New York: Guilford Press.

  • Boone, K., Lu, P., & Herzberg, D. S. (2002). The Dot Counting Test manual. Los Angeles: Western Psychological Services.

  • Critchfield, E. A., Soble, J. R., Marceaux, J. C., Bain, K. M., Bailey, K. C., Webber, T. A., et al. (2019). Cognitive impairment does not cause performance validity failure: analyzing performance patterns among unimpaired, impaired, and noncredible participants across six tests. The Clinical Neuropsychologist, 33(6), 1083–1101.

  • Denning, J. H. (2012). The efficiency and accuracy of the Test of Memory Malingering Trial 1, errors on the first 10 items of the Test of Memory Malingering, and five embedded measures in predicting invalid test performance. Archives of Clinical Neuropsychology, 27(4), 417–432.

  • Erdodi, L. A. (2019). Aggregating validity indicators: the salience of domain specificity and the indeterminate range in multivariate models of performance validity assessment. Applied Neuropsychology: Adult, 26(2), 155–172.

  • Erdodi, L. A., Kirsch, N. L., Lajiness-O’Neill, R., Vingilis, E., & Medoff, B. (2014). Comparing the Recognition Memory Test and the Word Choice Test in a mixed clinical sample: are they equivalent? Psychological Injury and Law, 7, 255–263.

  • Fazio, R. L., Faris, A. N., & Yamout, K. Z. (2019). Use of the Rey 15-Item Test as a performance validity test in an elderly population. Applied Neuropsychology: Adult, 26, 28–35.

  • Gasquoine, P. G., Weimer, A. A., & Amador, A. (2017). Specificity rates for non-clinical, bilingual, Mexican Americans on three popular performance validity measures. The Clinical Neuropsychologist, 31(3), 587–597. https://doi.org/10.1080/13854046.2016.1277786

  • Green, P. (2003). Green’s Word Memory Test for Windows: user’s manual. Edmonton: Green’s Publishing.

  • Green, P., Montijo, J., & Brockhaus, R. (2011). High specificity of the Word Memory Test and Medical Symptom Validity Test in groups with severe verbal memory impairment. Applied Neuropsychology, 18(2), 86–94.

  • Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6(3), 218–224.

  • Grills, C. E., & Armistead-Jehle, P. (2016). Performance validity test and Neuropsychological Assessment Battery Screening Module performances in an active-duty sample with a history of concussion. Applied Neuropsychology: Adult, 23(4), 295–301.

  • Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: relationship to likelihood ratios. The Clinical Neuropsychologist, 22(4), 666–679.

  • Larrabee, G. J. (2014). False-positive rates associated with the use of multiple performance and symptom validity tests. Archives of Clinical Neuropsychology, 29(4), 364–373.

  • Larrabee, G. J., Rohling, M. L., & Meyers, J. E. (2019). Use of multiple performance and symptom validity measures: determining the optimal per test cutoff for determination of invalidity, analysis of skew, and inter-test correlations in valid and invalid performance groups. The Clinical Neuropsychologist, 1–19.

  • Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological assessment (5th ed.). Oxford: Oxford University Press.

  • Lippa, S. M. (2018). Performance validity testing in neuropsychology: a clinical guide, critical review, and update on a rapidly evolving literature. The Clinical Neuropsychologist, 32(3), 391–421.

  • Loring, D. W., Goldstein, F. C., Chen, C., Drane, D. L., Lah, J. J., Zhao, L., & Larrabee, G. J. (2016). False-positive error rates for Reliable Digit Span and Auditory Verbal Learning Test performance validity measures in amnestic mild cognitive impairment and early Alzheimer disease. Archives of Clinical Neuropsychology, 31(4), 313–331.

  • Martin, P. K., Schroeder, R. W., & Odland, A. P. (2015). Neuropsychologists’ validity testing beliefs and practices: a survey of North American professionals. The Clinical Neuropsychologist, 29(6), 741–776.

  • Martin, P. K., Schroeder, R. W., Olsen, D. H., Maloy, H., Boettcher, A., Ernst, N., & Okut, H. (2019). A systematic review and meta-analysis of the Test of Memory Malingering in adults: two decades of deception detection. The Clinical Neuropsychologist. https://doi.org/10.1080/13854046.2019.1637027

  • Meyers, J. E., Miller, R. M., Thompson, L. M., Scalese, A. M., Allred, B. C., Rupp, Z. W., Dupaix, Z. P., & Junghyun Lee, A. (2014). Using likelihood ratios to detect invalid performance with performance validity measures. Archives of Clinical Neuropsychology, 29(3), 224–235.

  • Meyers, J. E., & Volbrecht, M. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.

  • Novitski, J., Steele, S., Karantzoulis, S., & Randolph, C. (2012). The Repeatable Battery for the Assessment of Neuropsychological Status effort scale. Archives of Clinical Neuropsychology, 27, 190–195.

  • Pearson. (2009). Advanced clinical solutions for WAIS-IV and WMS-IV: clinical and interpretive manual. San Antonio: Pearson.

  • Poreh, A., Bezdicek, O., Korobkova, I., Levin, J. B., & Dines, P. (2016). The Rey Auditory Verbal Learning Test forced-choice recognition task: base-rate data and norms. Applied Neuropsychology: Adult, 23, 155–161.

  • Poynter, K., Boone, K. B., Ermshar, A., Miora, D., Cottingham, M., Victor, T. L., Ziegler, E., Zeller, M. A., & Wright, M. (2019). Wait, there’s a baby in this bath water! Update on quantitative and qualitative cut-offs for Rey 15-Item Recall and Recognition. Archives of Clinical Neuropsychology, 34, 1367–1380.

  • Rai, J. K., & Erdodi, L. A. (2019). Impact of criterion measures on the classification accuracy of TOMM-1. Applied Neuropsychology: Adult.

  • Rey, A. (1964). L’examen clinique en psychologie [The clinical examination in psychology]. Paris: Presses Universitaires de France.

  • Schroeder, R. W., Martin, P. K., Heinrichs, R. J., & Baade, L. E. (2019). Research methods in performance validity testing studies: criterion grouping approach impacts study outcomes. The Clinical Neuropsychologist, 33, 466–477.

  • Schroeder, R. W., Twumasi-Ankrah, P., Baade, L. E., & Marshall, P. S. (2012). Reliable Digit Span: a systematic review and cross-validation study. Assessment, 19(1), 21–30.

  • Schwartz, E. S., Erdodi, L., Rodriguez, N., Ghosh, J. J., Curtain, J. R., Flashman, L. A., & Roth, R. M. (2016). CVLT-II forced choice recognition trial as an embedded validity indicator: a systematic review of the evidence. Journal of the International Neuropsychological Society, 22, 851–858.

  • Silverberg, N. D., Wertheimer, J. C., & Fichtenberg, N. L. (2007). An effort index for the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS). The Clinical Neuropsychologist, 21(5), 841–854.

  • Slick, D. J., Sherman, E. M., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13(4), 545–561.

  • Slick, D. J., Tan, J. E., Strauss, E. H., & Hultsch, D. F. (2004). Detecting malingering: a survey of experts’ practices. Archives of Clinical Neuropsychology, 19(4), 465–473.

  • Tombaugh, T. N. (1996). Test of Memory Malingering (TOMM). North Tonawanda: Multi-Health Systems.

  • Webber, T. A., Bailey, K. C., Alverson, W. A., Critchfield, E. A., Bain, K. M., Messerly, J. M., et al. (2018a). Further validation of the Test of Memory Malingering (TOMM) Trial 1: examination of false positives and convergence with other validity measures. Psychological Injury and Law, 11, 325–335.

  • Webber, T. A., Critchfield, E. A., & Soble, J. R. (2018b). Convergent, discriminant, and concurrent validity of non-memory-based performance validity tests. Assessment.

  • Webber, T. A., Marceaux, J. C., Critchfield, E. A., & Soble, J. R. (2018c). Relative impacts of mild and major neurocognitive disorder on rate of verbal learning acquisition. Archives of Clinical Neuropsychology.

  • Webber, T. A., & Soble, J. R. (2018). Utility of various WAIS-IV Digit Span indices for identifying noncredible performance validity among cognitively impaired and unimpaired examinees. The Clinical Neuropsychologist, 32(4), 657–670.

  • Wechsler, D. (2008). WAIS-IV: administration and scoring manual. San Antonio: Pearson.

  • Weinstein, S., Obuchowski, N. A., & Lieber, M. L. (2005). Clinical evaluation of diagnostic tests. American Journal of Roentgenology, 184(1), 141–149.

Author information

Corresponding author

Correspondence to Jason R. Soble.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Disclaimer

The views expressed herein are those of the authors and do not necessarily reflect the views or the official policy of the Department of Veterans Affairs or US Government.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Soble, J.R., Alverson, W.A., Phillips, J.I. et al. Strength in Numbers or Quality over Quantity? Examining the Importance of Criterion Measure Selection to Define Validity Groups in Performance Validity Test (PVT) Research. Psychol. Inj. and Law 13, 44–56 (2020). https://doi.org/10.1007/s12207-019-09370-w
