Cortex

Volume 54, May 2014, Pages 51-62

Research report
Validation of an integrated method for determining cognitive ability: Implications for routine assessments and clinical trials

https://doi.org/10.1016/j.cortex.2014.01.016

Abstract

Introduction

Although accurate diagnosis of mild-intensity deficits is critical, various methods are used to assess, dichotomize and integrate performance, with no validated gold standard. This study described and validated a framework for the analysis of cognitive performance.

Methods

This study was performed using the Groupe de Réflexion sur L'Evaluation des Fonctions EXécutives (GREFEX) database (724 controls and 461 patients), in which participants were examined with 7 tests assessing executive functions. The first phase determined the criteria for the cutoff scores; the second phase, the effect of test number on diagnostic accuracy; and the third phase, the best methods for combining test scores into an overall summary score. Four validation criteria were used: determination of impaired performance as compared to the expected one, a false-positive rate ≤5%, and detection of both single and multiple impairments with optimal sensitivity.

Results

The procedure based on 5th percentile cutoffs determined from standardized residuals was the most appropriate. Although the area under the curve (AUC) increased with the number of scores (p = .0001), the false-positive rate also increased (p = .0001), resulting in suboptimal sensitivity for detecting selective impairment. Two overall summary scores, the average of the seven process scores and the Item Response Theory (IRT) score, had significantly (p = .0001) higher AUCs, even for patients with a selective impairment, and yielded a higher prevalence of dysexecutive disorders (p = .0001).

Conclusions

The present study provides and validates a generative framework for the interpretation of cognitive data. Two overall summary scores met all 4 validation criteria. A practical consequence is the need to profoundly modify the analysis and interpretation of cognitive assessments for both routine use and clinical research.

Introduction

Given the importance of cognition in contemporary societies and its impact on health, accurate diagnosis of cognitive ability is critical. Cognitive ability is typically assessed in subjects from heterogeneous backgrounds, using a battery of cognitive tests covering the language, visuospatial, memory, executive and general cognitive domains. Each test yields between 1 and 15 performance scores, which are dichotomized (normal vs impaired) according to norms ideally corrected for age and education. Next, the dichotomized scores are integrated to form a clinical diagnosis. Despite major progress in this field, a survey of clinical practice in academic memory clinics and rehabilitation centers (Godefroy et al., 2004) and a review of published studies assessing preclinical (Bateman et al., 2012, Hultsch et al., 2000, Knopman et al., 2012, Tractenberg and Pietrzak, 2011) and mild cognitive impairment (Clark et al., 2013, Winblad et al., 2004), dementia (Dubois et al., 2010), stroke (Godefroy et al., 2011, Tatemichi et al., 1994), cardiac surgery (Moller et al., 1998, Murkin et al., 1995), multiple sclerosis (Rao, Leo, Bernardin, & Unverzagt, 1991; Sepulcre et al., 2006) and Parkinson's disease (Cooper, Sagar, Tidswell, & Jordan, 1994; Dalrymple-Alford et al., 2011, Litvan et al., 2012) showed that various methods are used to assess, dichotomize and integrate performance, with no reference to a validated gold standard (Brooks and Iverson, 2010, Crawford et al., 2007, Dalrymple-Alford et al., 2011, Lezak et al., 2004; Mungas, Marshall, Weldon, Haan, & Reed, 1996; Sepulcre et al., 2006). Several carefully designed studies have shown that the use of different criteria for impairment dramatically influences the estimated prevalence of cognitive impairment (Clark et al., 2013, Dalrymple-Alford et al., 2011, Sepulcre et al., 2006). Most importantly, a review of these studies failed to provide a rationale for determining the best criterion in the assessment of cognitive impairment. The absence of a standardized method undermines the reliable determination of cognitive status, which in turn has a major impact on both clinical practice and clinical research. This point is especially important because the objective of cognitive assessment has shifted towards the diagnosis of mild or selective deficits.
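To make the typical pipeline concrete, the following minimal Python sketch dichotomizes each score against a normative cutoff and then integrates the results by counting impaired scores. All test names, scores, norms and the cutoff are hypothetical illustrations, not the GREFEX battery or its norms.

```python
# Minimal sketch of the assessment pipeline: dichotomize each score against
# norm-based cutoffs, then integrate by counting impaired scores.
# All names, values and norms below are hypothetical illustrations.

def dichotomize(score: float, norm_mean: float, norm_sd: float,
                cutoff_z: float = -1.645, higher_is_better: bool = True) -> bool:
    """Return True when the score falls in the impaired tail of the norms."""
    z = (score - norm_mean) / norm_sd
    if not higher_is_better:   # e.g., completion times: larger values are worse
        z = -z
    return z < cutoff_z

# Hypothetical subject and norms: (mean, SD, True if higher scores are better).
subject = {"fluency": 18.0, "tmt_b_time": 140.0, "span": 5.0}
norms = {
    "fluency":    (25.0,  6.0, True),
    "tmt_b_time": (90.0, 25.0, False),
    "span":       ( 6.0,  1.2, True),
}

impaired = {
    name: dichotomize(subject[name], mean, sd, higher_is_better=up)
    for name, (mean, sd, up) in norms.items()
}
print(impaired)                                     # per-score dichotomization
print(sum(impaired.values()), "impaired score(s)")  # integration by counting
```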

A systematic review of previous studies, of diagnostic criteria for cognitive impairment (e.g., Clark et al., 2013, Knopman et al., 2012, Winblad et al., 2004) and of available normative data for clinical batteries shows that methodologies differ in three critical respects: the dichotomization of performance, the integration of several dichotomized scores, and the possible use of a global summary score. The first issue concerns the cutoff criteria used to dichotomize performance. Most cutoffs are based on means and standard deviations (SDs) and use varying cutpoints from 1.5 to 1.98 SD. However, the effect of the deviation from normality of most cognitive scores is rarely addressed. Cutoff scores are sometimes based on percentiles, the 10th and 5th percentiles being the most frequently used. Second, cognitive assessment involves multiple tests and thus provides numerous scores. Procedures differ regarding the combination of tests and scores used as the criterion of cognitive impairment. Some procedures consider that just one impaired test score is sufficient for classifying a subject as “impaired”, whereas others require “impaired” subjects to have at least two (or more) impaired test scores. Other procedures take into account the cognitive domain (each domain being assessed with one or several scores) and classify as impaired those subjects with at least one, two, or more impaired domains. In clinical practice, the interpretation is usually based on counting the number of impaired scores. Importantly, the use of multiple tests improves sensitivity but can also artificially increase the false-positive rate (i.e., lower the specificity) (Brooks and Iverson, 2010), a concern that is especially important because the scores are often inter-correlated (Crawford et al., 2007). This well-known redundancy artifact is addressed in trials at the stage of interpretation of statistical analyses, using corrections for multiple comparisons such as the Bonferroni correction. However, this artifact has rarely been examined in the field of test battery interpretation, which typically involves 20–50 performance scores (Brooks and Iverson, 2010; Godefroy et al., 2010); a simulation of this inflation is sketched below. Thus, there is currently no rationale for determining the optimal number of tests/scores for diagnostic accuracy (i.e., both sensitivity and specificity). Third, some trials have combined individual test scores into a global summary score. Various types of summary scores have been used: the number of impaired scores, the number of impaired domains, the mean scores for the various cognitive domains (e.g., language, visuospatial, memory, executive functions) and global scores (e.g., the average of all cognitive scores after conversion of raw scores into a common metric, such as a z score). However, the influence of the use of a global summary score on diagnostic accuracy (and particularly its ability to detect a selective impairment) has not previously been analyzed. This review emphasizes the importance of examining the effects of various procedures and criteria on diagnostic accuracy and of providing a rationale for optimizing these procedures and criteria.
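As an illustration of the redundancy artifact, here is a small Monte Carlo sketch (with hypothetical parameters, not estimates from the GREFEX data) of the probability that a healthy subject shows at least one score below a per-score 5th-percentile cutoff, as a function of the number of scores and their inter-correlation.

```python
# Monte Carlo sketch of false-positive inflation in multi-score batteries.
# Each simulated healthy subject has n_scores z scores drawn from a
# multivariate normal with pairwise correlation rho; any score below the
# 5th percentile cutoff (-1.645) counts as "impaired". All parameters are
# illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_subjects, cutoff = 50_000, -1.645

for n_scores in (1, 5, 10, 20, 50):
    for rho in (0.0, 0.4):
        cov = np.full((n_scores, n_scores), rho)
        np.fill_diagonal(cov, 1.0)
        z = rng.multivariate_normal(np.zeros(n_scores), cov, size=n_subjects)
        fp = np.mean((z < cutoff).any(axis=1))   # >=1 "impaired" score
        print(f"{n_scores:2d} scores, rho={rho:.1f}: "
              f"false-positive rate = {fp:.2f}")
```

Under independence the familywise rate is 1 − 0.95^n (about 64% for 20 scores), and positive inter-correlation attenuates but does not remove the inflation, which is why per-score cutoffs alone cannot control specificity.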

The objective of this study, based on the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines (Bossuyt et al., 2003), was to describe the structure of, and to validate, a framework for the analysis and integration of cognitive performance that provides optimal diagnostic accuracy (i.e., both sensitivity and specificity).

Section snippets

Population

This study was performed using the Groupe de Réflexion sur L'Evaluation des Fonctions EXécutives (GREFEX) database, which assessed executive functions in French-speaking participants (Godefroy et al., 2010). Briefly, the study included 724 controls (mean ± SD age: 49.5 ± 19.8; males: 44%; educational level: primary: 22%; secondary: 34%; higher: 44%) and a group of 461 patients (age: 50.4 ± 19.4; males: 54%; educational level: primary: 28%; secondary: 40%; higher: 32%) presenting various …

Phase 1: determination of cutoff scores

All raw cognitive scores in controls deviated from normality (Supplementary Table 1). Most transformed scores were influenced by age, education or both. Using standardized residuals, the 5th percentile was found to provide the most appropriate cutoff point, as it is not influenced by the variation from a normal distribution. This is illustrated by cutoff scores computed from z scores at the 5th percentile level (Supplementary Table 1): most of the scores were above −1.64, which is indeed …
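The following Python sketch mirrors the procedure described here, on simulated data (the covariate effects, noise distribution and sample values are illustrative assumptions, not the GREFEX norms): regress each raw score on age and education in controls, standardize the residuals, and take their empirical 5th percentile as the impairment cutoff.

```python
# Sketch of residual-based 5th-percentile cutoffs on simulated control data.
import numpy as np

rng = np.random.default_rng(1)
n = 724                                # control sample size in the study
age = rng.uniform(20, 85, n)
educ = rng.integers(1, 4, n)           # 1 = primary, 2 = secondary, 3 = higher
# Hypothetical raw score: depends on age/education, with right-skewed noise.
raw = 60 - 0.3 * age + 4 * educ + rng.gamma(2.0, 3.0, n)

X = np.column_stack([np.ones(n), age, educ])     # design matrix
beta, *_ = np.linalg.lstsq(X, raw, rcond=None)   # OLS fit in controls
resid = raw - X @ beta
sd = resid.std(ddof=X.shape[1])
std_resid = resid / sd

# The empirical 5th percentile of the standardized residuals is the cutoff;
# with skewed noise it sits above -1.645, as reported for the real scores.
cutoff = np.percentile(std_resid, 5)
print(f"cutoff = {cutoff:.2f} (vs -1.645 under a normality assumption)")

def is_impaired(score: float, age_i: float, educ_i: int) -> bool:
    """Dichotomize a new subject's score against the residual-based cutoff."""
    z = (score - beta @ np.array([1.0, age_i, educ_i])) / sd
    return z < cutoff
```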

Discussion

This study reports on comprehensive, comparative analyses of methods of key importance in the field of cognition. Our framework provides a generative, integrated methodology for the analysis and integration of cognitive performance and yields important, practical conclusions on the assessment of cognitive ability in both clinical practice and research. Although several studies have previously addressed each of the three above-described phases separately, there has been no assessment or …

References (56)

  • P.M. Bossuyt et al.

Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative

    BMJ

    (2003)
  • G.E.P. Box et al.

    An analysis of transformations

    Journal of the Royal Statistical Society. Series B

    (1964)
  • D.C. Briggs et al.

    An introduction to multidimensional measurement using Rasch models

    Journal of Applied Measurement

    (2003)
  • B.L. Brooks et al.

    Comparing actual to estimated base rates of “abnormal” scores on neuropsychological test batteries: implications for interpretation

    Archives of Clinical Neuropsychology

    (2010)
  • D. Cardebat et al.

    Évocation lexicale formelle et sémantique chez des sujets normaux: Performances et dynamiques de production en fonction du sexe, de l’âge et du niveau d’étude

    Acta Neurologica Belgica

    (1990)
  • L.R. Clark et al.

    Are empirically-derived subtypes of mild cognitive impairment consistent with conventional subtypes?

    Journal of the International Neuropsychological Society

    (2013)
  • J.A. Cooper et al.

    Slowed central processing in simple and go/no-go reaction time tasks in Parkinson's disease

    Brain

    (1994)
  • P.K. Crane et al.

    Composite scores for executive function items: demographic heterogeneity and relationships with quantitative magnetic resonance imaging

    Journal of the International Neuropsychological Society

    (2008)
  • J.R. Crawford et al.

    Estimating the percentage of the population with abnormally low scores (or abnormally large score differences) on standardized neuropsychological test batteries: a generic method with applications

    Neuropsychology

    (2007)
  • J.C. Dalrymple-Alford et al.

    Characterizing mild cognitive impairment in Parkinson's disease

Movement Disorders

    (2011)
  • E.R. DeLong et al.

    Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach

    Biometrics

    (1988)
  • L.E. Gibbons et al.

A composite score for executive functioning, validated in Alzheimer's Disease Neuroimaging Initiative (ADNI) participants with baseline mild cognitive impairment

    Brain Imaging and Behavior

    (2012)
  • O. Godefroy

    Frontal syndrome and disorders of executive functions

Journal of Neurology

    (2003)
  • O. Godefroy et al.

    Dysexecutive syndrome: diagnostic criteria and validation study

    Annals of Neurology

    (2010)
  • O. Godefroy et al.

    Vascular aphasias: main characteristics of patients hospitalized in acute stroke units

    Stroke

    (2002)
  • O. Godefroy et al.

Is the Montreal Cognitive Assessment superior to the Mini-Mental State Examination to detect post-stroke cognitive impairment? A study with neuropsychological evaluation

    Stroke

    (2011)
  • O. Godefroy et al.

    Non-spatial attention disorders in patients with frontal or posterior brain damage

    Brain

    (1996)
  • O. Godefroy et al.

    Neuropsychological changes related to unilateral lenticulostriate infarcts

    Journal of Neurology, Neurosurgery, and Psychiatry

    (1994)
1 Groupe de Réflexion sur l'Evaluation des Fonctions Exécutives (GREFEX) study group: the following centers and investigators participated in the GREFEX cooperative study (n = number of patients included at each center; investigators): Amiens University Hospital (F) (n = 183; O. Godefroy and M. Roussel), Angers University Hospital (F) (n = 19; D. Le Gall), Heliomarin Rehabilitation Center Berck (F) (n = 15; C. Bertola), Bordeaux University Hospital (F) (n = 28; J.M. Giroire and P.A. Joseph), Saint Luc University Hospital Brussels (B) (n = 6; X. Seron, F. Coyette), Cholet General Hospital (F) (n = 8; E. Bretault and I. Bernard), Ottignies William Lennox Center (B) (n = 3; M. Leclercq), Garches University Hospital (F) (n = 9; P. Azouvi and C. Vallat-Azouvi), Grenoble University Hospital (F) (n = 24; P. Pollack, C. Ardouin and C. Mosca), Lausanne University Hospital (CH) (n = 9; C. Bindschadler), Lay St Christophe Rehabilitation Center (F) (n = 3; M. Krier), Liège Department of Cognitive Sciences (B) (n = 19; T. Meulemans and V. Marquet), Lille Stroke Center University Hospital (F) (n = 26; D. Leys and M. Roussel), Nantes University Hospital (F) (n = 8; P. Renou and M. Vercelletto), Nice University Hospital (F) (n = 6; E. Michel and P. Robert), Nîmes University Hospital (F) (n = 15; P. Labauge and C. Franconie), Paris-La Salpêtrière University Hospital Neurology Department (F) (n = 18; B. Pillon and B. Dubois), Paris-La Salpêtrière University Hospital Geriatrics Department (F) (n = 13; B. Dieudonnée and M. Verny), Paris-Broca University Hospital (F) (n = 5; H. Lenoir and J. De Rotrou), Rouen University Hospital (F) (n = 56; D. Hannequin and S. Bioux), Sion Rehabilitation Clinic (CH) (n = 12; J. Fuchs, A. Bellmann and P. Vuadens).
