Research report
Validation of an integrated method for determining cognitive ability: Implications for routine assessments and clinical trials
Introduction
Given the importance of cognition in contemporary societies and its impact on health, accurate diagnosis of cognitive ability is critical. Cognitive ability is typically assessed in subjects from heterogeneous backgrounds, using a battery of cognitive tests covering the language, visuospatial, memory, executive and general cognitive domains. Each test yields between 1 and 15 performance scores, which are dichotomized (normal vs. impaired) according to norms that are ideally corrected for age and education. Next, the dichotomized scores are integrated to form a clinical diagnosis. Despite major progress in this field, a survey of clinical practice in academic memory clinics and rehabilitation centers (Godefroy et al., 2004) and a review of published studies assessing preclinical (Bateman et al., 2012; Hultsch et al., 2000; Knopman et al., 2012; Tractenberg & Pietrzak, 2011) and mild cognitive impairment (Clark et al., 2013; Winblad et al., 2004), dementia (Dubois et al., 2010), stroke (Godefroy et al., 2011; Tatemichi et al., 1994), cardiac surgery (Moller et al., 1998; Murkin et al., 1995), multiple sclerosis (Rao, Leo, Bernardin, & Unverzagt, 1991; Sepulcre et al., 2006) and Parkinson's disease (Cooper, Sagar, Tidswell, & Jordan, 1994; Dalrymple-Alford et al., 2011; Litvan et al., 2012) showed that various methods are used to assess, dichotomize and integrate performance, with no reference to a validated gold standard (Brooks & Iverson, 2010; Crawford et al., 2007; Dalrymple-Alford et al., 2011; Lezak et al., 2004; Mungas, Marshall, Weldon, Haan, & Reed, 1996; Sepulcre et al., 2006). Several carefully designed studies have shown that the use of different criteria for impairment dramatically influences the estimated prevalence of cognitive impairment (Clark et al., 2013; Dalrymple-Alford et al., 2011; Sepulcre et al., 2006).
Most importantly, a review of these studies failed to provide a rationale for determining the best criterion for the assessment of cognitive impairment. The absence of a standardized method undermines the reliable determination of cognitive status, which in turn has a major impact on both clinical practice and clinical research. This point is especially important because the objective of cognitive assessment has shifted towards the diagnosis of mild or selective deficits.
A systematic review of previous studies, of diagnostic criteria for cognitive impairment (e.g., Clark et al., 2013; Knopman et al., 2012; Winblad et al., 2004) and of the available normative data for clinical batteries shows that methodologies differ in three critical respects: the dichotomization of performance, the integration of several dichotomized scores, and the possible use of a global summary score. The first issue concerns the cutoff criteria used to dichotomize performance. Most cutoffs are based on means and standard deviations (SDs) and use cutpoints varying from 1.5 to 1.98 SD. However, the effect of the deviation from normality of most cognitive scores is rarely addressed. Cutoff scores are sometimes based on percentiles, with the 10th and 5th percentiles being the most frequently used. Second, cognitive assessment involves multiple tests and thus provides numerous scores. Procedures differ regarding the combination of tests and scores used as the criterion of cognitive impairment. Some procedures consider that a single impaired test score is sufficient for classifying a subject as "impaired", whereas others require "impaired" subjects to have at least two (or more) impaired test scores. Other procedures take into account the cognitive domain (each domain being assessed with one to several scores) and classify as impaired those subjects with at least one, two or more impaired domains. In clinical practice, the interpretation is usually based on counting the number of impaired scores. Importantly, the use of multiple tests improves sensitivity but can also artificially increase the false-positive rate (i.e., lower the specificity) (Brooks & Iverson, 2010), a concern that is especially important because the scores are often intercorrelated (Crawford et al., 2007). This well-known redundancy artifact is addressed in trials at the stage of statistical interpretation by correcting for multiple comparisons, for example with the Bonferroni correction.
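As a rough illustration of this redundancy artifact, the simulation below estimates how often a cognitively normal subject shows at least one "impaired" score. It is a sketch under simplified assumptions (equicorrelated standard-normal scores and a fixed 5th-percentile cutoff), not a reproduction of any specific battery:

```python
import numpy as np

rng = np.random.default_rng(0)

def family_false_positive_rate(n_scores, rho, cutoff=-1.645, n_subjects=100_000):
    """Proportion of simulated healthy subjects with at least one score below
    the cutoff, given n_scores equicorrelated standard-normal scores
    (pairwise correlation rho)."""
    cov = np.full((n_scores, n_scores), rho)
    np.fill_diagonal(cov, 1.0)
    scores = rng.multivariate_normal(np.zeros(n_scores), cov, size=n_subjects)
    return float(np.mean((scores < cutoff).any(axis=1)))

for k in (1, 5, 20):
    # With independent scores the rate approaches 1 - 0.95**k;
    # intercorrelation (rho = 0.5) dampens, but does not remove, the inflation.
    print(k, family_false_positive_rate(k, 0.0), family_false_positive_rate(k, 0.5))
```

With 20 independent scores, the chance of at least one false positive is roughly 1 − 0.95²⁰ ≈ 64%, which is why interpreting a large battery score-by-score without correction inflates apparent impairment.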
However, this artifact has rarely been examined in the field of test battery interpretation, which typically involves 20–50 performance scores (Brooks and Iverson, 2010; Godefroy et al., 2010). Thus, there is currently no rationale for determining the optimal number of tests/scores for diagnostic accuracy (i.e., both sensitivity and specificity). Third, some trials have combined individual test scores into a global summary score. Various types of summary scores have been used: the number of impaired scores, the number of impaired domains, the mean scores for the various cognitive domains (e.g., language, visuospatial, memory, executive functions) and global scores (e.g., average of all cognitive scores after conversion of raw scores into a common metric, such as a z score). However, the influence of the use of a global summary score on diagnostic accuracy (and particularly its ability to detect a selective impairment) has not previously been analyzed. This review emphasizes the importance of examining the effects of various procedures and criteria on diagnostic accuracy and providing a rationale for optimization of these procedures and criteria.
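The summary-score options just described can be sketched as follows. The domain layout, the cutoff, and the rule that a domain counts as impaired when at least one of its scores is impaired are illustrative assumptions, not the study's specification:

```python
import numpy as np

# Hypothetical domain layout: indices of each score within the score vector.
DOMAINS = {
    "language": [0, 1],
    "memory": [2, 3, 4],
    "executive": [5, 6],
}

def summary_scores(raw, control_mean, control_sd, cutoff=-1.645):
    """Convert raw scores to z scores against control norms and derive the
    summary indices discussed above: number of impaired scores, number of
    impaired domains (here, >= 1 impaired score in the domain), per-domain
    mean z scores, and a global mean z score."""
    z = (np.asarray(raw, dtype=float) - control_mean) / control_sd
    impaired = z < cutoff
    return {
        "n_impaired_scores": int(impaired.sum()),
        "n_impaired_domains": int(sum(impaired[idx].any() for idx in DOMAINS.values())),
        "domain_means": {d: float(z[idx].mean()) for d, idx in DOMAINS.items()},
        "global_z": float(z.mean()),
    }
```

Note how a global mean z score can mask a selective deficit: one very low score averaged with several normal ones may still yield an unremarkable global value, which is the concern raised above about summary scores and selective impairment.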
The objective of this study, based on the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines (Bossuyt et al., 2003), was to describe and validate a framework for the analysis and integration of cognitive performance that provides optimal diagnostic accuracy (i.e., both sensitivity and specificity).
Population
This study was performed using the Groupe de Réflexion sur L'Evaluation des Fonctions EXécutives (GREFEX) database, which assessed executive functions in French-speaking participants (Godefroy et al., 2010). Briefly, the study included 724 controls (mean ± SD age: 49.5 ± 19.8; males: 44%; educational level: primary: 22%; secondary: 34%; higher: 44%) and a group of 461 patients (age: 50.4 ± 19.4; males: 54%; educational level: primary: 28%; secondary: 40%; higher: 32%) presenting various
Phase 1: determination of cutoff scores
All raw cognitive scores in controls deviated from normality (Supplementary Table 1). Most transformed scores were influenced by age, education or both. Using standardized residuals, the 5th percentile was found to provide the most appropriate cutoff point, as it is not influenced by deviations from a normal distribution. This is illustrated by cutoff scores computed from z scores at the 5th percentile level (Supplementary Table 1): most of the scores were above −1.64, which is indeed
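A minimal sketch of the standardized-residual approach described above, assuming a simple linear model of score on age and education; the study's actual normalization (including the transformations toward normality mentioned in the text) is more elaborate:

```python
import numpy as np

def residual_cutoff(scores, age, edu, pct=5):
    """Fit a linear model of control scores on age and education, standardize
    the residuals, and return (coefficients, residual SD, empirical pct-th
    percentile of the standardized residuals) to use as a cutoff."""
    X = np.column_stack([np.ones(len(age)), age, edu])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    resid = scores - X @ beta
    sd = resid.std(ddof=X.shape[1])
    return beta, sd, float(np.percentile(resid / sd, pct))

def is_impaired(score, age, edu, beta, sd, cutoff):
    """Classify a new subject against the demographically adjusted cutoff."""
    z = (score - np.array([1.0, age, edu]) @ beta) / sd
    return bool(z < cutoff)
```

Because the cutoff is the empirical percentile of the control residuals rather than a fixed −1.645, it remains valid even when the residual distribution is skewed, which is the rationale given above for preferring the 5th percentile over SD-based cutpoints.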
Discussion
This study reports on comprehensive, comparative analyses of methods of key importance in the field of cognition. Our framework provides a generative, integrated methodology for the analysis and integration of cognitive performance and yields important, practical conclusions on the assessment of cognitive ability in both clinical practice and research. Although several studies have previously addressed each of the three above-described phases separately, there has been no assessment or
References (56)
- Bizarre responses, rule detection and frontal lobe lesions. Cortex (1996)
- Revising the definition of Alzheimer's disease: a new lexicon. Lancet Neurology (2010)
- Syndromes frontaux et dysexécutifs. Revue Neurologique (2004)
- Vigilance and effects of fatigability, practice and motivation on simple reaction time tests in patients with lesion of the frontal lobe. Neuropsychologia (1994)
- Long-term postoperative cognitive dysfunction in the elderly: ISPOCD1 study. ISPOCD investigators. International Study of Post-Operative Cognitive Dysfunction. Lancet (1998)
- Statement of consensus on assessment of neurobehavioral outcomes after cardiac surgery. The Annals of Thoracic Surgery (1995)
- A Modified Card Sorting Test sensitive to frontal lobe defects. Cortex (1976)
- The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement (1997)
- Dementia and working memory. The Quarterly Journal of Experimental Psychology (1986)
- Dominantly Inherited Alzheimer Network. Clinical and biomarker changes in dominantly inherited Alzheimer's disease. The New England Journal of Medicine (2012)
- Standards for Reporting of Diagnostic Accuracy. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ
- An analysis of transformations. Journal of the Royal Statistical Society, Series B
- An introduction to multidimensional measurement using Rasch models. Journal of Applied Measurement
- Comparing actual to estimated base rates of "abnormal" scores on neuropsychological test batteries: implications for interpretation. Archives of Clinical Neuropsychology
- Évocation lexicale formelle et sémantique chez des sujets normaux: performances et dynamiques de production en fonction du sexe, de l'âge et du niveau d'étude. Acta Neurologica Belgica
- Are empirically-derived subtypes of mild cognitive impairment consistent with conventional subtypes? Journal of the International Neuropsychological Society
- Slowed central processing in simple and go/no-go reaction time tasks in Parkinson's disease. Brain
- Composite scores for executive function items: demographic heterogeneity and relationships with quantitative magnetic resonance imaging. Journal of the International Neuropsychological Society
- Estimating the percentage of the population with abnormally low scores (or abnormally large score differences) on standardized neuropsychological test batteries: a generic method with applications. Neuropsychology
- Characterizing mild cognitive impairment in Parkinson's disease. Movement Disorders
- Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics
- Alzheimer's Disease Neuroimaging Initiative. A composite score for executive functioning, validated in Alzheimer's Disease Neuroimaging Initiative (ADNI) participants with baseline mild cognitive impairment. Brain Imaging and Behavior
- Frontal syndrome and disorders of executive functions. Journal of Neurology
- Dysexecutive syndrome: diagnostic criteria and validation study. Annals of Neurology
- Vascular aphasias: main characteristics of patients hospitalized in acute stroke units. Stroke
- Is the Montreal Cognitive Assessment superior to the Mini-Mental State Examination to detect post-stroke cognitive impairment? A study with neuropsychological evaluation. Stroke
- Non-spatial attention disorders in patients with frontal or posterior brain damage. Brain
- Neuropsychological changes related to unilateral lenticulostriate infarcts. Journal of Neurology, Neurosurgery, and Psychiatry
1.
Groupe de Réflexion sur l'Evaluation des Fonctions Exécutives (GREFEX) study group: the following centers and investigators participated in the GREFEX cooperative study (n = number of patients included at each center; investigators): Amiens University Hospital (F) (n = 183; O. Godefroy and M. Roussel), Angers University Hospital (F) (n = 19; D. Le Gall), Heliomarin Rehabilitation Center Berck (F) (n = 15; C. Bertola), Bordeaux University Hospital (F) (n = 28; J.M. Giroire and P.A. Joseph), Saint Luc University Hospital Brussels (B) (n = 6; X. Seron, F. Coyette), Cholet General Hospital (F) (n = 8; E. Bretault and I. Bernard), Ottignies William Lennox Center (B) (n = 3; M. Leclercq), Garches University Hospital (F) (n = 9; P. Azouvi and C. Vallat-Azouvi), Grenoble University Hospital (F) (n = 24; P Pollack, C Ardouin and C. Mosca), Lausanne University Hospital (CH) (n = 9; C Bindschadler), Lay St Christophe Rehabilitation Center (F) (n = 3; M. Krier), Liège Department of Cognitive Sciences (B) (n = 19; T. Meulemans and V. Marquet), Lille Stroke Center University Hospital (F) (n = 26; D. Leys and M. Roussel), Nantes University Hospital (F) (n = 8; P. Renou and M. Vercelletto), Nice University Hospital (F) (n = 6; E. Michel and P. Robert), Nîmes University Hospital (F) (n = 15; P. Labauge and C. Franconie), Paris-La Salpêtrière University Hospital Neurology Department (F) (n = 18; B. Pillon and B. Dubois), Paris-La Salpêtrière University Hospital Geriatrics Department (F) (n = 13; B. Dieudonnée and M. Verny), Paris-Broca University Hospital (F) (n = 5; H. Lenoir and J. De Rotrou), Rouen University Hospital (F) (n = 56; D. Hannequin and S. Bioux), Sion Rehabilitation Clinic (CH) (n = 12; J. Fuchs, A. Bellmann and P. Vuadens).