Introduction
The accurate measurement of cognitive capacity in children with intellectual disabilities (ID) is important for diagnosis, determination of service eligibility, identification of individual strengths and weaknesses, and treatment and education planning. It is equally important for research on these populations, which relies heavily on IQ as a critical variable of interest. ID is a disability, originating before the age of 18, characterized by significant limitations both in intellectual functioning and in adaptive behavior as expressed in conceptual, social, and practical adaptive skills (American Association on Intellectual and Developmental Disabilities; www.aaidd.org). The Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association [1]) classifies ID into the following degrees of severity based on adaptive functioning and IQ: Mild (50–55 to approximately 70; ~85% of the ID population), Moderate (35–40 to 50–55; ~10% of ID), Severe (20–25 to 35–40; 3%–4% of ID), and Profound (below 20 or 25; 1%–2% of ID) [1]. Intellectual functioning is defined as the IQ obtained with a standardized, individually administered intelligence test such as the Wechsler Intelligence Scales, the Stanford–Binet, or the Kaufman Assessment Battery. Although the DSM-IV includes classifications for more impaired individuals, it is very challenging to measure IQ reliably and accurately in individuals whose ID falls below the Mild range (IQ 50–70). Indeed, a major limitation of these tests is that they typically do not measure IQ below 40 or 50, and the subtest standardized scores that contribute to the overall score are highly subject to floor effects and therefore provide poor estimates of true ability.
A further limitation is that whereas IQ tests generally do not measure functioning more than 4 standard deviations below average (IQ = 40), measures of adaptive behavior such as the Vineland Adaptive Behavior Scales (VABS) [2] typically have a standard score floor of 20 (over 5 standard deviations below average). This mismatch makes comparisons between cognitive capacity and daily functioning impossible for these individuals. The lack of sensitivity of intelligence tests in this range of functioning typically stems from a relative dearth of children with ID of varying severity in the standardization samples, and from a restricted range of item and task difficulty that prevents measurement of lower levels of ability. Notably, test publishers have recently improved the normative sampling of lower functioning children (Stanford–Binet, Fifth Edition [3]; Differential Ability Scales, Second Edition, DAS-II [4]), and one of these tests, the DAS-II, now has a lower IQ limit of 30.
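To make the mismatch concrete, the floors mentioned above can be expressed as distances from the mean in standard-deviation units. The short sketch below assumes, as stated for the WISC-III and VABS elsewhere in this section, that composite scores have a mean of 100 and a standard deviation of 15; the function name is ours, for illustration only.

```python
def sd_below_mean(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Return how many standard deviations `score` lies below `mean`."""
    return (mean - score) / sd


if __name__ == "__main__":
    # Floors mentioned in the text, assuming mean 100 and SD 15 for each composite.
    floors = {
        "Typical IQ floor (40)": 40,
        "DAS-II IQ floor (30)": 30,
        "VABS standard score floor (20)": 20,
    }
    for label, floor in floors.items():
        print(f"{label}: {sd_below_mean(floor):.2f} SD below the mean")
```

Under these assumptions the typical IQ floor lies 4.00 SD below the mean and the VABS floor about 5.33 SD below it, so an adaptive behavior score between 20 and 40 has no IQ counterpart on most tests.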
Clinical and research experience with intelligence testing in children with neurodevelopmental disorders shows that meaningful variation in performance is often obscured by floor effects when raw scores are converted to standardized scores using the normative data in test manuals. The performance of two 15-year-old children with ID on the Wechsler Intelligence Scale for Children, Third Edition (WISC-III) and the VABS illustrates this point. On both measures, IQ and VABS standardized scores have a mean of 100 and a standard deviation of 15; WISC-III subtest standardized scores have a mean of 10, a standard deviation of 3, and a range of 1 to 19. “Sam” speaks in one- to two-word utterances, receives a VABS Adaptive Behavior Composite (ABC) score of less than 20 (below the 0.1 percentile), and obtains a Full Scale IQ (FSIQ) of 40 (the floor of the test). On the WISC-III Vocabulary subtest, for example, he obtains a raw score of 1, which converts to a standardized score of 1 (in response to “What is a clock?” he answers “Time.” and then gives no further correct responses). “Joe” is verbally fluent, responds to questions with complex phrases or complete sentences, and has a VABS ABC score of 60. He obtains a Vocabulary raw score of 16; however, his raw score also converts to a standardized score of 1, the same as Sam’s. Joe obtains an FSIQ of 42, just 2 points higher than Sam.
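The loss of information in the Sam and Joe example can be sketched in a few lines. Real WISC-III conversions come from age-specific norms tables rather than a formula; the code below instead assumes a linear conversion from a hypothetical raw-score mean and SD (invented values, not WISC-III norms) to the 1–19 scaled-score range, which is enough to show how clamping at the floor maps very different raw scores to the same value.

```python
# Hypothetical raw-score norms for one subtest at one age band.
# Illustrative values only; these are NOT published WISC-III norms.
NORM_MEAN = 40.0
NORM_SD = 7.0


def z_score(raw: float) -> float:
    """Deviation of a raw score from the (hypothetical) norm mean, in SD units."""
    return (raw - NORM_MEAN) / NORM_SD


def scaled_score(raw: float) -> int:
    """Scaled score (mean 10, SD 3), clamped to the 1-19 range of a norms table."""
    unclamped = 10 + 3 * z_score(raw)
    return max(1, min(19, round(unclamped)))


if __name__ == "__main__":
    for name, raw in [("Sam", 1), ("Joe", 16)]:
        print(f"{name}: raw={raw}, z={z_score(raw):.2f}, scaled={scaled_score(raw)}")
```

Under these invented norms both children receive a scaled score of 1, even though their z-scores differ by more than two standard deviations.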
Floor effects and other measurement problems in intelligence testing of children with ID are common; however, with a few exceptions such as those below, they are rarely recognized or discussed in published studies. In a longitudinal study of a large sample of adults with mental retardation assessed with the Wechsler Adult Intelligence Scale–Revised (WAIS-R), Facon [5] reported mean IQ scores between 54 and 58 for four different age bands; however, the scores and distributions indicated significant floor effects, which the authors acknowledged as a limitation in their discussion. In their analysis, the authors used subtest raw scores instead of standardized scores: they re-standardized the raw scores relative to their entire sample and summed these scores to create new composite verbal and performance scores for each subject. Another example comes from a study of 195 individuals with Down syndrome who were assessed longitudinally with the Stanford–Binet, Fourth Edition. The authors reported that 37% of the available test results were assigned the lowest possible score of 36 [6], but that these individuals demonstrated highly variable levels of performance despite flat standardized score profiles.
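The within-sample re-standardization described for the WAIS-R study can be sketched as follows: each subtest's raw scores are converted to z-scores using the study sample's own mean and standard deviation, and the z-scores are summed across subtests into a composite. The subtest names and raw scores are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is our assumption. Such composites describe each person's standing within the study sample, not relative to the test's normative population.

```python
from statistics import mean, stdev


def restandardize(raw_scores: list[float]) -> list[float]:
    """Convert raw scores to z-scores relative to the study sample itself."""
    m, s = mean(raw_scores), stdev(raw_scores)
    return [(x - m) / s for x in raw_scores]


def composite(subtests: dict[str, list[float]]) -> list[float]:
    """Sum each participant's within-sample z-scores across subtests."""
    z_by_subtest = [restandardize(scores) for scores in subtests.values()]
    return [sum(person) for person in zip(*z_by_subtest)]


if __name__ == "__main__":
    # Hypothetical raw scores for four participants on two verbal subtests.
    verbal = {
        "Vocabulary": [2, 6, 10, 14],
        "Information": [1, 3, 5, 7],
    }
    print([round(z, 2) for z in composite(verbal)])
```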
Our research centers have been studying individuals with fragile X syndrome (FXS), the leading cause of inherited ID, for the past 25 years. FXS is a single-gene disorder caused by a mutation in the fragile X mental retardation 1 (FMR1) gene on the X chromosome at Xq27.3. The mutation, an expansion of a CGG trinucleotide repeat, prevents normal transcription and leads to reduction or absence of the FMR1 protein (FMRP) [7, 8], with consequent abnormal brain development, including aberrant dendritic arborization and synaptic plasticity [9–13]. In females with the full mutation, FMRP is expressed only in cells in which the normal allele is carried on the active X chromosome. As a result, females tend to be higher functioning than males with FXS, although their functioning varies widely, from significant ID to normal or above-average IQ. Variable FMRP expression also results from mosaicism, in which transcriptional silencing of the gene does not occur in all cells, either because the size of the repeat expansion varies across cells or because methylation varies. Although more frequent in males, mosaicism also occurs in females with FXS. Individual differences in brain FMRP production arising from these factors are thought to account for a significant proportion of the variability in IQ among individuals with FXS.
We have sought to understand the impact of gene function, brain function, and environmental variation on cognition and behavior in FXS, with the ultimate goal of identifying effective interventions based on this information. However, our research and clinical work have been significantly limited by the lack of IQ measurement sensitivity described above in a substantial portion of individuals with this disorder. For example, in one study designed to determine the genetic and environmental factors contributing to IQ (as measured by the Wechsler scales), 43% of boys with FXS scored at the floor on all 12 subtests, and all of these children obtained an FSIQ of 40 [14–16]. Although these individuals demonstrated considerable variability in their cognitive abilities and level of adaptive behavior [15], neither their individual strengths and weaknesses nor the variation within the group was reflected in their standardized scores. In an attempt to overcome this problem, in a recent study [17] we abandoned standard scores altogether and used raw WISC-III subtest scores to examine the development of intellectual functioning in children with FXS. Using raw scores and covarying for age, we found that over the age range of 6 to 16 years, intellectual functioning in children with FXS developed at approximately half the rate observed in their typically developing siblings. Although raw scores may offer significant advantages over standard scores (e.g., no floor effect and a normal distribution of scores), the WISC-III manual does not report raw subtest scores for the normative population. Thus, investigators cannot use raw subtest scores in their analyses without including a well-matched comparison group.
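A rate comparison of this kind can be illustrated by comparing the slopes of raw score on age in two groups. The sketch below fits an ordinary least-squares slope per group to constructed, perfectly linear toy data (not study data) and takes their ratio; the cited study's actual model, which covaried for age, may differ from this simplification.

```python
def ols_slope(ages: list[float], scores: list[float]) -> float:
    """Ordinary least-squares slope of score on age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_score = sum(scores) / n
    cov = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    var = sum((a - mean_age) ** 2 for a in ages)
    return cov / var


if __name__ == "__main__":
    ages = [6, 8, 10, 12, 14, 16]
    # Constructed toy data: siblings gain 4 raw points per year, the FXS group 2.
    sibling_raw = [4 * a for a in ages]
    fxs_raw = [2 * a + 3 for a in ages]
    ratio = ols_slope(ages, sibling_raw) / ols_slope(ages, fxs_raw)
    print(f"Sibling slope is {ratio:.1f} times the FXS slope")
```

Because raw scores have no floor, the slope difference remains visible even for the lowest scorers; with real data the slopes would also be estimated with error and compared with an interaction test rather than a simple ratio.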
FXS offers a unique opportunity to examine the sensitivity of intelligence testing in an ID population. Its specific genetic etiology has been identified, its neuroanatomical morphology has been well described, and its cognitive and behavioral phenotype is well known and relatively consistent. Although FMRP expression in the brain differs from that in blood, the effective gene dosage can be estimated by measuring FMRP in lymphocytes. The degree of FMRP deficit can then be correlated with the cognitive deficit as measured by standardized testing [18, 19]. Thus, FXS is a model for examining assumptions about the measurement of cognition in individuals with mental impairment, and these assumptions can then be tested in other neurodevelopmental disorders (e.g., autism, Down syndrome) and in more heterogeneous populations (e.g., children with idiopathic ID).
Here, we examined the sensitivity of the WISC-III, one of the most widely used intelligence tests, in a large sample of children and adolescents with FXS. First, we show the distribution of the usual standardized scores in this sample of boys and girls. Next, we present a method for calculating new normalized scores that represent each child’s actual deviation from the standardization sample, based on raw score descriptive statistics obtained with permission from the publisher of the WISC-III (Psychological Corporation, San Antonio, TX). Finally, we compare the distribution of the normalized scores with that of the usual standardized scores, and correlate each set of scores with the VABS, another measure of developmental level, and with the degree of FMRP deficit.
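The normalized score described above can be sketched as a z-score computed against the raw-score mean and standard deviation of same-age children in the standardization sample, optionally re-expressed on the familiar subtest metric (mean 10, SD 3) without the usual floor of 1. The norm values, the age keys, and the re-expression step are assumptions for illustration; the publisher's raw-score statistics are not reproduced here, and the method used in the study may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawNorm:
    """Raw-score descriptive statistics for one subtest in one age group."""
    mean: float
    sd: float


# Hypothetical norms keyed by (subtest, age group); NOT actual WISC-III statistics.
NORMS: dict[tuple[str, int], RawNorm] = {
    ("Vocabulary", 15): RawNorm(mean=40.0, sd=7.0),
    ("Arithmetic", 15): RawNorm(mean=22.0, sd=3.5),
}


def normalized_z(subtest: str, age: int, raw: float) -> float:
    """Child's actual deviation from same-age norms, in SD units (no floor)."""
    norm = NORMS[(subtest, age)]
    return (raw - norm.mean) / norm.sd


def normalized_scaled(subtest: str, age: int, raw: float) -> float:
    """The same deviation on the subtest metric (mean 10, SD 3), left unclamped."""
    return 10 + 3 * normalized_z(subtest, age, raw)


if __name__ == "__main__":
    for name, raw in [("Sam", 1), ("Joe", 16)]:
        score = normalized_scaled("Vocabulary", 15, raw)
        print(f"{name}: normalized Vocabulary score = {score:.2f}")
```

Unlike a table lookup clamped at 1, both values fall below the usual floor yet remain distinct, so they can be compared across subtests to form a profile or correlated with other measures.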
Discussion
The results of this study, using the WISC-III, highlight significant floor effects and restricted sensitivity as major limitations of standardized intelligence testing in children with fragile X syndrome, one of the most common causes of intellectual disability. Because standardized scores were floored for a large proportion of the sample (up to 70%), substantial and meaningful variability in the performance of lower functioning individuals was lost when raw scores were converted to standardized scores. We show that normalized scores based on each individual’s actual deviation from the test normative data have a distribution and variability that are much improved over those of the typical subtest standardized scores derived from norms tables. Relative to the usual subtest standardized scores, these normalized scores also show more robust linear associations with a clinical measure of adaptive behavior (VABS) and with a genetic measure specific to FXS, the degree of FMR1 protein deficiency. The normalized scores appear to provide a profile of relative strengths and weaknesses in lower functioning individuals that is not reflected in the usual standardized scores. At the group level, the normalized scores show a substantial deficit on the Arithmetic subtest, consistent with prior research highlighting this aspect of the FXS cognitive phenotype and its neuroanatomical basis [26]. These results appear to have major clinical and research implications for intelligence testing of children with FXS and probably of children with other types of ID. Although we have documented this problem in only one population and with only one intelligence test, the WISC-III, the results suggest that the cognitive tasks integral to the measurement of IQ can be sensitive to individual differences, even in very low functioning individuals.
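Why floored scores weaken correlations can be shown with a toy example of range restriction: a variable that tracks an adaptive behavior score perfectly loses much of that correlation once its lower values are clamped to a floor. The data below are constructed for illustration and are not study data; the Pearson correlation is written out explicitly to keep the sketch dependency-free.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    ability = [1, 2, 3, 4, 5, 6, 7, 8]      # underlying performance (toy)
    adaptive = [2 * a for a in ability]      # perfectly related toy measure
    floored = [max(a, 5) for a in ability]   # same scores clamped at a floor of 5
    print(f"r (unfloored) = {pearson_r(ability, adaptive):.3f}")
    print(f"r (floored)   = {pearson_r(floored, adaptive):.3f}")
```

Here the correlation falls from 1.0 to roughly 0.85 solely because five of the eight scores are tied at the floor; with real, noisier data the same mechanism attenuates associations with measures such as the VABS or FMRP.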
The results of this study also have important implications for research design. IQ is an almost universal variable in developmental, neuroscience, and genetic studies, serving as an outcome of interest, a predictor variable, or a critical tool for group matching. In such studies, using IQ scores for lower functioning individuals as currently derived from standardized tests appears to lead to poor estimates of true cognitive ability and potential; an “even” profile that may obscure significant relative strengths and weaknesses; underestimated associations with other behavioral, biological, and genetic measures of interest; and samples that are inadequately matched on this dimension. Moreover, in FXS and perhaps other neurodevelopmental disorders, sensitive cognitive tests will become increasingly important for tracking change as new targeted treatment trials are implemented.
Renormalizing scores to improve the sensitivity of intelligence testing for individuals with ID has implications for future research on the neuropsychology, neuroimaging, and genetic bases of neurodevelopmental disorders. For example, cognitive phenotyping studies and other research programs aimed at establishing links between genotype and specific cognitive patterns would benefit greatly from individual scores that more accurately reflect both the true deviation from normal and relative strengths and weaknesses. This has immediate implications for fragile X research as we develop and validate much more accurate measures of FMRP expression that could ultimately serve as prognostic indicators of developmental trajectory. In neuroimaging studies, efforts to determine the impact of abnormalities in brain morphology and function on neuropsychological deficits or relatively preserved abilities would likewise depend on cognitive scores that reflect the abilities of individuals with ID (often the experimental group) as accurately as those of individuals with typical development (often the control group). Finally, from a study design perspective, many clinical studies of individuals with neurodevelopmental disorders need comparison groups that are well matched on cognitive ability, so that results can be attributed with more confidence to the disorder in question rather than to more general developmental differences. A more accurate estimate of cognitive ability, such as the one presented here, would lead to improved matching and more powerful research designs. We emphasize that the concepts and methodological and statistical approaches proposed here may improve our ability to find other links between behavioral or cognitive phenotypes and biomarkers or genotypes.
Although children with ID represent a small proportion of the population, they should receive intellectual assessments that are as sensitive and valid as those available to higher functioning children. Many intelligence tests currently report the performance of children in special categories, such as those with mental retardation, autism, or specific learning disabilities; however, these data are collected primarily for validation studies and are kept separate from the normative sample. An ambitious but worthwhile solution to the sensitivity problem is to over-sample lower functioning children in standardization studies and to include tasks that can be completed across a broader range of developmental levels, including items designed for children with a mental age extending into toddlerhood. Such over-sampling would yield enough normative data from children with varying levels of impairment to allow a lower IQ floor. In the meantime, the publishers of widely used standardized tests should consider releasing the raw data from their standardization samples into the public domain so that more accurate estimates can be derived for lower functioning individuals, at least in research applications.
In summary, we show significant floor effects and a lack of sensitivity in IQ measurement for children with FXS and mental impairment, problems that can be substantially ameliorated by calculating each child’s actual deviation from the normative sample. We supported the validity of this approach by showing that these new normalized scores were more strongly associated with another measure of development and with a genetic measure specific to FXS than were the traditional standardized scores. We hope that our observations and conclusions will lead to future studies examining the sensitivity of intelligence testing in other populations of children with neurodevelopmental disorders, and to improved tools for measuring cognitive abilities and patterns of strengths and weaknesses in lower functioning individuals.