Introduction

As is the case for all psychiatric disorders, the diagnosis of attention deficit hyperactivity disorder (ADHD) is not based on a specific pathological agent, such as a microbe, a toxin, or a genetic mutation. Instead, it rests on a collection of signs, symptoms, and evidence of impairment that occur together more often than expected by chance (Todd et al. 2005). The presence of these symptoms is usually established by direct observation, or by a clinical interview or questionnaire completed by a child’s parent or teacher. Instruments vary in the symptoms they include, in the manner of data collection (checklist or interview), and in the response format (e.g., yes/no versus Likert scale). In the present paper, we investigated two questions. First, can the (co)variance of scores on different instruments be explained by a common underlying construct? Second, to what extent is this common factor influenced by genetic and environmental factors? The focus is on three widely used instruments: the Child Behavior Checklist (CBCL; Achenbach 1991), the Conners Parent Rating Scale-Revised:Short version (CPRS-R:S; Conners 2001), and the Diagnostic and Statistical Manual of Mental Disorders-4th edition (DSM-IV; American Psychiatric Association 1994).

The CBCL-Attention Problem scale (CBCL-AP) was developed by means of factor analyses, and includes eleven items. The psychometric properties and methods to establish the reliability of the syndrome are discussed in detail elsewhere (Achenbach 1991). Despite its name, the scale assesses problems related both to attention and hyperactivity. The CBCL has sex- and age-specific norms, which are useful in assessing a child’s risk for ADHD. The CPRS-R:S ADHD-index comprises the 12 best items for distinguishing children with ADHD from children without ADHD as assessed by the DSM (Conners 2001). As with the CBCL, sex- and age-specific norm scores are available, allowing the clinician to determine whether a given child is at risk for ADHD. DSM-IV ADHD is assessed on the basis of 18 symptoms; nine relate to inattention, and nine relate to hyperactivity/impulsivity. In the DSM framework, ADHD is viewed as a categorical trait; i.e., children either do or do not meet criteria for ADHD. The norms for clinical diagnosis do not vary as a function of sex or age of the child. Table 1 contains the symptoms included in the CBCL-AP scale, the CPRS-R:S ADHD-index and DSM-IV ADHD.

Table 1 An overview of the Child Behavior Checklist (CBCL), Conners Parent Rating Scale-Revised:Short version (CPRS-R:S), and Diagnostic and Statistical Manual of Mental Disorders-4th edition (DSM-IV) symptoms

The CBCL, DSM, and CPRS-R:S focus on different symptoms and are based on distinct assumptions. Even so, the scores on these instruments are strongly related. CBCL-AP scores predict the presence of ADHD (Gould et al. 1993; Chen et al. 1994; Eiraldi et al. 2000; Lengua et al. 2001; Sprafkin et al. 2002; Hudziak et al. 2004). In a non-referred sample enriched for ADHD, about 50% of the children with a high CBCL-AP score were diagnosed with ADHD, compared to 3% of the children with a low CBCL-AP score (Derks et al. 2006). These results imply a good convergence between the CBCL and a DSM-IV interview, but the relation is clearly less than perfect. The CPRS-R:S ADHD-index (ADHD-I) was developed for assessing children at risk for ADHD based on a DSM-IV diagnosis (Conners 2001). Conners (2001) showed that the CPRS-R:S ADHD-I is a good screening instrument for DSM-IV ADHD, with a sensitivity of 100%, a specificity of 92.5%, and an overall correct classification rate of 96.3%. As far as we know, the relation between CBCL-AP and the CPRS-R:S ADHD-I has not been studied. However, because both are related to DSM-IV ADHD, the two scales are likely to be correlated.
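The overall correct classification rate is a weighted average of sensitivity and specificity. The weights are the proportions of true cases and non-cases in the validation sample. The sketch below assumes, as our own illustration rather than a detail reported by Conners (2001), that cases and non-cases were equally represented. Under that assumption, the reported sensitivity and specificity reproduce the 96.3% figure. The function name is hypothetical.

```python
def overall_correct_rate(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of children classified correctly, given the share of true ADHD cases in the sample."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    # Correctly flagged cases plus correctly cleared non-cases.
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Assumption: a validation sample with equal numbers of cases and non-cases.
balanced = overall_correct_rate(sensitivity=1.00, specificity=0.925, prevalence=0.5)
print(f"{balanced:.2%}")  # 96.25%, reported as 96.3%

# The same instrument in a sample where only 5% are cases (illustrative value).
low_prevalence = overall_correct_rate(sensitivity=1.00, specificity=0.925, prevalence=0.05)
print(f"{low_prevalence:.2%}")
```

Because the weighting depends on the base rate, the same sensitivity and specificity give an overall rate closer to the specificity in a population where ADHD is uncommon.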

Genetic studies of psychiatric disorders are complicated by the lack of clear diagnostic tests (Hudziak 2001). Heritability estimates in epidemiological genetic studies, and the results of gene-finding studies, may depend on the exact instrument that is used to assess ADHD. A number of papers have established the convergence between CBCL-AP and DSM-IV ADHD, but the causal factors underpinning this relationship remain unclear. Is it the result of genetic overlap, environmental overlap, or both? The answer may determine progress in gene-finding studies. If variance in alternative measures of ADHD is explained by different genes, we would expect studies using different instruments to disagree in their results. If the same genes explain variance in these measures, the data from studies using different instruments may be combined to increase statistical power (Boomsma 1996; Boomsma and Dolan 1998). Assuming that the convergence between different instruments is less than perfect, part of the variance will be attributable to instrument-specific factors, and it is important to investigate the nature of these factors. If the divergence among instruments is merely a matter of measurement error, we would expect no genetic influences on the instrument-specific factors. Genetic influences on the instrument-specific factors, on the other hand, would suggest that the instruments tap partly unique aspects of children’s behavior.

Genetic and environmental influences on individual differences in behavior can be studied in genetically informative designs, such as the classical twin design. Such studies have shown that genetic influences explain between 55 and 89% of the variance in clinical diagnoses of ADHD (Eaves et al. 1997; Sherman et al. 1997), while shared environmental influences were nearly always absent. Likewise, about 70–80% of the variance in CBCL-AP scores is explained by genetic influences, and the remaining variance is explained by non-shared environmental influences (Rietveld et al. 2003; Hudziak et al. 2000; Gjone et al. 1996). Kuntsi and Stevenson (2001) used the Conners Rating Scale to assess symptoms of ADHD and reported a heritability of 72%. A review of genetic studies on AP, hyperactivity/impulsivity (HI), and ADHD suggested that there are no qualitative or quantitative sex differences in the genetic etiology of parent ratings of ADHD (Derks et al. in press).

Interestingly, in parent ratings, but not in teacher ratings, the DZ twin concordances and correlations are lower than would be expected under a purely additive genetic model. For example, in maternal structured interview reports, the concordance rate is .67 in MZ twins, but .00 in DZ twins (Sherman et al. 1997). Similarly, in CBCL ratings, the DZ twin correlations are less than half the MZ correlations (Rietveld et al. 2003). In the literature, two explanations are offered for these low DZ correlations. Firstly, the DZ correlation can be less than half the MZ correlation due to the presence of non-additive genetic effects (i.e., genetic dominance) (Lynch and Walsh 1998). Secondly, the low DZ correlation may be explained by social interaction effects, which may be the result of interaction among siblings (i.e., the behavior of a twin influences the behavior of the other twin) or rater bias (i.e., the behavior of a twin is compared to the behavior of the other twin) (Eaves 1976; Carey 1986; Boomsma 2005). In previous studies, support was found both for the presence of genetic dominance (Rietveld et al. 2003; Martin et al. 2002) and sibling interaction (Simonoff et al. 1998; Kuntsi and Stevenson 2001; Vierikko et al. 2004; Eaves et al. 1997).

A high heritability of attention problems and ADHD has been reported, irrespective of the instrument that is used. However, based on the findings of univariate studies, we cannot conclude that CBCL, Conners Rating Scale, and DSM ratings measure the same construct, or that they are influenced by the same set of genes. To address this question, multivariate analyses are needed. A number of studies have focused on the genetic and environmental influences on either AP or ADHD, but only the study of Nadder and Silberg (2001) included multivariate analyses. Nadder and Silberg (2001) analyzed data obtained in a sample of 735 male and 819 female same-sex twin pairs, aged 8–16 years. They modelled the genetic influences on nine measures of ADHD symptomatology. These included maternal and paternal DSM-III-R interview data on three dimensions (hyperactivity, inattention and impulsivity), maternal questionnaire data (the Rutter Parental Scale and the CBCL), and a questionnaire completed by the twin’s teacher. The aim of their study was to determine whether overactivity, inattention, and impulsivity reflect the same underlying genetic liability, while taking method (i.e., instrument-specific) variance into account. In males, 23.7–70.1% of the genetic variance was explained by a common factor that loaded on all nine indicators. A second and third factor loaded on the three dimensions of the maternal and paternal interview data, respectively. The remaining variance (0.0–65.7%) was explained by factors that were specific to each measure. In females, there was also one factor common to all indicators (explaining 16.2–60.2% of the variance), and a second and third factor, which loaded on the three dimensions of the interview data. In contrast to the males, a fourth factor loaded on the three behavioral questionnaires. This factor explained 12.3–46.2% of the genetic variance. In total, measurement-specific factors explained 0.0–73.0% of the genetic variance.

The purpose of the present paper is to investigate the construct validity of CBCL-AP, CPRS-R:S ADHD-I, and DSM-IV ADHD. Three questions are addressed. First, what are the phenotypic correlations between the three instruments? Second, do the instruments reflect a common underlying factor? Third, what are the genetic and environmental influences on the common and the instrument-specific factors?

Methods

Subjects

This study is part of an ongoing longitudinal twin study in the Netherlands. The subjects were all registered at birth with the Netherlands Twin Register (Boomsma et al. 2002, 2006; Bartels et al. 2007). Mothers of the registered twin pairs receive the CBCL and the CPRS at ages 7, 10, and 12 years. A subsample of the twins was selected based on their longitudinal CBCL scores, and the mothers of these pairs completed a diagnostic interview. The twins were born between 1989 and 1994 and were aged 10–13 years at the time of the interview (mean age = 11.71; SD = .77). The mean time-span between the completion of the interview and the questionnaires was 4.42 (SD = .75), 1.82 (SD = .73), and −.84 (SD = .63) years for the questionnaires completed at age 7, age 10, and age 12, respectively. The negative value indicates that, on average, the age-12 questionnaire was completed after the interview.

Questionnaires were sent at ages 7, 10, and 12 years to all families that had agreed, when the children were born, to participate in the research of the Netherlands Twin Register (N = 7,828 families; birth cohorts 1989–1994). At least one measurement is available for 10,916 twins from 5,458 families, a response rate of 70%. CBCL ratings were available for 10,018 twins at age 7, 6,565 twins at age 10, and 5,780 twins at age 12. CPRS-R:S ratings were available for 4,887 twins at age 12, and DSM-IV interviews were available for 1,006 twins. Complete data were available for 740 twins. The number of CPRS-R:S ratings is lower than the number of CBCL ratings because the CPRS-R:S was not included for children born before 1991. The number of available questionnaires decreases over time as a result of the longitudinal character of the study (i.e., a number of children in the study had yet to reach the age of 12).

Zygosity diagnosis was based on DNA in 674 same-sex twin pairs. In the remaining same-sex pairs, zygosity was assessed using a 10-item questionnaire, which is almost 95% accurate (Rietveld et al. 2000). Of the 5,458 twin pairs, there were 898 monozygotic male (MZM) pairs, 888 dizygotic male (DZM) pairs, 1,005 monozygotic female (MZF) pairs, 844 dizygotic female (DZF) pairs, and 1,823 dizygotic opposite sex (DOS) pairs.
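The reported sample counts can be cross-checked with a few lines of arithmetic. The variable names below are our own; the numbers are copied from the text.

```python
# Figures copied from the text (Netherlands Twin Register, birth cohorts 1989-1994).
families_invited = 7_828
families_responding = 5_458
twins_with_data = 10_916
pairs_by_zygosity = {"MZM": 898, "DZM": 888, "MZF": 1_005, "DZF": 844, "DOS": 1_823}

response_rate = families_responding / families_invited
total_pairs = sum(pairs_by_zygosity.values())
mz_pairs = pairs_by_zygosity["MZM"] + pairs_by_zygosity["MZF"]

print(f"response rate: {response_rate:.1%}")             # 69.7%, i.e. about 70%
print(f"pairs: {total_pairs}, twins: {2 * total_pairs}")  # pairs: 5458, twins: 10916
print(f"MZ share of pairs: {mz_pairs / total_pairs:.1%}")
```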

Selection for the diagnostic interview

For the diagnostic interview, subjects were selected on the basis of their standardized maternal CBCL ratings (T-scores; mean = 50, SD = 10) at ages 7, 10, and 12 years (Derks et al. 2006). Subjects were excluded if maternal ratings were available at only one time-point, or if they suffered from a severe handicap that disrupted daily functioning. Twin pairs were selected if at least one of the twins scored high on AP (affected pairs), or if both twins scored low on AP (control pairs). A high score was defined as a T-score above 60 at all available time-points (ages 7, 10, and 12 years) and a T-score above 65 at least once. A low score was defined as a T-score below 55 at all available time-points. The control pairs were matched with the affected pairs on sex, cohort, maternal age, and socioeconomic status (SES). T-scores were computed in boys and girls separately. In other words, girls were selected if they scored low or high compared to other girls, and boys were selected if they scored low or high compared to other boys. This procedure resulted in the selection of similar numbers of boys (N = 499) and girls (N = 507).
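The pair-selection rules above can be expressed as a small classifier. This is a minimal sketch under stated assumptions. The input T-scores are assumed to be already sex-specific, and a pair is assumed to be dropped when either twin has fewer than two ratings. The handicap exclusion and the matching on sex, cohort, maternal age, and SES are not modeled. All function names are hypothetical.

```python
from typing import Optional, Sequence

TScores = Sequence[Optional[float]]  # CBCL-AP T-scores at ages 7, 10, 12; None = missing


def classify_twin(t_scores: TScores) -> str:
    """Apply the high/low AP criteria described in the text to one twin."""
    available = [t for t in t_scores if t is not None]
    if len(available) < 2:
        return "excluded"  # maternal ratings at only one time point
    if all(t > 60 for t in available) and any(t > 65 for t in available):
        return "high"
    if all(t < 55 for t in available):
        return "low"
    return "intermediate"


def classify_pair(twin1: TScores, twin2: TScores) -> str:
    """Affected pair: at least one twin high; control pair: both twins low."""
    labels = {classify_twin(twin1), classify_twin(twin2)}
    if "excluded" in labels:  # assumption: drop the pair if either twin is excluded
        return "not selected"
    if "high" in labels:
        return "affected pair"
    if labels == {"low"}:
        return "control pair"
    return "not selected"


print(classify_pair([66, 62, 61], [50, 58, None]))  # affected pair
print(classify_pair([48, 52, 50], [45, 44, None]))  # control pair
print(classify_pair([48, 57, 50], [45, 44, 40]))    # not selected
```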

Measures

The Child Behavior Checklist (CBCL) (Achenbach 1991) is a standardized questionnaire designed for parents to report the frequency and intensity of their children’s behavioral and emotional problems as exhibited in the past 6 months. It consists of 120 items that measure problem behavior. The items are rated on a 3-point scale ranging from “not true = 0”, “somewhat or sometimes true = 1”, to “very true or often true = 2”. The Attention Problem scale contains 11 items. The 2-week test–retest correlation and the internal consistency of this scale are .83 and .67, respectively (Verhulst et al. 1996). In the statistical analyses, we included the CBCL ratings at the ages 7, 10, and 12 years in order to correct for the selection, as explained below.

The Conners’ Parent Rating Scale-Revised (CPRS-R; Conners 2001; Conners et al. 1998) is a widely used instrument to assess behavior problems in the past month. The short version contains 28 items, rated on a 4-point scale ranging from “not true at all = 0” to “very much true = 3”. The present study used the CPRS-R:S ADHD-I. This index comprises the 12 items that best distinguish children with ADHD from children without ADHD as assessed by the DSM-IV (American Psychiatric Association 1994; Conners 2001). The internal consistency of this scale at age 12–14 years is .94 in boys and .91 in girls. The 6–8-week test–retest correlation is .72.

The Diagnostic Interview Schedule for Children (DISC) (Shaffer et al. 1993) is a structured diagnostic interview that can be used to assess the presence of DSM-IV diagnoses, including ADHD. The Dutch translation is by Ferdinand and van der Ende (1998). Ten experienced research assistants interviewed the mothers of the twins to determine which symptoms of ADHD the twins had displayed during the last year. Within a pair, the mother’s reports on both twins were obtained by the same interviewer. We analyzed the total number of symptoms.

Statistical analyses

Transformation to categorical data

The distributions of the CBCL, CPRS-R:S, and DSM symptom data are characterized by excessive skewness and kurtosis. Derks et al. (2004) showed that bias in parameter estimates due to non-normality of the data may be avoided by using categorical data analysis. In this approach, a liability threshold model is applied to the ordinal scores (Lynch and Walsh 1998). A person is assumed to be “unaffected” if his or her liability is below a certain threshold, and “affected” if it is above this threshold. In the present paper, the scores were recoded so that three thresholds divide the latent liability distribution into four categories of about equal size. The liability threshold model was identified by constraining the variance of the observed variables at 1.
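Under the liability threshold model, each threshold is the standard normal quantile of the cumulative proportion of subjects below it. The sketch below computes these marginal thresholds from category counts. It simplifies the analyses reported here, where thresholds were estimated by maximum likelihood in Mx jointly with the correlations and a sex effect. The counts and the function name are hypothetical.

```python
from statistics import NormalDist


def thresholds_from_counts(counts: list[int]) -> list[float]:
    """Liability thresholds (standard normal scale) from ordered category counts."""
    total = sum(counts)
    liability = NormalDist(mu=0.0, sigma=1.0)  # liability variance fixed at 1
    thresholds = []
    cumulative = 0
    for count in counts[:-1]:  # k categories need k - 1 thresholds
        cumulative += count
        thresholds.append(liability.inv_cdf(cumulative / total))
    return thresholds


# Hypothetical counts for four roughly equal-sized categories.
print([round(t, 3) for t in thresholds_from_counts([250, 250, 250, 250])])
# approximately [-0.674, 0.0, 0.674]
```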

The CBCL AP score was calculated by summing the responses on the 11 items, which resulted in a sum score with a possible maximum of 22. The four categories consisted of a score of 0, 1–2, 3–5, and 6 or higher, respectively. The CPRS-R:S ADHD-I score was calculated by summing the responses on the 12 items, which resulted in a sum score with a possible maximum of 36. The four categories consisted of a score of 0–1, 2–5, 6–11, and 12 or higher, respectively. The DISC sum score, with a range of 0 to 18, was transformed into an ordinal variable with four categories: (i) not affected (0 symptoms); (ii) mildly affected (1–2 symptoms); (iii) moderately affected (3–5 symptoms); and (iv) highly affected (6 or more symptoms). This four-category variable provides greater resolution, and therefore more statistical power, than a dichotomous variable (ADHD absent versus ADHD present).
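The recoding of sum scores into four ordinal categories can be written as a lookup on category lower bounds. In this sketch, the highest DISC category is taken as 6 or more symptoms so that the categories cover the full 0–18 range. The dictionary and function names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of categories 1-3 (category 0 starts at a score of 0), from the text.
CUTPOINTS = {
    "CBCL-AP": (1, 3, 6),           # 0 | 1-2 | 3-5 | 6+     (11 items scored 0-2, max 22)
    "CPRS-R:S ADHD-I": (2, 6, 12),  # 0-1 | 2-5 | 6-11 | 12+ (12 items scored 0-3, max 36)
    "DISC": (1, 3, 6),              # 0 | 1-2 | 3-5 | 6+     (18 symptoms, max 18)
}
MAX_SCORE = {"CBCL-AP": 22, "CPRS-R:S ADHD-I": 36, "DISC": 18}


def to_category(measure: str, sum_score: int) -> int:
    """Map a sum score to an ordinal category 0-3."""
    if not 0 <= sum_score <= MAX_SCORE[measure]:
        raise ValueError(f"{measure} sum score out of range: {sum_score}")
    # bisect_right counts how many category lower bounds the score has reached.
    return bisect_right(CUTPOINTS[measure], sum_score)


cbcl_items = [0, 1, 0, 2, 0, 0, 1, 0, 0, 0, 0]  # hypothetical responses to the 11 items
print(to_category("CBCL-AP", sum(cbcl_items)))  # sum 4 -> category 2
print(to_category("CPRS-R:S ADHD-I", 12))       # 3
print(to_category("DISC", 6))                   # 3
```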

Correcting for the selection

Diagnostic interview data were collected only in a subsample of the twins. The probability of selection for the interview depends on a measured variable, namely the twin’s CBCL scores at ages 7, 10, and 12. The data of the complete sample may be partitioned into an observed (selected) part and a missing (unselected) part. The data are missing at random (MAR) if the probability of missingness depends only on the observed part of the data, and not on the missing part (Little and Rubin 2002). If the data are MAR, unbiased parameter estimates can be obtained by full information (i.e., raw data) maximum likelihood estimation of the parameters in a statistical model that includes the variables used for selection. All selection variables must be included. If one were omitted, the probability of missingness would depend on data outside the model. The data would then be missing not at random (MNAR), and parameter estimates would be biased. We therefore included the CBCL ratings obtained at ages 7, 10, and 12 years in the statistical analyses. All twin pairs with at least one available measure are included in the analyses.
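A toy simulation illustrates why the selection variables must enter the model. Below, a standard-normal screening score X is always observed. An outcome Y is observed only when X exceeds a cutoff, which makes the missingness MAR. Correlating the selected cases alone underestimates the population correlation. The closed-form maximum likelihood solution for this bivariate normal case (a factored likelihood for monotone missing data; see Little and Rubin 2002) uses every X and recovers the correlation. This is a simple special case of the full information estimation done in Mx. Sample size, correlation, and cutoff are arbitrary. In the study itself, selection depended on both twins' scores at three ages.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def simulate(n=20_000, rho=0.6, cutoff=0.5, seed=1):
    """X ("screening score") is always observed; Y ("interview") only if X > cutoff (MAR)."""
    rng = random.Random(seed)
    xs, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        if x > cutoff:
            observed.append((x, y))
    return xs, observed


def complete_case_corr(pairs):
    """Correlation using selected cases only (ignores how they were selected)."""
    mx, my = mean([x for x, _ in pairs]), mean([y for _, y in pairs])
    sxy = mean([(x - mx) * (y - my) for x, y in pairs])
    sxx = mean([(x - mx) ** 2 for x, _ in pairs])
    syy = mean([(y - my) ** 2 for _, y in pairs])
    return sxy / math.sqrt(sxx * syy)


def ml_corr(all_x, pairs):
    """ML correlation for a bivariate normal with Y missing depending only on X."""
    mx_all = mean(all_x)
    vx_all = mean([(x - mx_all) ** 2 for x in all_x])  # uses every case, selected or not
    mx, my = mean([x for x, _ in pairs]), mean([y for _, y in pairs])
    sxx = mean([(x - mx) ** 2 for x, _ in pairs])
    sxy = mean([(x - mx) * (y - my) for x, y in pairs])
    syy = mean([(y - my) ** 2 for _, y in pairs])
    beta = sxy / sxx              # regression of Y on X is unaffected by selection on X
    resid_var = syy - beta * sxy  # residual variance of Y given X
    vy = resid_var + beta**2 * vx_all
    return beta * vx_all / math.sqrt(vx_all * vy)


xs, observed = simulate()
print(f"complete cases: {complete_case_corr(observed):.2f}")  # attenuated by selection
print(f"ML with selection variable: {ml_corr(xs, observed):.2f}")  # close to 0.6
```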

Prevalences

To investigate whether the prevalences of AP and ADHD depend on the twin’s sex or zygosity, we performed χ²-tests with the five ordinal measures as dependent variables and sex and zygosity as independent variables.

Genetic modeling

Genetic and environmental influences on variance in ADHD scores were estimated using structural equation modeling. All model fitting was performed on raw data with Mx (Neale et al. 2003), a statistical software package well suited for conducting genetic analyses.

The relative contributions of genetic and environmental factors to individual differences in ADHD can be inferred from differences between the correlations of MZ and DZ twin pairs, because MZ and DZ twins differ in their genetic relatedness (Plomin et al. 2001). Using the twin method, phenotypic variance may be attributed to additive genetic effects (A), dominant genetic effects (D) or shared environmental effects (C), and non-shared environmental effects (E). The genetic effects (A and D) correlate 1 in MZ twins, who are genetically identical. In DZ twins, A correlates .5, and D correlates .25. C correlates 1 in both MZ and DZ twins. E, the non-shared environmental effects, are uncorrelated by definition. Uncorrelated measurement error, if present, is absorbed in the E term. C and D cannot be estimated simultaneously in a design that uses only data from MZ and DZ twins reared together. If the DZ correlations are less than half the MZ correlations, as is the case for maternal ratings of attention problems and ADHD, D is included in the genetic model. The proportion of variance accounted for by genetic or environmental influences is the ratio of the variance due to A, D, or E to the total phenotypic variance. For instance, let a, d, and e denote the regression coefficients in the regression of the phenotype on the standardized latent variables A, D, and E, respectively. The variance due to A is then a², and the (narrow-sense) heritability is calculated as a²/(a² + d² + e²).
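The expectations above can be solved directly for the standardized components. With a total variance of 1, r_MZ = a² + d² and r_DZ = .5a² + .25d². Hence a² = 4r_DZ − r_MZ, d² = 2r_MZ − 4r_DZ, and e² = 1 − r_MZ. The sketch below is a simple moment-based (Falconer-style) illustration with hypothetical correlations. The analyses in this paper instead estimate the parameters by maximum likelihood in Mx.

```python
def ade_moment_estimates(r_mz: float, r_dz: float) -> dict[str, float]:
    """Solve r_mz = a2 + d2 and r_dz = a2/2 + d2/4 for standardized ADE components."""
    a2 = 4 * r_dz - r_mz      # additive genetic variance (narrow-sense heritability)
    d2 = 2 * r_mz - 4 * r_dz  # dominance variance
    e2 = 1 - r_mz             # non-shared environment (plus measurement error)
    if a2 < 0 or d2 < 0:
        # r_dz outside [r_mz/4, r_mz/2]: the ADE model does not fit these correlations
        raise ValueError("correlations fall outside the range implied by an ADE model")
    return {"a2": a2, "d2": d2, "e2": e2}


# Hypothetical correlations with r_dz < r_mz / 2, the pattern reported for maternal ratings.
estimates = ade_moment_estimates(r_mz=0.80, r_dz=0.30)
print({k: round(v, 2) for k, v in estimates.items()})  # {'a2': 0.4, 'd2': 0.4, 'e2': 0.2}
```

When r_DZ exceeds half of r_MZ, d² turns negative. In that case an ACE model, not an ADE model, would be the natural choice.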

Social interactions may be an additional source of variance. In continuous data, social interaction effects lead to differences between the variances of MZ and DZ twins (Carey 1986). In ordinal data the total variance is fixed at 1, so such variance differences would appear as differences in prevalence. The presence of an interaction component can therefore be tested by equating the prevalences of AP/ADHD between MZ and DZ twins. If there are no significant prevalence differences, sibling interaction or rater bias is considered implausible.
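Social interaction is commonly modeled by letting each twin's phenotype depend directly on the co-twin's, P1 = sP2 + X1 and P2 = sP1 + X2 (Eaves 1976; Carey 1986). Solving this pair of equations gives the variance and cross-twin covariance below. The sketch assumes a single interaction path s. The values for the variance v and the cross-twin covariance c of the pre-interaction phenotype X are hypothetical. With s < 0 (contrast or competition), the variance is smaller in MZ than in DZ twins, because c is larger in MZ pairs.

```python
def sibling_interaction_moments(v: float, c: float, s: float) -> tuple[float, float]:
    """Variance and cross-twin covariance when P1 = s*P2 + X1 and P2 = s*P1 + X2.

    v: variance of each twin's phenotype before interaction (X)
    c: cross-twin covariance of X (larger in MZ than in DZ pairs)
    s: interaction path; s < 0 mimics competition/contrast, s > 0 imitation
    """
    if abs(s) >= 1:
        raise ValueError("|s| must be smaller than 1")
    scale = 1.0 / (1.0 - s * s) ** 2
    variance = scale * ((1 + s * s) * v + 2 * s * c)
    covariance = scale * ((1 + s * s) * c + 2 * s * v)
    return variance, covariance


# Hypothetical inputs: v = 1, c = .8 (MZ) and .4 (DZ), contrast effect s = -.2.
var_mz, _ = sibling_interaction_moments(1.0, 0.8, -0.2)
var_dz, _ = sibling_interaction_moments(1.0, 0.4, -0.2)
print(f"MZ variance {var_mz:.3f}, DZ variance {var_dz:.3f}")  # MZ variance is the smaller
```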

Three multivariate models were tested: a triangular (Cholesky) decomposition, an independent pathway model, and a common pathway model (Neale and Cardon 1992). The triangular decomposition is the least restrictive model, because it tests no specific hypotheses about the covariance matrices of A, D, and E. These matrices are merely assumed to be positive (semi)definite. This saturated model can be used to obtain (otherwise unconstrained) genetic and environmental correlations among traits. The independent pathway model includes common and specific genetic and environmental factors. In our analyses of the five variables, we specified one common factor and five instrument-specific factors for each of A, D, and E. We denote the common factors Ac, Dc, and Ec, respectively. An independent pathway model provides a good fit to the data if the covariance between the five variables is due to the common factors Ac, Dc, and Ec. Finally, the common pathway model is nested under the independent pathway model. It assumes that genes and environment explain variance in a latent phenotype. This latent factor, whose variance is constrained at 1, explains variance in the five variables. In addition, the variance of each of the five variables may be influenced by instrument-specific A, D, and E. In other words, the common pathway model provides a good fit to the data if the covariance between the five variables can be explained by a single latent construct.
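Counting free parameters per variance component shows how restrictive each model is. The sketch below uses p = 5 measures and reflects our reading of the model descriptions. Thresholds are identical across these models and are left out. The differences in parameter counts (15, 10, and 23) equal the degrees of freedom reported for the corresponding model comparisons. The function names are hypothetical.

```python
def cholesky_params(p: int) -> int:
    """Lower-triangular loading matrix for one component (A, D, or E)."""
    return p * (p + 1) // 2


def independent_pathway_params(p: int) -> int:
    """One common loading plus one specific loading per measure, for one component."""
    return 2 * p


def common_pathway_params(p: int, n_components: int = 3) -> int:
    """A, D, E paths to a latent phenotype with variance fixed at 1 (one fewer free path),
    p loadings of the measures on that phenotype, and p specific paths per component."""
    return (n_components - 1) + p + n_components * p


p = 5
full_cholesky = 3 * cholesky_params(p)                                     # 45
ip_ade = 3 * independent_pathway_params(p)                                 # 30
ip_ad_cholesky_e = 2 * independent_pathway_params(p) + cholesky_params(p)  # 35
cp = common_pathway_params(p)                                              # 22

print(full_cholesky - ip_ade, full_cholesky - ip_ad_cholesky_e, full_cholesky - cp)  # 15 10 23
```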

Because the number of twins for whom interview data are available is relatively small, and sex differences in heritability are usually not found, the data from male and female twins were combined in the analyses. To allow for prevalence differences between boys and girls, sex was included as a covariate on the thresholds. The type-I error rate of all statistical tests was set at .05.

Results

Descriptives

The prevalences for the five measures were compared between MZ and DZ twins and between boys and girls. The first model fitted to the data was a fully saturated model. In this model, 90 correlations were estimated, 45 in MZ twins and 45 in DZ twins. The model also included 30 thresholds in each of four groups (MZ boys, DZ boys, MZ girls, and DZ girls), for a total of 120 estimated thresholds. Next, we fitted a model with constraints on the thresholds. This model included 30 thresholds, 1 sex effect on the thresholds, and 5 zygosity effects on the thresholds (one for each of the five measurements). Because this model fitted the data well, it served as the reference model for testing, per measurement, whether prevalences differed by zygosity. The results of these analyses are summarized in Table 3. Zygosity did not affect the prevalences of the CBCL, CPRS-R:S, and DSM scores. Because there were no prevalence differences between MZ and DZ twins, social interaction effects were not included in the genetic model. The reference model for testing sex differences included as free parameters 30 thresholds, 1 zygosity effect on the thresholds, and 5 sex effects on the thresholds, one for each measurement. Boys had significantly more problems than girls on all five measurements, so sex was included as a covariate on the thresholds. Because categorical scores were analyzed, we do not report means and standard deviations of the CBCL, CPRS-R:S, and DSM scores. These descriptives are available from the corresponding author on request.
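The parameter counts above follow from the design: five measures per twin, ten variables per pair, and three thresholds per four-category variable. The arithmetic, with variable names of our own choosing, is:

```python
from math import comb

measures_per_twin = 5                       # CBCL 7, CBCL 10, CBCL 12, CPRS-R:S, DSM
variables_per_pair = 2 * measures_per_twin  # both twins of a pair
thresholds_per_variable = 4 - 1             # four ordinal categories
groups = ("MZ boys", "DZ boys", "MZ girls", "DZ girls")

correlations_per_zygosity = comb(variables_per_pair, 2)
thresholds_per_group = variables_per_pair * thresholds_per_variable
saturated_thresholds = thresholds_per_group * len(groups)
# Constrained model: one set of thresholds, one sex effect, one zygosity effect per measure.
constrained_thresholds = thresholds_per_group + 1 + measures_per_twin

print(correlations_per_zygosity, thresholds_per_group, saturated_thresholds, constrained_thresholds)
# 45 30 120 36
```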

Twin correlations

The polychoric correlations between the five measurements are shown in Table 2 for MZ and DZ twins. The MZ (DZ) twin correlations are reported above (below) the diagonal. As expected, the phenotypic correlations (i.e., the correlation between traits within the same individual) are similar in first- and second-born twins and in MZ and DZ twins. The correlations range from .45 to .77, with slightly lower correlations between different assessment methods (e.g., CBCL questionnaire versus clinical interview) than similar assessment methods (e.g., CBCL questionnaire versus CPRS-R:S questionnaire). Equating the correlations of first- and second-born twins at age 12, the phenotypic correlation between CBCL-AP and CPRS-R:S was .75, while the correlations between CBCL-AP and DSM, and CPRS-R:S and DSM were .62. The fact that the cross-twin and the cross-trait cross-twin correlations are higher in MZ than DZ twins indicates that genetic influences contribute to the variance of the three measures and to the covariance between them.

Table 2 Polychoric correlations in monozygotic (above diagonal) and dizygotic (below diagonal) twins

Genetic analyses

A Cholesky decomposition that included additive genetic influences (A), dominant genetic influences (D), and non-shared environmental influences (E) was fitted to the data. The full ADE Cholesky decomposition fitted the data well (χ²(50) = 59.03, P = .180); see Table 3 for an overview of the model fitting results. Next, an independent pathway model was fitted to the data. Imposing the independent pathway model for A, D, and E resulted in a significant deterioration in fit compared with the Cholesky decomposition (χ²(15) = 42.42, P < .001). Additional analyses showed that the influences of A and D were consistent with the independent pathway model, whereas the influence of E was not. A model with an independent pathway structure for A and D and a Cholesky decomposition for E did not fit significantly worse than the full Cholesky decomposition (χ²(10) = 16.45, P = .087). The fit of the common factor model was poor (χ²(23) = 259.12, P < .001). Next, we tested whether the instrument-specific influences of A and D could be constrained at zero. The instrument-specific additive genetic factors could not be dropped from the model (χ²(5) = 91.80, P < .001). In contrast, the dominant genetic variance could be explained by one common factor (χ²(5) = 1.06, P = .96). In other words, the covariance structure of D did not include specific variances. This covariance matrix therefore has rank one, and the correlations obtained by standardizing it are all one. Figure 1 provides a graphical representation of the genetic part of the best fitting model and includes the unstandardized factor loadings of the additive genetic and dominant genetic factors.
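The reported P values follow from the upper tail of the chi-square distribution. For integer degrees of freedom this tail has a closed form, sketched below with the Python standard library only. In practice a library routine such as scipy.stats.chi2.sf would normally be used. The statistics and degrees of freedom are those reported above; the function name is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square_df > x) for integer df, stdlib only."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: 1-df tail plus exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Likelihood-ratio statistics and df as reported in the text.
comparisons = [
    ("Full ADE Cholesky", 59.03, 50),
    ("Independent pathway for A, D, E", 42.42, 15),
    ("Independent pathway for A, D; Cholesky for E", 16.45, 10),
    ("Common factor model", 259.12, 23),
    ("Drop specific A", 91.80, 5),
    ("Drop specific D", 1.06, 5),
]
for label, chi2, df in comparisons:
    print(f"{label}: chi2({df}) = {chi2}, P = {chi2_sf(chi2, df):.3f}")
```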

Table 3 Multivariate model fitting of maternal CBCL, CPRS-R:S, and DSM-IV ratings of attention problems and ADHD in children aged 7–12 years
Fig. 1

A graphical representation of the unstandardized additive genetic (A) and dominant genetic (D) effects on five measurements of Attention Problems and ADHD. The figure shows the best-fitting model and the estimated factor loadings for one individual twin. Additive genetic effects correlate 1 in MZ twins and .5 in DZ twins. Dominant genetic effects correlate 1 in MZ twins and .25 in DZ twins. To identify the model, the variances of the five categorical measurements are constrained at 1. CBCL7 = Child Behavior Checklist at age 7; CBCL10 = Child Behavior Checklist at age 10; CBCL12 = Child Behavior Checklist at age 12; CPRS-R:S = Conners Parent Rating Scale-Revised:Short version at age 12; DSM = DISC-IV ADHD at a mean age of 12 years

Although the influence of the nonshared environment was not included in Fig. 1, the total variances of the five measurements are constrained at 1 to identify the model. This allows the additive and dominant genetic variance to be calculated from the unstandardized factor loadings. For example, 41% (i.e., .44² + .46²) of the variance in the CBCL rating at age 7 is attributable to additive genetic effects, and 36% (.60²) is attributable to dominant genetic effects. The remaining variance is explained by nonshared environmental effects. The additive genetic variance on the five measurements can be decomposed into variance due to the common factor and variance due to instrument-specific factors. For the CBCL rating at age 7, 19% (.44²) of the total variance is attributable to common additive genetic effects, and 22% (.46²) is attributable to instrument-specific genetic effects. Common additive genetic effects account for 36%, 55%, 56%, and 32% of the total variance of the CBCL at age 10, the CBCL at age 12, the CPRS-R:S, and the DSM, respectively. Likewise, instrument-specific effects account for 17, 13, 23, and 24% of the variance, respectively.
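Because each measurement's total variance is fixed at 1, the variance shares are squared loadings, and E is the remainder. The sketch below reproduces the age-7 CBCL example. Fig. 1 reports loadings rounded to two decimals. With those rounded values, the instrument-specific share computes to 21% (.46²), while the text gives 22%. The function name is hypothetical.

```python
def decompose(a_common: float, a_specific: float, d_common: float) -> dict[str, float]:
    """Variance shares from unstandardized loadings when total variance is fixed at 1."""
    a2 = a_common**2 + a_specific**2  # additive genetic: common + instrument-specific
    d2 = d_common**2                  # dominance: common factor only (no specifics)
    e2 = 1.0 - a2 - d2                # non-shared environment is the remainder
    return {"A": a2, "D": d2, "E": e2}


shares = decompose(a_common=0.44, a_specific=0.46, d_common=0.60)  # CBCL at age 7
print({k: f"{v:.0%}" for k, v in shares.items()})  # {'A': '41%', 'D': '36%', 'E': '23%'}
```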

Table 4 shows the standardized influences of A, D, and E on the variance and covariance of the five measurements. The diagonals of the three five-by-five matrices for A, D, and E contain the standardized variance components. The results indicate a high heritability, irrespective of measurement instrument or age. The off-diagonal elements in Table 4 give the standardized influences of A, D, and E on the covariance between the measurements. For example, 51% of the covariance between CBCL7 and DSM is explained by A, 25% by D, and 24% by E. To obtain the unstandardized amount of covariance explained, the standardized influences should be multiplied by the phenotypic covariance between the measures, which is .51 for CBCL7 and DSM. The most interesting comparisons are between data that were collected at approximately the same time. The covariance between the CBCL at age 12 and the DSM is explained largely by genetic effects (68% A, 9% D, and 23% E). Similar results were found for the covariance between CPRS-R:S and the DSM (67% A and 7% D) and for the covariance between the CBCL at age 12 and CPRS-R:S (74% A and 8% D).
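Converting standardized shares of a covariance into absolute contributions is a single multiplication per component. The sketch below uses the CBCL7–DSM values given in the text; the function name is hypothetical.

```python
def unstandardize(shares: dict[str, float], phenotypic_cov: float) -> dict[str, float]:
    """Convert standardized A/D/E shares of a covariance into absolute contributions."""
    if abs(sum(shares.values()) - 1.0) > 0.01:
        raise ValueError("standardized shares should sum to 1")
    return {component: share * phenotypic_cov for component, share in shares.items()}


# CBCL7-DSM values from the text: 51% A, 25% D, 24% E of a covariance of .51.
parts = unstandardize({"A": 0.51, "D": 0.25, "E": 0.24}, phenotypic_cov=0.51)
print({k: round(v, 4) for k, v in parts.items()})  # {'A': 0.2601, 'D': 0.1275, 'E': 0.1224}
```

The three contributions add back up to the phenotypic covariance of .51.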

Table 4 Standardized genetic and environmental influences on the variances and covariances of five ratings of ADHD and attention problems

Table 5 includes the genetic and environmental correlation matrices in the best-fitting model, which represent the overlap between the genetic and environmental influences on the five measurement instruments. The additive genetic correlations range between .52 and .76. All dominant genetic correlations are 1, because the one-factor model used for the dominant genetic covariance structure contains no specific factors. The non-shared environmental correlations range from .34 to .68.
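Suppose a component has a common factor but no specific factors. Its covariance matrix is then the outer product of the loadings, which has rank one. Every standardized correlation therefore equals 1, provided the loadings share the same sign. The sketch below contrasts this with a model that also has specific factors. The loadings are hypothetical, not the estimates in Fig. 1, and the function names are our own.

```python
import math


def implied_cov(common: list[float], specific: list[float]) -> list[list[float]]:
    """Covariance from one common factor plus one specific factor per measure."""
    p = len(common)
    return [
        [common[i] * common[j] + (specific[i] ** 2 if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]


def to_correlation(cov: list[list[float]]) -> list[list[float]]:
    """Standardize a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))] for i in range(len(cov))]


loadings = [0.60, 0.50, 0.40]  # hypothetical common-factor loadings
rank_one = to_correlation(implied_cov(loadings, [0.0, 0.0, 0.0]))
with_specifics = to_correlation(implied_cov(loadings, [0.40, 0.30, 0.50]))
print([[round(r, 2) for r in row] for row in rank_one])        # all entries 1.0
print([[round(r, 2) for r in row] for row in with_specifics])  # off-diagonals below 1
```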

Table 5 Genetic and environmental correlations of five ratings of ADHD and attention problems

Discussion

The aim of this study was to determine the extent to which three different instruments, which are commonly used to assess ADHD, attention problems, and hyperactivity, measure a common construct. The instruments considered are two scales based on items from questionnaires (CBCL-AP, and CPRS-R:S ADHD-I), and a DSM-IV ADHD interview. First, we considered the phenotypic correlations. Second, we tested if the variance in the different instruments reflects one common underlying factor. Third, we estimated the genetic and environmental influences on individual differences in ADHD. This is the first study that includes multivariate genetic analyses of behavior rating scales and DSM-IV interview data collected in a large sample of twins of approximately the same age. The CBCL scores collected at age 7 and 10 years were included only to correct for the selection. In the discussion, we focus mainly on the CBCL, CPRS-R:S and DSM interview data, which were collected at a mean age of 12 years.

The phenotypic correlation between the CBCL-AP and the CPRS-R:S ADHD-I was high (r = .75). The correlations between the CBCL and the DSM and between the CPRS-R:S and the DSM were lower (r = .62). These lower correlations may reflect the different time points at which the behavior checklists and the DSM interview data were collected (the mean interval between measurement occasions was 10 months), differences in the time frame over which the items are assessed (e.g., 1 month for the CPRS-R:S, 6 months for the CBCL, and 1 year for the DSM), and instrument or method variance (e.g., interview versus behavior checklist). The genetic analyses show that 82% of the covariance between the CBCL and the CPRS-R:S is explained by genetic factors (74% A and 8% D), compared with 77% of the covariance between the CBCL and the DSM (68% A and 9% D). Because these proportions are similar, the higher phenotypic correlation between the CBCL and the CPRS-R:S is not caused by a relatively larger genetic contribution to their covariance.

As noted, the items of the CBCL-AP scale relate to both inattention and hyperactivity/impulsivity. The correlation between the CPRS-R:S ADHD-I and DSM-IV ADHD is identical to the correlation between the CBCL-AP and DSM-IV ADHD, which suggests that the CPRS-R:S and the CBCL measure ADHD equally well. Describing the eleven-item CBCL scale as an inattention scale therefore seems too limited: both the item content and the current results suggest that the CBCL also captures problems related to hyperactivity/impulsivity.

Although the phenotypic correlations offer interesting insight into the similarities and differences between the quantitative and qualitative approaches to child psychopathology, an important question concerns the etiological influences on the variances and covariances. In agreement with previous studies (Eaves et al. 1997; Sherman et al. 1997; Rietveld et al. 2003; Hudziak et al. 2000), individual differences in AP and ADHD are mainly explained by genetic factors. An independent pathway model provided a better fit than a common factor model. Because a common factor model imposes the same structure on the additive genetic, dominant genetic, and nonshared environmental influences, its poor fit is probably due to the presence of instrument-specific additive genetic factors combined with the absence of instrument-specific dominant genetic factors. As referees of earlier drafts of this paper noted, alternative models might be fit to our data. A model including three common factors (one loading on all ratings, a second loading on the CBCL ratings, and a third loading on the age 12 ratings) might offer a good solution, but because the structures of A, D, and E differ (with no rating-specific influences for D), we did not fit this model to our data.
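The constraint that makes the common factor model too restrictive here can be illustrated numerically. The sketch below assumes that the common factor model corresponds to the usual common pathway specification: one latent factor whose variance is split into additive (a2), dominant (d2), and nonshared environmental (e2) parts, with each measure reaching A, D, and E only through its single loading on that factor. All values are hypothetical, and instrument-specific residual terms are omitted for brevity.

```python
# Hypothetical values, NOT estimates from this study.
loadings = [0.80, 0.85, 0.90, 0.88, 0.70]  # loadings of five measures on the latent factor
a2, d2, e2 = 0.50, 0.25, 0.25             # A, D, E components of the latent factor's variance


def scaled_outer(v, scale):
    """Return scale * v v^T as a nested list."""
    return [[scale * vi * vj for vj in v] for vi in v]


sigma_a = scaled_outer(loadings, a2)
sigma_d = scaled_outer(loadings, d2)
sigma_e = scaled_outer(loadings, e2)

n = len(loadings)
# Element-wise ratios are constant: A, D, and E share one covariance pattern.
ratios_ad = {round(sigma_a[i][j] / sigma_d[i][j], 10) for i in range(n) for j in range(n)}
ratios_ae = {round(sigma_a[i][j] / sigma_e[i][j], 10) for i in range(n) for j in range(n)}
print(ratios_ad, ratios_ae)
```

Because the common A, D, and E matrices are all proportional to the same outer product of loadings, this structure cannot accommodate instrument-specific additive effects alongside purely common dominance effects; an independent pathway model lifts the restriction by giving A, D, and E their own loadings.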

An independent pathway model allows for the inclusion of common and instrument-specific genetic and environmental factors. The best-fitting model included common additive and dominant genetic effects, instrument-specific additive genetic effects, and nonshared environmental effects. The relative influence of common and instrument-specific genetic effects varied by rating. Two-thirds of the additive genetic variance of the CBCL ratings at ages 10 and 12 and of the CPRS-R:S rating at age 12 was explained by common effects. Instrument-specific effects played a more important role in the CBCL ratings at age 7 and in the DSM ratings; for these ratings, the ratio of common to instrument-specific effects was about 50:50. The common genes thus explain less of the variance in these ratings than in the other ratings, probably as a result of developmental changes in behavior and of method variance (i.e., questionnaire versus interview). The dominant genetic effects overlapped completely across ratings, as the instrument-specific dominant effects could be dropped from the model. Our results agree in part with the findings of Nadder and Silberg (2001), who fit an independent pathway model to ADHD symptomatology based on maternal and paternal questionnaire and interview data and on teacher reports. Although their best-fitting model included contrast effects instead of genetic dominance, their results support our finding of both common and specific genetic influences on questionnaire and interview data on ADHD.
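The structure of the best-fitting model can be sketched as follows. This is a minimal illustration with hypothetical loadings, chosen only so that the common factor explains one-half or two-thirds of the additive genetic variance, mirroring the pattern described above; they are not the estimates from this study, and the measure labels are illustrative. Additive effects have common and specific parts, dominance effects are common only, and the expected cross-twin covariances follow the standard ADE expectations (MZ twins share all of A and D, DZ twins half of A and a quarter of D). Nonshared environment is left out because it does not contribute to cross-twin covariance.

```python
import math

# Hypothetical loadings (NOT this study's estimates), chosen so that the common
# factor explains 1/2 or 2/3 of each measure's additive genetic variance.
measures = ["CBCL7", "CBCL10", "CBCL12", "CPRS12", "DSM12"]
a_common = [0.5, 0.6, 0.6, 0.6, 0.5]
a_specific = [0.5] + [0.6 / math.sqrt(2)] * 3 + [0.5]
d_common = [0.35] * 5  # dominance: common factor only, no specifics


def cov_matrix(common, specific=None):
    """Covariance implied by one common factor plus optional measure-specific factors."""
    n = len(common)
    return [[common[i] * common[j]
             + (specific[i] ** 2 if specific is not None and i == j else 0.0)
             for j in range(n)] for i in range(n)]


sigma_a = cov_matrix(a_common, a_specific)
sigma_d = cov_matrix(d_common)


def cross_twin(weight_a, weight_d):
    """Cross-twin covariance: weight_a * Sigma_A + weight_d * Sigma_D."""
    n = len(measures)
    return [[weight_a * sigma_a[i][j] + weight_d * sigma_d[i][j] for j in range(n)]
            for i in range(n)]


cov_mz = cross_twin(1.0, 1.0)   # MZ twins share all A and D
cov_dz = cross_twin(0.5, 0.25)  # DZ twins share 1/2 of A and 1/4 of D

# Share of each measure's additive genetic variance due to the common factor.
common_share = {m: a_common[i] ** 2 / sigma_a[i][i] for i, m in enumerate(measures)}
for m, share in common_share.items():
    print(f"{m}: {share:.2f}")
```

In a full analysis these expectations would be fitted to MZ and DZ data (here, categorical ratings) by maximum likelihood; the sketch only shows how the model's parts combine.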

The poor fit of the common factor model suggests that the construct validity of the instruments is not perfect. It is nevertheless interesting to consider the implications of the overlap between the sets of genes that explain variance in the three instruments. High genetic correlations imply that the detection of specific genes that play a role in ADHD does not depend much on the instrument used. At age 12, the additive genetic correlations of the CBCL, CPRS-R:S, and DSM varied between .63 and .76, while the dominant genetic correlations could be constrained at 1. The non-shared environmental correlations are also quite high, varying between .45 and .68. The dominant genetic correlations of 1 suggest that there is a subset of genes whose effect does not depend on instrument or age. In contrast, the additive genetic correlations are high but less than perfect. This suggests that the influence of most genes with an additive effect is not sensitive to the particular instrument used, although some genes explain variance only in a particular measurement (e.g., the CBCL) and not in another (e.g., the DSM).

What are the implications of the present findings for gene-finding studies? Thus far, five groups have conducted genome-wide linkage scans in an attempt to find genomic regions involved in ADHD, and several regions of potential interest have been identified. Linkage peaks with a LOD score above 2 (P < ∼.002) were reported at chromosomes 16p13 and 17p11 (Ogdie et al. 2003), 7p and 15q (Bakker et al. 2003), 4q, 8q, and 11q (Arcos-Burgos et al. 2004), 5p and 12q (Hebebrand et al. 2006), and 14q32 and 20q11 (Gayan et al. 2005). All these studies based diagnosis on DSM-IV (Ogdie et al. 2003; Bakker et al. 2003; Arcos-Burgos et al. 2004; Hebebrand et al. 2006) or DSM-III (Gayan et al. 2005) criteria. The discrepancies among the results of these five studies could be due to low statistical power. The present study showed that the genetic overlap between behavior checklist scores and the DSM-IV diagnosis of ADHD is high. This suggests that the detection of genes that play a role in ADHD can be based on questionnaire scores rather than diagnostic interviews, which would reduce the costs of collecting phenotypic data. Resources could then be reallocated to the collection of genotypic data: more subjects could be genotyped, increasing the statistical power to detect a QTL.
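The P value attached to a LOD score of 2 follows from the standard likelihood-ratio relation chi-square = 2 ln(10) × LOD (about 4.6 × LOD). The sketch below assumes that relation and a 1-df chi-square. Halving the tail (a 50:50 mixture of chi-square distributions with 0 and 1 df, a convention often used because the linkage parameter is bounded) gives a one-sided pointwise P of about .001, while the unhalved tail is about .002, so the figure quoted above falls within this range. These are pointwise values, not genome-wide significance levels.

```python
import math


def lod_to_p(lod, one_sided=True):
    """Pointwise p-value for a LOD score via the likelihood-ratio chi-square.

    chi2 = 2 * ln(10) * LOD. For 1 df, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    With one_sided=True the tail is halved (50:50 mixture of chi2_0 and chi2_1).
    """
    chi2 = 2.0 * math.log(10.0) * lod
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return p / 2.0 if one_sided else p


p_half = lod_to_p(2.0)
p_full = lod_to_p(2.0, one_sided=False)
print(f"LOD 2: one-sided P = {p_half:.4f}, unhalved P = {p_full:.4f}")
```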

Limitations

The results of this study should be interpreted in light of the following limitations. First, further study is required to investigate whether the results of the current study, which was based on a Dutch population sample, generalize to population samples outside the Netherlands. Second, clinical diagnoses were based on structured diagnostic interviews with the mother; the results may differ when the assessment of ADHD is based on expert clinical diagnoses. Third, no distinction was made between problems related to inattention and problems related to hyperactivity. Because the CBCL does not distinguish between inattention and hyperactivity (and its number of items is probably too small to measure these two factors reliably), we did not distinguish between the subscales. Fourth, based on the results of univariate studies, we did not allow for sex differences in the genetic and environmental influences. Because the multivariate model has greater statistical power, it might detect sex differences that the univariate analyses did not. However, given the categorical nature of the data and the fact that some cells in the contingency tables already contain few individuals in a two-group analysis, a four-group analysis would run into statistical problems. Fifth, because of the categorical nature of the data, computational limitations prevented us from computing confidence intervals.

Clinical implications

Two general approaches to the measurement of ADHD can be distinguished. In the DSM-IV framework, ADHD is viewed as a categorical trait. With behavior checklists, children are placed on a continuum ranging from not affected at all to severely affected. The current study shows that variance in DSM-IV symptoms, the CBCL-AP scale, and the CPRS-R:S ADHD-I is explained mostly by genetic effects. The correlations between the genetic influences on these three measurements of ADHD are high, which implies that the different measurements tap largely the same genetic liability.