Introduction

Eating disorders (EDs) are psychiatric conditions characterized by severe disturbances in eating behavior, and can be classified into three major types, namely anorexia nervosa (AN), bulimia nervosa (BN) and ED not otherwise specified. AN and BN are not mutually exclusive conditions as some individuals cross over between both conditions.1, 2 It is generally believed that ED has become more frequent over recent decades. Fairburn and Harrison3 summarized the prevalence of AN and BN to be 0.7% in teenage girls and 1–2% in 16–35-year-old females, and the incidence (per 100 000 per year) of AN and BN to be 19 in females and 2 in males, and 29 in females and 1 in males, respectively.3 AN has the highest mortality rate (5–6%) of any psychiatric disease,4 whereas the mortality rate of BN is reported to be 0.3%.5

The cause of ED is complex and poorly understood. However, the involvement of genetic factors in the etiology of ED has been demonstrated in family and twin studies.6, 7, 8, 9 Twin studies have estimated the contribution of genetic factors in AN to be between 33 and 84% (ref. 9) and that in BN to be between 28 and 83% (ref. 8). To search for the genetic etiology of ED, two types of molecular genetic approaches, namely linkage studies and association studies, have been carried out (reviewed in Bulik et al.10 and Pinheiro et al.11). Linkage studies for AN have detected significant linkage at two regions on chromosome 1 (refs 12, 13) and additional suggestive linkage at a number of loci.13, 14 From the 1p33-p36 region, one of the regions showing significant linkage to AN,12 the serotonin 1D receptor (HTR1D) and opioid delta receptor (OPRD1) genes were further evaluated and found to exhibit significant association with AN.15 Linkage analysis of a BN cohort detected significant linkage at 10p13 and a suggestive linkage at 14q22.2-23.1.16 Fine mapping of these regions with significant/suggestive linkage signals will clarify whether the regions contain gene(s) relevant to AN or BN. Association studies for ED have so far been limited to candidate gene approaches that focused on genes such as those involved in the regulation of feeding and body composition and those implicated in neurotransmitter pathways regulating behavior.10, 11, 17 Although many association studies performed for ED are considered to be statistically underpowered because of their small sample sizes and/or suffer from multiple testing,10 positive findings on the HTR1D, OPRD1 and BDNF genes seem promising, as their association with AN/BN has been replicated in more than one study in which relatively large numbers of samples were enrolled (reviewed in Bulik et al.10).

Completion of the Human Genome Project18 and the rapid progress of the International HapMap Project19 dramatically increased the amount of information on genetic markers, such as microsatellite (MS) markers and single-nucleotide polymorphisms (SNPs). Consequently, statistical strategies and genotyping platforms for genome-wide association studies (GWASs) have been established and have become widely used as a means of identifying disease susceptibility genes. Disease association studies using MS markers distributed across the human genome have advantages over linkage analysis and the candidate gene approach. MS markers are highly polymorphic and show a high degree of heterozygosity (70% on average), and their linkage disequilibrium (LD) lengths are in the 100 kb range.20, 21, 22

In this study, we adopted a practical and efficient GWAS strategy for AN using a set of 23 465 MS markers22, 23 and the DNA pooling method, which has been adopted to identify novel susceptibility genes of rheumatoid arthritis22 and candidate loci for hypertension24 and adult height.23 We identified 10 novel loci related to AN by the MS marker-based GWAS strategy, and subsequently conducted an SNP-based association analysis for 7 of the 10 loci to further narrow down candidate genomic intervals responsible for AN susceptibility.

Materials and methods

Subjects

The patients enrolled in this study were recruited through the efforts of the Japanese Genetic Research Group for Eating Disorders (JGRED), which comprises 67 nation-wide hospitals/institutions (the full list is available in Supplementary Table 1). A total of 456 unrelated Japanese female patients with ED (331 cases with AN and 125 cases with BN) participated in this study. According to the Diagnostic and Statistical Manual of Mental Disorders,25 218 and 113 cases were diagnosed as AN-restricting type (AN-R) and as AN with binge eating/purging type (AN-BP), respectively. Among the 125 BN cases, 46 had histories of AN and 79 cases did not. The average age at assessment was 23.2±7.91 (s.d.) years for AN-R, 25.7±7.22 years for AN-BP and 26.3±6.77 years for BN. The lifetime minimum body mass index was 12.4±1.90 (s.d.) kg m−2 for AN-R, 12.8±2.67 kg m−2 for AN-BP and 16.2±3.47 kg m−2 for BN. A total of 872 Japanese healthy individuals participated in this study: 180 female individuals whose average age was 34.5 years (control group 1) and 692 male and female volunteers recruited among university students (control group 2). The average age and gender ratio of control group 2 were unavailable. The ethics committees of all facilities approved the investigation. All subjects gave their written informed consent before participation in the study.

Pooled DNA construction and MS genotyping

Among the 27 037 MS markers developed by Tamiya et al. (ref. 22), 23 465 (with an average spacing of 118.0 kb) were used in this study. Detailed information on the 27 037 MS markers is also available at the Japan Biological Information Research Center website (http://jbirc.jbic.or.jp/gdbs/database/viewer/download/list.jsp). The pooled DNA method for MS typing was performed according to the protocol of Collins et al.26 with a slight modification.21 Genomic DNA was extracted from peripheral blood using the Qiagen blood mini kit (Qiagen, Hilden, Germany). DNA concentration was determined using the PicoGreen fluorescence assay (Molecular Probes, Eugene, OR, USA) as described previously.22, 26 The detailed conditions for PCR amplification and peak detection for pooled DNA and individual genotyping were described previously.22, 24 A total of 320 AN cases and 341 controls were subjected to MS genotyping. The subjects for pooled DNA typing were 90 AN-R cases and 90 female controls from control group 1 in the first-stage screening, and another 90 AN-R cases and another 90 female controls from control group 1 in the second-stage screening. The subjects for individual genotyping in the third screening were 140 AN cases (32 AN-R and 108 AN-BP cases) and 161 controls from control group 2. As the final step of MS screening, positive markers were subjected to individual genotyping of all 320 AN cases and 341 controls used in the three screening stages. MS markers showing statistical significance (P<0.05) in Fisher's exact test for the 2 × 2 contingency table in the first screening were subjected to the second screening. In the second and third screening stages, besides statistical significance (P<0.1), consistency in the direction of effect of the associated MS allele on AN susceptibility was considered; when the allele holding statistical significance showed the opposite direction of effect compared with that in the previous screening stage, the marker was excluded from further analysis.
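The staged selection rule described above can be sketched as a small filter. This is only an illustration under stated assumptions: the names `StageResult` and `passes_screening` are hypothetical, and the direction of effect is represented here by an odds ratio above or below 1, which is one possible encoding and not necessarily the one used in the study.

```python
from dataclasses import dataclass
from typing import Sequence

# Nominal thresholds taken from the text: P<0.05 in stage 1, P<0.1 in stages 2 and 3.
THRESHOLDS = (0.05, 0.1, 0.1)


@dataclass
class StageResult:
    """Result for one MS allele at one screening stage (hypothetical container)."""
    p_value: float     # Fisher's exact P for the tested allele
    odds_ratio: float  # >1: allele enriched in cases; <1: enriched in controls


def same_direction(previous: StageResult, current: StageResult) -> bool:
    """True when both stages point to the same direction of effect."""
    return (previous.odds_ratio > 1) == (current.odds_ratio > 1)


def passes_screening(stages: Sequence[StageResult]) -> bool:
    """Return True if the allele survives every stage supplied (up to three)."""
    for i, stage in enumerate(stages):
        if stage.p_value >= THRESHOLDS[i]:
            return False  # not significant at this stage
        if i > 0 and not same_direction(stages[i - 1], stage):
            return False  # effect flipped relative to the previous stage
    return True
```

A marker with results such as (P=0.01, OR=1.8), (P=0.08, OR=1.5) and (P=0.04, OR=1.3) would be retained, whereas one whose second-stage odds ratio fell below 1 would be excluded even if its P-value were below 0.1.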

SNP genotyping

Single-nucleotide polymorphisms in candidate regions were selected from the SNP database (http://www2.appliedbiosystems.com/) using SNPbrowser software 3.5 (Applied Biosystems, Foster City, CA, USA). SNPs were genotyped using TaqMan assays, which were carried out using standard protocols for ABI PRISM 7900HT Sequence Detection Systems (Applied Biosystems). A total of 331 AN cases, 125 BN cases and 872 control individuals were subjected to SNP genotyping. The 331 AN cases consisted of the 320 cases genotyped in the MS screening and 11 additional cases. The 872 controls consisted of 180 female controls from control group 1 and 692 controls from control group 2.

Statistical analyses

Allelic frequencies in pooled DNA genotyping were estimated from the height of peaks measured using the PickPeak and MultiPeaks programs (Applied Biosystems). To calculate allelic P-values from MS genotyping data, we used Fisher's exact test for the 2 × 2 contingency tables for each individual allele and for the 2 × m contingency tables for each locus, where m refers to the number of marker alleles observed in the genotyped population.
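For readers unfamiliar with the per-allele test, the following is a minimal sketch of a two-sided Fisher's exact test on a 2 × 2 table (allele of interest versus all other alleles, in cases versus controls). The function name `fisher_exact_2x2` is hypothetical, the two-sided rule used here (summing all tables no more probable than the observed one) is a common convention rather than one confirmed by the text, and the 2 × m test is not covered.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P for the table [[a, b], [c, d]].

    Rows: cases, controls. Columns: allele of interest, all other alleles.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x copies of the allele among cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables that are as likely as, or less
    # likely than, the observed one (small tolerance for float rounding).
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

For example, `fisher_exact_2x2(3, 1, 1, 3)` evaluates the table with 3 and 1 in the first row and 1 and 3 in the second, giving 34/70.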

The power of each of the three-stage MS screenings was calculated using the Genetic Power Calculator27 (http://pngu.mgh.harvard.edu/~purcell/gpc/), assuming an AN prevalence of 1%. Across the three successive screening stages, the overall power of the study was 72 and 50% for detecting an AN-susceptibility allele with a genotype relative risk of 2.0 and 1.8, respectively, under an additive model on the log-odds scale, when the susceptibility allele (with a frequency of 0.2) and an AN-associated MS marker were in complete LD.

For SNP genotyping, disease associations were assessed by the χ2 test mainly using Haploview 4.0 software (http://www.broad.mit.edu/mpg/haploview/).28 As a multistep analysis was used, nominal P-values were corrected with 10 000 iterated permutations for a series of SNPs selected for each candidate genomic interval. The significance level for SNP association was set at 0.05 throughout the study. Haplotype association analysis (100 000 iterated permutations) was also performed using Haploview 4.0. LD blocks in each candidate locus were defined using the default algorithm, namely confidence intervals,29 of Haploview 4.0. SNPAlyze v7.0 software (Dynacom, Mobara, Japan) was used to perform LD calculation, haplotype inference, identification of haplotype-tagging SNP/MS and case–control haplotype analysis (100 000 iterated permutations) for combined data of MS and SNP genotypes.
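The idea behind permutation-based correction can be illustrated with a max-statistic scheme: case/control labels are shuffled, the largest χ2 across all SNPs in the interval is recorded for each shuffle, and the corrected P-value of a SNP is the fraction of shuffles whose largest χ2 reaches its observed χ2. The sketch below is an assumption about the general approach and is not a description of Haploview's internals; the function names and the input layout (minor-allele counts per individual) are hypothetical.

```python
import random


def allelic_chi2(case_alt, case_n, ctrl_alt, ctrl_n):
    """Pearson chi-square (1 d.f.) for a 2x2 table of allele counts."""
    a, b = case_alt, case_n - case_alt
    c, d = ctrl_alt, ctrl_n - ctrl_alt
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def snp_stats(genotypes, is_case):
    """genotypes[i][j] is the minor-allele count (0, 1 or 2) of person i at SNP j."""
    n_snps = len(genotypes[0])
    case_alt = [0] * n_snps
    ctrl_alt = [0] * n_snps
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    for person, case in zip(genotypes, is_case):
        target = case_alt if case else ctrl_alt
        for j, count in enumerate(person):
            target[j] += count
    return [allelic_chi2(case_alt[j], 2 * n_case, ctrl_alt[j], 2 * n_ctrl)
            for j in range(n_snps)]


def permutation_corrected_p(genotypes, is_case, n_perm=10000, seed=0):
    """Corrected P per SNP from the permutation distribution of the maximum chi-square."""
    rng = random.Random(seed)
    observed = snp_stats(genotypes, is_case)
    labels = list(is_case)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        best = max(snp_stats(genotypes, labels))
        for j, stat in enumerate(observed):
            if best >= stat:
                exceed[j] += 1
    # The +1 terms count the observed data as one permutation, so P is never 0.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Because each shuffle keeps the genotypes of an individual together, the correlation (LD) among SNPs in the interval is preserved, which is the main reason such a scheme is less conservative than a Bonferroni correction.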

Results

Phased genomic screens using DNA pools

In the first-stage screening, among the 23 465 MS markers subjected to genotyping, 1414 (6.0%) were considered to be statistically significant (P<0.05) and were subjected to the next screening stage. In the second-stage screening, among the 1414 markers tested, 158 satisfied our selection criteria: statistical significance (P<0.1) and the same direction of allelic effect on AN susceptibility between the first and the second screening results. In the third screening, among the 158 markers tested, 16 satisfied our selection criteria: statistical significance (P<0.1) and the same direction of allelic effect throughout the MS screening stages. We determined a significance threshold to control false-positive rates (nominal α=0.05) in the first stage of MS screening. In the second and third stages, considering that our sample sizes of cases and controls are not large, we set the significance thresholds (nominal α=0.1) to maintain the overall statistical power of the screening.

To determine the definite allele frequencies of the selected 16 MS markers, we performed individual genotyping on all the AN cases (n=320) and controls (n=341) used in the first to third screenings. Of the 16 markers, 10 showed a statistically significant difference by Fisher's exact test in the comparison between controls and the AN cohort (Table 1). After correction of multiple tests with the number of alleles, 7 of 10 markers remained statistically significant (Pc<0.05).
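The text does not state the exact form of the correction by the number of alleles. One plausible reading, shown here purely as an assumption, is a Bonferroni-style adjustment in which the nominal P-value of an allele is multiplied by the number of alleles observed at that marker; the function name `corrected_p` is hypothetical.

```python
def corrected_p(p_nominal: float, n_alleles: int) -> float:
    """Bonferroni-style corrected P (assumed form): P times the number of alleles, capped at 1."""
    return min(1.0, p_nominal * n_alleles)
```

Under this reading, an allele with nominal P=0.004 at a marker with eight observed alleles would remain significant after correction, whereas one with nominal P=0.01 would not.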

Table 1 Ten microsatellite markers showing statistically significant differences in the individual genotyping

SNP association analysis to narrow down the regions responsible for AN susceptibility

From the 10 MS markers that were found to be associated with AN, we selected 7 (shown in bold in Table 1), on the basis of the gene content around each marker, as targets for SNP association analysis to narrow down disease susceptibility loci. We primarily selected a collection of evenly spaced SNPs (11.1-kb interval on average) within a several 100-kb region surrounding each candidate MS marker, although it should be noted that intragenic SNPs were preferentially selected from the loci of 1q41 (spermatogenesis associated 17 (SPATA17)), 5q15 (CDH18) and 18q22 (NETO1). The number of SNPs subjected to association analysis and the size and nucleotide positions of the corresponding genomic interval are listed for each locus in Table 2.

Table 2 SNP allelic association with AN

In total, we performed genotyping for 333 SNPs on 331 AN cases and 872 controls. Among the 251 SNPs that satisfied thresholds for the Hardy–Weinberg equilibrium (exact test P>0.01) and minor allele frequency (>5%), 24 showed a statistically significant association (nominal P<0.05) with the AN cohort (Table 2). For each of the seven loci analyzed, nominal P-values were corrected with 10 000 iterated permutations using Haploview 4.0. In all, 3 SNPs, all of which are located on 1q41, out of the 24 SNPs remained statistically significant (Pc<0.05) (underlined in Table 2).
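The two quality filters applied to the SNPs can be sketched as follows. The exact test below enumerates every heterozygote count compatible with the observed allele counts, which is a standard formulation of the Hardy–Weinberg exact test; whether the study applied it to controls, cases or both is not stated, and the function names are hypothetical.

```python
from math import exp, lgamma, log


def _log_fact(k: int) -> float:
    return lgamma(k + 1)


def hwe_exact_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact Hardy-Weinberg P-value from genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n - n_a      # copies of allele B

    def log_prob(het: int) -> float:
        # Probability of `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + _log_fact(n)
                - _log_fact(hom_a) - _log_fact(het) - _log_fact(hom_b)
                + _log_fact(n_a) + _log_fact(n_b) - _log_fact(2 * n))

    p_obs = exp(log_prob(n_ab))
    # Heterozygote counts must have the same parity as the allele count.
    probs = (exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    total = 2 * (n_aa + n_ab + n_bb)
    n_a = 2 * n_aa + n_ab
    return min(n_a, total - n_a) / total


def passes_qc(n_aa, n_ab, n_bb, hwe_min_p=0.01, maf_min=0.05) -> bool:
    """Thresholds from the text: exact-test P>0.01 and minor allele frequency >5%."""
    return (hwe_exact_p(n_aa, n_ab, n_bb) > hwe_min_p
            and minor_allele_frequency(n_aa, n_ab, n_bb) > maf_min)
```

A SNP with genotype counts 250/500/250 would pass both filters, one with counts 500/0/500 would fail the Hardy–Weinberg filter, and one with counts 980/20/0 would fail the minor allele frequency filter.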

Subsequently, using Haploview 4.0, we inferred LD block structures for each candidate chromosomal region, and performed a haplotype association analysis (100 000 iterated permutations) for the constructed LD blocks. Significant association (Pc<0.05) with the AN cohort was detected in three of the six SNP haplotype blocks defined in the 1q41 locus, and in one of the eight blocks defined in the 11q22 locus (Figure 1 and Table 3).

Figure 1

Single-nucleotide polymorphism (SNP) and haplotype association analyses for the 1q41 (left) and the 11q22 (right) loci. For each locus, the linkage disequilibrium (LD) plot (top), resident gene(s) (middle) and the P-value plot (bottom) are shown. In LD plots, the extent of LD between two SNPs is shown by the standard color scheme (D′/LOD) of Haploview 4.0. In P-value plots, closed dots show the minus log P-value (y axis) and the physical location (x axis) of SNPs. Minus log P-values were calculated by χ2 tests for the genotyping data of anorexia nervosa (AN) cases (n=331) and controls (n=872). The horizontal dashed line corresponds to the P-value of 0.05. Black and red horizontal bars above the P-value plots correspond to the LD blocks defined by the confidence intervals method (Haploview 4.0). LD blocks showing statistical significance in the haplotype association analysis (Pc<0.05 in Table 3) are shown by red bars. The rs numbers of the SNPs showing statistical significance (P<0.05 in Table 1), the SNPs binned to the AN-associated LD blocks and the SNPs at the ends of the genomic interval are shown underneath the P-value plot. The positions of AN-associated MS markers (D1S0562i and D11S0268i) are shown by blue rectangles. The SPATA17 gene and an uncharacterized mRNA sequence, BC040896, which are transcribed in the left-to-right orientation, are mapped in the 337.3-kb interval between SNPs rs1930302 and rs1538555 on 1q41. For the 11q22 locus, the 474.6-kb region between SNPs rs2585885 and rs6590633, which are located in intron 2 and intron 16 of the CNTN5 gene (NM_014361), respectively, is shown.

Table 3 LD blocks in 1q41 and 11q22 loci and haplotype association analysis

1q41

A total of 38 SNPs were selected for genotyping within a 337.3-kb interval, including the AN-associated MS marker D1S0562i. Among 30 SNPs subjected to association analysis, 7 showed a statistically significant association (P<0.05) with the AN cohort (Table 2). All seven SNPs were located 3′ downstream of the SPATA17 gene (Figure 1, left). SNP rs2048332 showed the most significant association (allelic P=0.00023) and was further analyzed under different genetic models. Association analysis under a recessive model for rs2048332 showed the lowest P-value of 0.00015 with the CC genotype, indicating that the CC genotype of rs2048332 confers susceptibility to the AN phenotype in the Japanese (odds ratio=1.73, confidence interval, 1.30–2.31). Among the three AN-associated haplotype blocks (1q41-#4, #5 and #6 in Figure 1, left and Table 3), 1q41-#5, which comprised two SNPs, namely rs1397178 and rs2048332, spanning a 10.2-kb interval, was found to be most significantly associated (Pc=0.0039).
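An odds ratio under a recessive model and its confidence interval can be obtained from a 2 × 2 table of genotype counts (CC versus non-CC, in cases versus controls). The sketch below uses the Wald interval on the log-odds scale at the 95% level; both the interval method and the level are assumptions, as the text does not specify them, and the function name is hypothetical. The counts in the usage note are invented for illustration and are not the study data.

```python
from math import exp, log, sqrt


def odds_ratio_wald_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio and Wald confidence interval (z=1.96 gives about 95%).

    case_risk / ctrl_risk: carriers of the risk genotype (e.g. CC) in cases / controls.
    case_other / ctrl_other: all remaining genotypes in cases / controls.
    All four counts must be positive.
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) from the four cell counts.
    se = sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

With invented counts of 20 risk-genotype carriers among 100 cases and 10 among 100 controls, `odds_ratio_wald_ci(20, 80, 10, 90)` returns an odds ratio of 2.25 with an interval that narrowly includes 1, illustrating how a sizeable point estimate can still be non-significant in a small sample.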

The AN-associated MS marker D1S0562i was located in block 1q41-#6, which comprised five SNPs spanning a 38.1-kb interval, and was also associated with AN (Pc=0.038). Four of the five SNPs binned to haplotype block 1q41-#6, rs17691163, rs34418611, rs1934216 and rs1538555, were in a relatively strong pairwise LD (D′=0.72–0.75) with D1S0562i, whereas the most significant SNP, rs2048332, in 1q41-#5 block was in modest LD (D′=0.46) with it. The four SNPs and D1S0562i (rs17691163–D1S0562i–rs34418611–rs1934216–rs1538555) were selected as tags captured through LD in block 1q41-#6. These haplotype tags were subjected to an MS-SNP haplotype-based association analysis: one haplotype (G-2-A-T-G), tagged by an AN-associated risk allele of D1S0562i (Supplementary Table 2), was significantly associated with AN (Pc=0.0065) (Table 4). This association was comparable with those observed in the SNP-haplotype analysis in two haplotype blocks 1q41-#5 (Pc=0.0039) and 1q41-#6 (Pc=0.038) in terms of statistical significance.
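The pairwise D′ values quoted above measure how far the observed haplotype frequency departs from independence, scaled by the maximum departure possible given the allele frequencies. The sketch below covers only the simple case of two biallelic markers; treating a multiallelic MS marker as "risk allele versus all other alleles" is an assumption made here for illustration, and the procedure used by SNPAlyze is not described in the text.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute D' for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and allele B at marker 2.
    p_a, p_b: frequencies of allele A and allele B, respectively.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return abs(d) / d_max
```

For instance, with allele frequencies of 0.6 and 0.3 and a joint haplotype frequency of 0.25, D is 0.07 and the maximum attainable D is 0.12, so D′ is about 0.58.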

Table 4 Case-control association analysis for MS-SNP haplotypes

11q22

A total of 66 SNPs were selected from a 699.8-kb interval surrounding the AN-associated MS marker D11S0268i. Among the 56 SNPs subjected to association analysis, 3 (rs12574821, rs1349782 and rs6590474) showed a statistically significant association (P<0.05) with the AN cohort (Table 2). These associated SNPs were found to be located in the eighth intron of the CNTN5 gene (GenBank accession no. NM_014361). Although these three SNPs did not hold statistical significance after multiple-testing correction by permutation tests, a haplotype composed of five SNPs (rs6590474, rs7129985, rs1901860, rs737582 and rs7947224) spanning a 20.2-kb interval showed a statistically significant association with AN (Pc=0.0082) in the haplotype association analysis (11q22-#5 in Figure 1, right and Table 3). Exon 9 of CNTN5 was included in the 20.2-kb interval.

The AN-associated MS marker, D11S0268i, was located in the AN-associated 11q22-#5 block. In this block, D11S0268i was in a strong pairwise LD (D′=0.81–0.90) with each of the five SNPs binned to this block. As three SNPs (rs6590474, rs737582 and rs7947224) and D11S0268i were selected as tags captured through LD in the 11q22-#5 block, we further conducted an MS-SNP haplotype-based association study within the block using these four markers (rs6590474, D11S0268i, rs737582 and rs7947224). As shown in Table 4, the A-4-G-T haplotype was overrepresented in AN cases with the greatest statistical significance (Pc=0.00003). This MS-SNP haplotype contained both of the significantly associated risk alleles (A allele and 4) at SNP rs6590474 (Table 2) and D11S0268i (Supplementary Table 3).

Assessment of possible gender effects in the detected association of 1q41 and 11q22 with AN

Owing to the limited number of female individuals whose age matches the average age of the AN cases in our control samples, we adopted a population-based control group to search for AN susceptibility loci. Therefore, although 180 female controls (control group 1, average age: 34.5 years) were genotyped in the first and second stages of MS screening, the additional 692 control individuals enrolled in the later stages (161 individuals in the third stage of MS screening and all 692 individuals in the SNP association analysis) consisted of male and female individuals. To assess whether the detected association of the 1q41 and 11q22 loci with AN was inflated by this partial mismatch in gender between case and control populations, we conducted a stratified analysis for 7 SNPs on 1q41 and for 3 SNPs on 11q22 (Table 2) that showed a significant association with AN. When control group 1 (180 females) and the AN cohort were subjected to association analysis, all 10 SNPs were detected to be associated (P<0.05) with AN (Supplementary Table 4). These results indicate that the association of the 1q41 and 11q22 loci with AN detected in this study is not attributable to inflation caused by the inclusion of male individuals in the control population.

Association analysis for a BN cohort

To assess whether the genomic intervals identified to be associated with AN in this study are also involved in the genetic etiology of BN, we conducted an SNP association analysis for BN cases and controls. The 7 SNPs from the 1q41 locus and the 3 SNPs from the 11q22 locus, which showed a statistically significant association (P<0.05) with AN before multiple-testing correction, were subjected to SNP genotyping on the cohort of 125 BN cases. None of the 10 SNPs showed a statistically significant association with BN (data not shown).

Discussion

We have completed a genome-wide association analysis for AN using 23 465 MS markers. To our knowledge, this is the first GWAS performed for EDs. Among the 10 candidate loci we identified, 9 are reported to be associated with AN for the first time in this study. Only one locus, D1S0016i on 1p36, overlaps with the chromosomal region of 1p33-p36 that has already been reported to show significant linkage to AN.12

Through an SNP association analysis for the seven selected candidate regions to narrow down genomic intervals involved in AN susceptibility, we tentatively identified a 10.2-kb genomic interval (the haplotype block 1q41-#5) located 3′ downstream of the SPATA17 gene as a region associated with AN. The MS-SNP haplotype-based association analysis also indicated the association of haplotype block 1q41-#6 with AN with a similar statistical significance. SPATA17 encodes a 361 amino-acid protein that contains three highly conserved IQ motifs and is strongly expressed in the testis.30 It is unknown whether the SPATA17 protein has any physiological roles in neuronal tissues. It should be noted that the 10.2-kb critical interval coincides with the exon–intron structure of the uncharacterized mRNA sequence BC040896, which is derived from a cDNA library made from brain (adult medulla) RNA.

Another genomic region identified to be associated with AN in this study is a 20.2-kb interval (the haplotype block 11q22-#5), spanning from the eighth to the ninth intron of the CNTN5 gene on 11q22. Furthermore, we found that one MS-SNP haplotype (A-4-G-T), which includes two AN-associated risk alleles at SNP rs6590474 and MS D11S0268i, was significantly overrepresented in AN cases. CNTN5 encodes a member of the contactin family known to function during the formation of neuronal interactions. It is reported that the mouse line deficient of Cntn1, another member of the contactin family, exhibits an ataxic and anorectic phenotype.31, 32 In human adult tissues examined using northern blot analysis, CNTN5 has been shown to be predominantly expressed in the brain and thyroid.33 In various regions of an adult brain examined, the gene was found to be expressed with highest levels in the occipital lobe and amygdala, followed by the cerebral cortex, frontal lobe, thalamus and temporal lobe.33 Although neuronal activity in the auditory system is reported to be impaired in the mouse line deficient for Cntn5, no anorexic phenotype has been described.34

Although causative SNPs are not yet determined, we have successfully mapped genetic association with AN to at least two genomic regions on 1q41 and 11q22 and narrowed down an AN-associated genomic interval for each locus by haplotype association analysis. Further replication analysis using independent patient/control populations for AN-associated SNPs and functional analyses for the genes or for particular genomic regions in these loci will better clarify the impact of these SNPs/genes in the genetic etiology of AN. It should also be noted that additional common variants are likely to have roles in the development of AN because this study was not well powered to detect susceptible loci with relatively small genetic risks. Additional gender/age-matched cohorts consisting of much larger numbers of cases and controls need to be used to improve statistical power in MS-based genome-wide association analysis.