Introduction

Schizophrenia (MIM 181500) is a severe neuropsychiatric disorder with overt onset in late adolescence or early adulthood. Its clinical manifestations include delusions, hallucinations and thought disorder (positive symptoms), lack of motivation and poor social and occupational adjustment (negative symptoms) as well as cognitive dysfunction (primarily impairment of executive functions). With a lifetime risk of 1%, schizophrenia is the leading cause of chronic psychiatric hospitalization and represents a major public health and economic burden. While the etiology of schizophrenia is not known, a substantial body of evidence supports a pivotal role for abnormalities of brain development in utero and postnatally.1, 2

It has long been hypothesized that genetic factors play a significant role in schizophrenia, with strong support from family, twin and adoption studies.3, 4, 5 Although linkage studies aimed at mapping schizophrenia susceptibility loci have not been fully consistent, a confluence of findings for certain chromosomal regions is gradually emerging6, 7 with support from meta-analyses.8, 9 Further efforts have focused on identifying susceptibility genes, mostly in regions previously implicated by linkage studies.10, 11, 12 There have been several findings that have been replicated in some but not all studies, most prominently DTNBP1, Neuregulin1, G72 and DAAO, RGS4, COMT11, 12 and DISC1.10, 13 While these genes have potential pathophysiological relevance to schizophrenia and in some cases a putative role in brain development, pathogenic variants have not been identified.

Recently, considerable interest has focused on the long arm of chromosome 6 where several studies have mapped putative schizophrenia susceptibility loci14, 15, 16, 17, 18, 19, 20 (for a review see Kohn and Lerer21). Furthermore, several groups have reported evidence for linkage of bipolar disorder on chromosome 6q22, 23, 24 with additional support from a recent meta-analysis.25 Given the extensive genomic distance spanned by these reports, it is feasible that the incriminated chromosome 6q region harbors more than one gene implicated in the pathogenesis of schizophrenia, bipolar disorder and possibly other neuropsychiatric phenotypes.21

Based on an autosomal scan of Arab-Israeli families, we previously reported linkage of schizophrenia to chromosome 6q23.20 After fine mapping,26 the peak nonparametric LOD score (NPL) was 4.98 (P=0.00000058) at D6S1626 (136.3 Mb) giving rise to a putative susceptibility region (NPL-1) of 3.90 Mb. In order to identify susceptibility genes within the region, we embarked on extensive genotyping of single nucleotide polymorphisms (SNPs) within and flanking 20 putative candidate genes selected for being expressed in brain and having potential pathophysiological relevance to schizophrenia, as well as in intergenic regions. We report the identification of a 500 kb genomic region that is significantly associated with schizophrenia even after correction for multiple testing. This region harbors two genes, the neurodevelopmentally implicated Abelson Helper Integration Site 1 gene (AHI1), and an adjacent, primate-specific transcript, C6orf217.

Subjects and methods

Family samples and diagnostic methods

The index family-based association sample for this study (TKT) consisted of 53 nuclear families of Arab-Israeli origin (190 individuals, 85 affected); 21 of these families were included in our genome scan,20, 26 and the remaining 32 were recruited subsequently. Thirty-four were ‘triad’ families (affected proband plus both parents), 10 families had two affected offspring and nine had 3–5 affected offspring. Families from the genome scan sample with more than one branch were represented in the family-based association sample by a single family unit. In two cases these family units were triads. The Hadassah-Hebrew University Medical Center Helsinki Committee (Internal Review Board) approved the project and all subjects gave written informed consent. Best-estimate consensus psychiatric diagnoses,27 according to the Research Diagnostic Criteria (RDC)28 and the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV),29 were based on interviews with the Schedule for Affective Disorders and Schizophrenia-Lifetime Version (SADS-L),30 the Family History Research Diagnostic Criteria (FH-RDC)31 and medical records. All diagnostic evaluations were completed without knowledge of the genotyping data. A broad diagnostic category was employed; it encompassed 65 subjects with schizophrenia, 17 with schizoaffective disorder and three with unspecified functional psychosis (RDC) (further details of these procedures are provided in ‘Detailed description of family sample and diagnostic methods’ (online only)).

The replication sample (BT) consisted of 209 nuclear families with a total of 678 individuals, 258 of whom were affected; 177 were triad families, 23 had two affected offspring and nine had more than two (3–5). Owing to the preponderance of triad families, this sample is not suitable for linkage analysis. The affected individuals were diagnosed with schizophrenia according to DSM-IV29 on the basis of an interview with the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID).32 This sample is also of Arab origin, but it is outbred and was recruited from different geographical regions of Israel and the Palestinian Authority. Reliability of diagnoses across the two samples was established by a diagnostic reliability study (described in full in ‘Detailed description of family sample and diagnostic methods’ (online only)).

Genotyping

Genotyping was performed without knowledge of clinical diagnosis. Information about SNPs was obtained from public databases (Ensembl release 29.35b, NCBI build 35) as well as from the Celera database. The Sequenom MassARRAY platform (Sequenom, San Diego, CA, USA) was used for high throughput genotyping. We used the protocol for high multiplex homogeneous MassEXTEND (hME) reactions (Sequenom, San Diego, CA, application notes). Genotyping assays were designed as multiplex reactions using SpectroDESIGNER software version 2.0.7 (Sequenom, San Diego, CA) after checking that the SNPs did not reside in repetitive elements. The acquired genotypes were checked for deviations from Mendelian inheritance using the program PedManager (http://www.broad.mit.edu/ftp/distribution/software/pedmanager/). Markers that remained ambiguous in a given family after re-examination of the raw data were removed from the analysis for that family. The rate of genotyping errors was below 2% for all markers in the entire sample.
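The Mendelian inheritance check described above can be illustrated with a minimal sketch. It is not PedManager's implementation; it assumes fully genotyped parent-offspring trios, biallelic markers coded as allele pairs, and made-up family and marker names.

```python
from itertools import product

def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can arise from the parents' genotypes.

    Each genotype is a 2-tuple of allele codes, e.g. ("A", "G").
    """
    child_sorted = tuple(sorted(child))
    # Enumerate every (paternal allele, maternal allele) combination.
    for pat, mat in product(father, mother):
        if tuple(sorted((pat, mat))) == child_sorted:
            return True
    return False

def find_mendelian_errors(trios):
    """Yield (family_id, marker) for each trio genotype that violates Mendelian inheritance."""
    for family_id, marker, father, mother, child in trios:
        if not is_mendelian_consistent(father, mother, child):
            yield family_id, marker

# Hypothetical example records (not data from the study).
example_trios = [
    ("fam01", "snp_a", ("A", "G"), ("G", "G"), ("A", "G")),  # consistent
    ("fam02", "snp_a", ("A", "A"), ("A", "A"), ("A", "G")),  # inconsistent
]
errors = list(find_mendelian_errors(example_trios))
```

Flagged family-marker pairs would then be candidates for re-examination of the raw data. Missing genotypes and larger pedigrees are deliberately left out of this sketch.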

Statistical analysis

PBAT version 3.033, 34 (http://www.biostat.harvard.edu/~fbat/pbat.htm), which incorporates an extended and improved transmission disequilibrium test (TDT),35 was used for association analysis. PBAT statistics were calculated under the null hypothesis of linkage and no association using the sandwich option (sw) for robust estimation of the variance, conditioning on traits and parental genotypes. The null hypothesis of no linkage and no association was used for the replication sample. Haplotype analysis was restricted to adjacent SNPs. The mode of inheritance of schizophrenia is complex; therefore we chose the additive mode. We restricted the minimal number of informative families to 10 and set the minimal haplotype frequency cutoff to 0.05. Stringent Bonferroni correction for multiple testing was used separately for single SNP and haplotype analysis. Haploview Version 3.2 (http://www.broad.mit.edu/mpg/haploview/) was used to calculate intermarker linkage disequilibrium (LD) between all SNP pairs within a 1 Mb interval and to generate a graphical view of the LD pattern across the entire genomic region. Haplotype blocks were defined using the confidence interval algorithm.36
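For orientation, the classic transmission disequilibrium test that PBAT extends can be sketched as follows. This is the basic single-marker statistic only, not PBAT's conditional, variance-robust procedure; the function name and the counts in the usage line are illustrative assumptions.

```python
import math

def tdt_statistic(transmitted, untransmitted):
    """Classic TDT chi-square (1 df) from heterozygous-parent transmissions.

    transmitted:   times heterozygous parents passed the test allele to an affected child
    untransmitted: times they passed the other allele instead
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts, chosen only to show the call.
chi2, p_value = tdt_statistic(60, 40)
```

Only transmissions from heterozygous parents are informative, which is one reason a minimum number of informative families is imposed before a marker is tested.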

In silico prediction of transcription factor binding sites

The putative promoter regions of AHI1 and C6orf217 were scanned for matches to 337 human position weight matrices (PWMs) downloaded from TRANSFAC.

The promoter scan was carried out with a Perl script that uses the TFBS module (http://forkhead.cgb.ki.se/TFBS/).37 This module implements the PWM search algorithm described in ref. 38. The search cutoff used was 92%.
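A PWM scan with a percentage cutoff can be sketched as below. This is an illustration in Python rather than the Perl script used in the study. It assumes the 92% cutoff is a relative score, that is, the window score rescaled between the minimum and maximum scores attainable with the matrix. The matrix shown is a toy example, not a TRANSFAC entry, and only the forward strand is scanned.

```python
def scan_pwm(sequence, pwm, cutoff=0.92):
    """Return (position, relative_score) for windows scoring at or above the cutoff.

    pwm is a list of dicts, one per motif position, mapping A/C/G/T to a score.
    """
    width = len(pwm)
    min_score = sum(min(col.values()) for col in pwm)
    max_score = sum(max(col.values()) for col in pwm)
    span = max_score - min_score
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(col[base] for col, base in zip(pwm, window))
        relative = (score - min_score) / span
        if relative >= cutoff:
            hits.append((start, relative))
    return hits

# Toy 4-column matrix favouring "TGAC" (hypothetical, not a TRANSFAC entry).
toy_pwm = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 2.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
]
hits = scan_pwm("GGTGACGTCA", toy_pwm)
```

With this toy matrix, only a perfect match to the consensus passes a 0.92 relative cutoff, since a single mismatch already drops the relative score to 0.75.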

Results

We genotyped 180 SNPs within a 13.9 Mb linkage interval in 53 families from an Arab-Israeli sample that shows linkage to schizophrenia on chromosome 6q2320, 26 (Figure 1a). The region harbors 69 known genes, the majority showing brain expression (Figure 1b). Genotyped SNP density was highest in a 1 Mb area around the linkage peak at 136 Mb, with a total of 58 SNPs at an average inter-SNP distance of 17.0 kb (Figure 1c); the average inter-SNP distance for the remaining SNPs was 66.6 kb. A graphical display of the LD pattern giving rise to 23 LD blocks is shown in Figure 1d. The underlying D′ and r² values for SNP by SNP comparison across the entire region are available from the authors on request.
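The two pairwise LD measures mentioned above, D′ and r², follow from standard definitions and can be sketched as below. The sketch assumes that haplotype and allele frequencies for a pair of biallelic SNPs are already known, whereas LD software working from genotype data has to estimate them first. The function name and the example frequencies are illustrative.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2

# Two perfectly correlated SNPs with equal allele frequencies (made-up values).
d_prime, r2 = pairwise_ld(0.5, 0.5, 0.5)
```

D′ reaches 1 whenever at least one of the four possible haplotypes is absent, while r² reaches 1 only when the two SNPs are perfect proxies for each other, so the two measures can differ markedly for SNPs with unequal allele frequencies.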

Figure 1

Genomic region within the NPL-2 confidence interval of a linkage peak on chromosome 6q23 for fine mapping of a schizophrenia susceptibility locus in an Arab-Israeli sample. (a) Multipoint, nonparametric linkage analysis of microsatellite markers on chromosome 6q under a broad diagnostic model showing a maximum NPL of 4.98 (P=0.00000058) at 136.97 cM.26 The NPL-1 (3.9 Mb) and NPL-2 (20 Mb) confidence intervals are indicated by broken lines in shades of green. The linkage peak is represented by a green triangle. (b) The NPL-1 and NPL-2 confidence intervals are indicated by arrows in shades of green. The position of the linkage peak is depicted by a green triangle at 136.3 Mb. Distribution of known genes spanning a 14 Mb genomic region underneath the linkage peak is shown. Gene polarity is indicated by the orientation of the triangles. Yellow and grey triangles represent, respectively, genes covered by genotyped SNPs or devoid of genotyped SNPs. The scissors represent known recombination hotspots. The brown double-headed arrow represents the 500 kb high LD region around the AHI1 gene. Genes covered by genotyped SNPs are: a: TRAR4, b: MYB, c: AHI1, d: C6orf217, e: PDE7B, f: FAM54A, g: BCL2A, h: MAP7, i: MAP3K5, j: PEX7, k: IL22RA2, l: TNFAIP3, m: C6orf63, n: REPS1, o: HECA, p: NMBR, q: C6orf55, r: GPR126, s: EPM2A, t: GRM1. (c) Distribution of SNPs within the 14 Mb genomic region under the linkage peak. Thick black lines indicate dense clusters of SNPs. The SNP density is highest in a 1 Mb region from 135.5 to 136.5 Mb with an average inter-SNP distance of 17.0 kb, followed by the 7 Mb genomic region from 136.5 to 143.5 Mb with an average inter-SNP distance of 66.6 kb. The remaining SNPs are distributed across three candidate genes, namely TRAR4 (a), EPM2A (s) and GRM1 (t). (d) Positions of the 23 haplotype blocks relative to the LD plot are shown underneath, as well as the LD plot generated using Haploview software with pairwise SNP comparison for SNPs less than 1 Mb apart. Red indicates regions of high LD and white represents regions of low LD.

Single SNP analysis revealed the highest association signal within a 0.5 Mb genomic region (135.6–136.1 Mb). This association signal included the AHI1 gene at 135.7 Mb, extending into the distal intergenic region between AHI1 and the Phosphodiesterase 7B gene (PDE7B) (Table 1, Figure 2a; full genotyping results in Supplementary Table S1 (online)). It also included a primate-specific gene of unknown function, C6orf217, for which there is EST-cluster evidence (UniGene cluster HS.510098). This gene is directly adjacent to AHI1 and in high LD with it (Figure 3). After conservative Bonferroni correction for multiple testing (180 tests, P=0.00028), two out of the six SNPs within AHI1 remained significant. Within C6orf217, four SNPs withstood correction. The last significant SNP in this region (rs1475069, P=0.0000025) is located 48 kb distal to the last annotated exon of C6orf217 in the intergenic region between it and PDE7B. In addition to the cluster of associated SNPs within the high LD region, there were four SNPs in other genomic locations that withstood Bonferroni correction (Figure 2a, Supplementary Table S1 (online)).

Table 1 Single SNP association results for the high-density region underneath the linkage peak (20 SNPs) on chromosome 6q, from 135.6 to 136.2 Mb in the TKT sample
Figure 2

Association results within the NPL-2 confidence interval of a linkage peak on chromosome 6q23 for a schizophrenia susceptibility locus in an Arab-Israeli sample. (a) The SNP # corresponds to the numbering in Supplementary Table S1 (online). The broken line represents the Bonferroni cutoff for multiple testing (P=0.00028). The significant SNP cluster from SNP #17 to #41 spans from MYB to C6orf217. The black box indicates the SNPs which reside within AHI1 and the grey box indicates the SNPs residing in C6orf217. (b) Haplotype analysis across the 13.9 Mb genomic region on chromosome 6q using 3-SNP sliding windows. The haplotype # corresponds to the numbering in Supplementary Table S2 (online). The broken red line represents the Bonferroni cutoff for multiple testing (P=0.000082). The black box indicates the haplotypes that reside within AHI1; the grey box indicates the haplotypes residing in C6orf217. The most significant haplotype cluster, around haplotype #53, is located within the AHI1 gene. With only two common haplotypes in this region, the most significant association is observed with the frequent and over-transmitted AAT haplotype (frequency: 0.67) consisting of SNPs rs6912933, rs9321501 and rs2746429 (P=0.00000000017), and a similar but less significant trend is observed with the complementary under-transmitted GCC haplotype (frequency: 0.22, P=0.0000019).

Figure 3

Genomic location and orientation of AHI1 and C6orf217. (a) The purple and green arrows indicate AHI1 and C6orf217 orientation, respectively. The gene structures of AHI1 and C6orf217 are depicted by purple and green vertical lines, respectively, which indicate exon positions. Yellow circles and their connecting arrows point to the location of significant SNPs (after Bonferroni correction). Grey lines indicate the positions of all SNPs genotyped in this genomic region. (b) Haploview representation of SNPs and their LD structure. Grey lines indicate the positions of the SNPs. Blocks were defined using the confidence interval method (integrated in Haploview). Haplotype tagging SNPs are surrounded by rectangles.

C6orf217 is a primate-specific gene consisting of 10 exons, and it has several alternatively spliced isoforms. The predicted protein length depends on the splice isoform, with a maximum of 135 amino acids and no similarity to any other known protein.39 Its largest open reading frame spans exons one to three, while all the other exons seem to belong to the 3′ untranslated region (UTR). C6orf217 is expressed in brain, eye, kidney, testis, tongue, pancreas and lung during development as well as in the adult.

C6orf217 and AHI1 transcripts are genomically situated head to head, with only 55 bp between their 5′ ends, which in all probability constrains their mutual transcription. The two genes share two potential transcription factor binding sites (TFBSs), for CREB (−30 bp for AHI1 and −12 bp for C6orf217) and for ATF (−36 bp for C6orf217 and −20 bp for AHI1). Moreover, potential TFBSs for the AHI1 gene are located within C6orf217, for example, RFX1 (−64 bp), STAT1 (−272 bp), NF-AT (−385 bp), AP1 (−682 bp) and AML-1a (−992 bp), and potential TFBSs for C6orf217 are located within the AHI1 gene, for example, T3R (−89 bp), Nrf-1 (−139 bp), SRY (−163 bp) and E2F (−195 bp).

Deviation from Hardy–Weinberg equilibrium was observed in affected offspring only, with six out of a total of nine deviating SNPs residing in the AHI1–C6orf217 interval (Supplementary Figure S1 (online)). This puts additional emphasis on this genomic region, since the observation is distinct from the observed allelic association and reflects enrichment for specific genotypes in the patients that might indicate the loss of a protective allele, increasing the risk of schizophrenia. There is no indication that the increased level of consanguinity in this population affects the Hardy–Weinberg equilibrium across this genomic region, since the parents exhibit a normal distribution of genotypes throughout the entire interval. Furthermore, the calculations (using PBAT software) correct for linkage and multiple offspring within one family, which otherwise would inflate the results.
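A deviation from Hardy–Weinberg proportions at a single biallelic SNP can be assessed with a chi-square goodness-of-fit test, sketched below. The text does not state which test was applied, so this is a generic textbook version with a hypothetical function name and made-up genotype counts. For small counts an exact test is usually preferred.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Made-up counts with a deficit of heterozygotes relative to expectation.
chi2_hwe, p_hwe = hwe_chi_square(40, 20, 40)
```

Running the same test separately in parents and in affected offspring mirrors the comparison made above, where deviation appears only in the affected group.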

To increase the information provided by single SNP analysis, we performed three-SNP ‘sliding window’ haplotype analyses using all 180 SNPs. A total of 612 individual haplotypes were analyzed (on average three haplotypes per sliding window), putting the Bonferroni cutoff P-value for significant haplotype association at 0.000082. Outstanding haplotype association signals were observed within the AHI1 gene (of the order of P=0.00000000017) with haplotypes encompassing rs9321501, followed by strong association of the C6orf217 locus (Figure 2b, Supplementary Table S2 (online)). Haplotype association exceeded the association of single SNPs in this region in degree of significance, implying a convergence towards the putative disease locus. Two individual haplotypes in other regions remained significant after correction for multiple testing. One is in an intergenic region at 138.1 Mb and the other is almost 1 Mb away at 139.0 Mb, with contributing SNPs residing in two different genes, namely C6orf63 and REPS1. For neither of these loci is there cumulative evidence from single SNP and haplotype analysis.
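The window construction and the Bonferroni cutoffs quoted above can be reproduced with a short sketch. It assumes a family-wise significance level of 0.05, which is consistent with the reported thresholds but is not stated explicitly, and it uses placeholder marker names.

```python
def sliding_windows(markers, size=3):
    """Return every run of `size` adjacent markers, in map order."""
    return [tuple(markers[i:i + size]) for i in range(len(markers) - size + 1)]

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

markers = [f"snp{i}" for i in range(1, 181)]   # placeholder names for 180 SNPs
windows = sliding_windows(markers)             # 178 overlapping three-SNP windows
single_snp_cutoff = bonferroni_threshold(180)  # about 0.00028
haplotype_cutoff = bonferroni_threshold(612)   # about 0.000082
```

Each window can carry several haplotypes that are frequent enough to test, which is why the number of haplotype tests (612) exceeds the number of windows (178).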

We genotyped eight out of the 11 SNPs that survived stringent Bonferroni correction in a second sample consisting of 209 outbred nuclear families of Arab origin, each with at least one offspring affected with schizophrenia (BT sample). Three SNPs (rs7739635, rs9494332, rs642162) were omitted for assay design reasons. None of the eight individual SNPs showed significant association with schizophrenia (Supplementary Table S3 (online)). However, sliding window analysis revealed haplotypes consisting of SNPs located within AHI1 and C6orf217 that were strongly associated with schizophrenia in the BT sample (threshold after Bonferroni correction for 25 haplotypes tested, P=0.002) (Figure 4, Supplementary Table S4 (online)). The same SNPs compose the associated haplotypes in the TKT and BT samples, but no shared haplotype was observed. Only the C allele of SNP rs9321501, located in AHI1, is shared by both samples. The frequencies of common individual haplotypes in the two samples are comparable, although more rare haplotypes were observed in the larger and more outbred BT sample (Supplementary Table S4 (online)).

Figure 4

Associated under-transmitted, but not common haplotypes around the AHI1 gene on chromosome 6q23 in the index (TKT) and replication (BT) samples. Major and minor alleles are represented by grey and white boxes, respectively. The associated nucleotides and their allele frequencies for the given haplotypes are shown inside the boxes. The numbers of observed and expected transmissions of the haplotypes are given in columns S and E(S), respectively.

Discussion

We analyzed 53 families from a sample that shows linkage to schizophrenia on chromosome 6q23, by genotyping 180 SNPs in a genomic region of 13.9 Mb. A 0.5 Mb region of high LD in the vicinity of the original linkage peak at 136.3 Mb was significantly associated with the illness. The results of single SNP and haplotype association analyses focused on the Abelson Helper Integration Site 1 (AHI1) gene and an adjacent, primate-specific gene of unknown function, C6orf217.

The study of a complex disorder such as schizophrenia is facilitated by the homogeneous nature and possible founder effect in our well-characterized family sample.20 The Arab-Israeli population is an ethnically homogeneous group that has a high birthrate, an unusually high level of consanguinity (25% first cousin marriages) and a low rate of intermarriage with other population groups.40, 41 Our primary (TKT) sample was derived from three towns that were founded 200–250 years ago by a limited number of families.42 In subsequent years there was immigration into the towns, but the major population increase has been due to a high birthrate and low infant mortality in the past 75 years. Therefore, we assume a founder effect in this population, whereby most of the affected individuals will share the same chromosomal region identical by descent in the vicinity of the disease locus.

To validate our findings we examined a second family sample of Arab origin. This sample (BT) was outbred and showed no prior linkage to chromosome 6q23. Limited replication of the association in this sample provides additional support for the involvement of AHI1 in the genetic etiology of schizophrenia but cannot be regarded as confirmatory. It is noteworthy that the BT sample consisted of sporadic rather than familial cases and that the phenotype in the replication sample was more narrowly defined and encompassed only patients with schizophrenia according to DSM-IV. The fact that no shared haplotype was identified in two Arab samples for which a common founder mutation is postulated could be due to several reasons. First, the SNPs defining the ‘at-risk-haplotype’ may not have been identified and genotyped yet; the SNP density in the replication sample does not allow any firm conclusion on this matter. Second, genetic and allelic heterogeneity within this ethnically homogeneous population cannot be excluded; such situations have been encountered for several monogenic traits43, 44 and might also apply to complex multifactorial traits. Third, an ancestral recombination event between the SNPs rs9321501 and rs11154801 might have led the causative SNP to be found on complementary haplotypes in the two cohorts. As allele frequencies are comparable in the two ethnically matched samples, the existence of a common haplotype is regarded as likely. Considering the under-transmitted haplotype, apart from the first SNP in the haplotype, rs9321501, complementary alleles are involved in the two cohorts, hampering identification of such an ‘at-risk-haplotype’.

Human AHI1 is a very attractive biological candidate for a schizophrenia susceptibility gene. First, the gene is highly expressed in human fetal brain tissue and in the cerebellum and cerebral cortex of adult human brain.45 Second, comparative genetic analysis suggests accelerated evolution of AHI1 in the human lineage, particularly the N-terminal region of the gene; this was interpreted by Ferland et al45 to be a consequence of positive selection. Third, AHI1 contains several motifs that have been shown to be present in signaling molecules and in molecules mediating protein–protein interactions. Based on the study of murine ahi1 it has been suggested that AHI1 may be a docking site or scaffold protein recruiting other signaling molecules and modulating and integrating their action.46

Recently, null-type and missense mutations in human AHI1 have been shown to cause the autosomal recessive brain disorder, Joubert Syndrome (JS).45, 47 The characteristic features of JS include agenesis of the cerebellar vermis, ataxia, hypotonia, oculomotor apraxia and mental retardation. Additional phenotypic manifestations are diverse and may include cerebral polymicrogyria and involvement of systems other than the CNS.47 It is of interest that AHI1 is the only JS locus that is associated with cortical abnormalities.47 Less pronounced structural mutations in AHI1 or variants that influence expression levels could affect brain development in utero or postnatally in more subtle ways, thus contributing to susceptibility to schizophrenia. Brain imaging studies of schizophrenia, including patients in their first episode of psychosis, have consistently identified increased volume of the lateral ventricles, slightly decreased overall brain size, and decreases of gray matter, white matter and regional volumes in hippocampus, thalamus and frontal lobes.2 Early post-mortem findings report neuronal disarray and abnormal migration of subplate neurons in the neocortical white matter, suggesting prenatal neurodevelopmental abnormalities. More recent findings have emphasized abnormalities in neuronal size, arborization, and synaptic organization.11 A gene such as AHI1, which is intimately involved in brain development, could be implicated in several of the brain abnormalities observed in schizophrenia.

The observed association of SNPs and haplotypes with schizophrenia extends from AHI1 into C6orf217. The association is more prominent in the TKT sample than in the BT replication sample. As C6orf217 encodes a small protein with no known motif, it may well function as a regulatory gene at either the mRNA or the protein level. We may speculate that, owing to the close proximity of the two genes, the AHI1 expression level may be dependent on C6orf217 gene expression.

Until now no causative, coding variants have been reported for any schizophrenia susceptibility gene. This has led to the assumption that changes in gene function may be due to variation in gene regulation rather than changes associated with the protein per se. A more detailed study of AHI1 and C6orf217 and their flanking regions is required to further elucidate their regulation and to facilitate the search for regulatory variants in addition to further fine mapping and ‘classical’ screening of coding regions.