Introduction

Autoimmune diseases are common disorders, affecting around 5–10% of the population, with female subjects generally having a higher disease incidence than male subjects.1 Evidence for genetic contributions is strong, with increased concordance rates observed in monozygotic twins (ranging from 30–70%) when compared with dizygotic twins.2 A feature of these disorders is their tendency to co-occur in the same individuals and in families. For example, a cluster of rheumatoid arthritis, insulin-dependent diabetes mellitus and autoimmune thyroid disease has long been recognized.3 Further, shared susceptibility loci identified for different diseases have also been observed,4 supporting the hypothesis that clinically distinct disorders share a common genetic background controlled by pleiotropic susceptibility genes. In support of a shared genetic basis, genes in three regions have been consistently associated with multiple autoimmune diseases: the human leukocyte antigen (HLA) class II region on chromosome 6p21, the gene encoding cytotoxic T lymphocyte-associated 4 (CTLA4) on chromosome 2q33 and the PTPN22 gene encoding lymphoid tyrosine phosphatase on chromosome 1p13,5 although the role of these genes in some diseases remains unclear.

Genome-wide association studies for autoimmune diseases are identifying novel genes that were not previously considered as candidates, and genes with effects that would be difficult to detect in linkage studies.6, 7, 8, 9, 10, 11 Meanwhile, much effort has been expended in family collections and genotyping of linkage studies, and it is therefore valuable to extract all possible information from these resources. In this study, we present a meta-analysis of all genome-wide linkage screens performed on autoimmune diseases, using the genome scan meta-analysis (GSMA) method.12 GSMA is a widely used approach to combine linkage results from non-overlapping studies, and has previously been applied to many diseases, including specific autoimmune diseases.13, 14, 15, 16, 17, 18 In this study, we analysed 42 independent studies with complete genome-wide results in rheumatoid arthritis (RA), juvenile RA (JRA), systemic lupus erythematosus (SLE), ankylosing spondylitis (AS), inflammatory bowel disease (Crohn disease or ulcerative colitis, IBD), psoriasis (PS), autoimmune thyroid disease (Hashimoto thyroiditis or Graves disease, AITD), vitiligo (VIT), multiple sclerosis (MS), type 1 diabetes (T1D) and celiac disease (CD). We also investigated linkage to various clusters, such as diseases associated with PTPN22 or CTLA4 and diseases found to co-occur more frequently in the same families or patients.

Materials and methods

We identified all published genome-wide linkage studies performed in any autoimmune disease using Medline and PubMed searches. Genome-wide association studies were excluded, as were candidate region and follow-up linkage studies, which considered only specific regions of the genome. Linkage studies overlapping in patient samples, or extended versions of previous publications were identified, and only independent and most recent studies were included. Where studies had performed a two-stage analysis, genotyping markers in targeted regions in the second stage, only the first stage results were used, as the GSMA requires a uniform distribution of markers and families across the genome. Finally, linkage studies performed on single consanguineous families, usually from isolates or specific ethnic groups, were excluded.

In total, 56 independent genome-wide linkage studies were found. Fourteen studies were excluded as they presented linkage results only at the most significant regions, which reduces the power of the GSMA.19 The remaining 42 studies with complete genome-wide results were included in the meta-analysis, and comprise 7350 families with 18 291 affected individuals from 11 diseases (Table 1 and Supplementary Table A). Most studies (37 out of 42) used populations of European ancestry, and these were also analysed separately. Results from the X chromosome were available in 29 studies.

Table 1 Summary of the genome-wide linkage studies included in the GSMA for each disease

GSMA performs meta-analysis pooling results across the genome and does not require individual level genotype data. It is a rank-based meta-analysis method, which assesses the strongest evidence for linkage within prespecified genomic regions, termed bins, usually of 30 cM width. For each study, the maximum linkage statistic within a bin is identified (eg, maximum LOD or NPL score, or minimum P-value). Bins are then ranked, and ranks (or weighted ranks) for each bin are summed across studies, with the summed rank (SR) forming a test statistic. The significance of the SR in each bin is assessed using Monte Carlo simulations, permuting the bin location of ranks within each study, to obtain an empirical P-value. Analysis was performed using the GSMA software (http://www.kcl.ac.uk/mmg/research/gsma/) with 10 000 simulations.62
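The rank-and-permute procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the GSMA software itself: the function names (`rank_bins`, `summed_ranks`, `gsma`) are hypothetical, ties are broken by bin order rather than handled as the published method may do, and the toy input is invented for demonstration.

```python
import random

def rank_bins(stats):
    """Rank one study's bins: weakest linkage = 1, strongest = number of bins.
    Ties are broken by bin order (a simplification for this sketch)."""
    order = sorted(range(len(stats)), key=lambda i: stats[i])
    ranks = [0] * len(stats)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def summed_ranks(rank_table, weights):
    """Weighted sum of ranks across studies, one value per bin."""
    n_bins = len(rank_table[0])
    return [sum(w * r[b] for w, r in zip(weights, rank_table))
            for b in range(n_bins)]

def gsma(study_stats, weights=None, n_sim=10_000, seed=0):
    """Return (summed ranks, empirical P-values) for each bin.
    study_stats: one list per study of the maximum linkage statistic per bin."""
    rng = random.Random(seed)
    weights = weights or [1.0] * len(study_stats)
    rank_table = [rank_bins(s) for s in study_stats]
    observed = summed_ranks(rank_table, weights)
    exceed = [0] * len(observed)
    for _ in range(n_sim):
        # Permute the bin location of the ranks within each study
        permuted = [rng.sample(r, len(r)) for r in rank_table]
        null = summed_ranks(permuted, weights)
        for b, (sim, obs) in enumerate(zip(null, observed)):
            if sim >= obs:
                exceed[b] += 1
    # Add-one correction keeps the empirical P-value above zero
    p_values = [(e + 1) / (n_sim + 1) for e in exceed]
    return observed, p_values

# Toy example: three studies, five bins, strongest signal always in the third bin
toy = [[0.1, 0.5, 3.0, 0.2, 0.0],
       [0.3, 0.2, 2.5, 0.1, 0.4],
       [0.0, 0.9, 4.1, 0.3, 0.2]]
sr, p = gsma(toy, n_sim=2000)
```

Because only ranks enter the statistic, studies reporting LOD scores, NPL scores or P-values can be combined once each is converted so that larger values indicate stronger linkage.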

The traditional 30 cM bin definition gives a total of 118 bins on the autosomes and an additional six bins on the X chromosome, on the basis of the Marshfield genetic map (http://research.marshfieldclinic.org/genetics/home/index.asp).63 To assess how bin width and location affect the significance and localization of linkage signals, we also applied the GSMA using different bin definitions (20 and 40 cM wide, and shifted 30 cM bins obtained by moving bin boundaries by 15 cM).15 To control for multiple testing, we used a Bonferroni correction for the number of bins across the genome: for 30 cM bins, a P-value of 0.05/118=0.00042 was necessary for genome-wide significant evidence of linkage and a P-value of 1/118=0.00847 for suggestive evidence of linkage. Corresponding P-values were calculated for other bin sizes.
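The two thresholds follow directly from the number of bins, as the short sketch below shows. The function name `linkage_thresholds` is hypothetical; only the 118 autosomal bins for the 30 cM definition are taken from the text, and bin counts for the other widths would have to be supplied by the reader.

```python
def linkage_thresholds(n_bins, alpha=0.05):
    """Bonferroni-style thresholds for a genome divided into n_bins bins."""
    return {
        "genome_wide": alpha / n_bins,  # 0.05 divided by the number of bins
        "suggestive": 1.0 / n_bins,     # one expected false positive per scan
    }

thresholds_30cm = linkage_thresholds(118)
print(f"genome-wide: {thresholds_30cm['genome_wide']:.5f}")
print(f"suggestive:  {thresholds_30cm['suggestive']:.5f}")
```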

We performed both unweighted GSMA analysis (which assumes an equal contribution from each study) and weighted analyses. Three weighting functions were used (Supplementary Table A): firstly, controlling only for study size (wts1, with a weighting factor for each study equal to the square root of the number of affected individuals), secondly controlling partially for the number and size of studies in each disease (wts2, where the sum of the weights for each disease depended on the square root of total number of affected individuals of each disease) and thirdly allowing each disease to contribute equally to the analysis, regardless of the number of studies or their sample size (wts3, using the weighted summed ranks for each disease). For weighting functions wts2 and wts3, within each disease, studies contributed in proportion to the square root of the total number of affected individuals. An X-chromosome GSMA was performed on the 29 studies for which these results were available. GSMA was performed genome-wide, and then X-chromosome P-values extracted.
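As a rough illustration of the first two weighting functions, the sketch below computes wts1 and one possible reading of wts2. It assumes that "depended on" means the weights within a disease sum exactly to the square root of that disease's total number of affected individuals; the exact definition is in Supplementary Table A and may differ. The function names and the study sizes are hypothetical.

```python
import math
from collections import defaultdict

def wts1(studies):
    """Weight each study by the square root of its number of affected individuals.
    studies: list of (disease, n_affected) tuples."""
    return [math.sqrt(n) for _, n in studies]

def wts2(studies):
    """Assumed reading of wts2: weights within a disease sum to
    sqrt(total affected in that disease), shared in proportion to sqrt(n)."""
    total_affected = defaultdict(int)
    root_sum = defaultdict(float)
    for disease, n in studies:
        total_affected[disease] += n
        root_sum[disease] += math.sqrt(n)
    return [math.sqrt(total_affected[d]) * math.sqrt(n) / root_sum[d]
            for d, n in studies]

# Hypothetical study sizes, not taken from Table 1
example = [("RA", 400), ("RA", 900), ("MS", 100)]
print(wts1(example))
print(wts2(example))
```

Under this reading, a disease with many small studies gains less total weight than it would under wts1, which is the partial control for study number that the text describes.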

We also examined different autoimmune-disease clusters on the basis of identified familial aggregation, defining clusters of T1D+AITD+RA (nine studies) and AITD+SLE+VIT (seven studies), and on the basis of common pathophysiological characteristics, such as the seronegative spondyloarthropathies group (IBD+PS+AS, 18 studies). Finally, as genetic association provides direct evidence for common aetiological pathways, we considered two clusters of diseases that show association with PTPN22 (T1D+AITD+RA+SLE+JRA+VIT, 13 studies) and with CTLA4 (T1D+AITD+CD+SLE, 11 studies), although conflicting evidence has been observed in other autoimmune diseases.64

We estimated the power of linkage studies to identify PTPN22 and CTLA4 at suggestive and genome-wide significance levels, assuming non-parametric linkage analysis with affected sibling pairs.65, 66 For sample size, we used the total number of families in each cluster, as simulation shows that the GSMA has similar power to individual genotyping to detect linkage.67 We extracted genotypic relative risks of the PTPN22 and the CTLA4 risk variants from the literature, and then calculated the locus-specific recurrence risks.
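The step from genotypic relative risks to a locus-specific sibling recurrence risk can be sketched with the standard variance-components expression, λS = 1 + (VA/2 + VD/4)/K², for a biallelic locus. The text does not state which formula was used, so this choice is an assumption, as are the function name and the example allele frequency and relative risk.

```python
def sibling_relative_risk(p, rr_het, rr_hom):
    """Locus-specific sibling relative risk for a biallelic locus.
    p: risk-allele frequency; rr_het, rr_hom: genotypic relative risks
    for one and two copies of the risk allele (baseline genotype = 1).
    Assumes Hardy-Weinberg proportions."""
    q = 1.0 - p
    f0, f1, f2 = 1.0, rr_het, rr_hom
    # Mean penetrance, in units of the baseline genotype risk
    k = q * q * f0 + 2 * p * q * f1 + p * p * f2
    # Additive and dominance variance of penetrance
    v_a = 2 * p * q * (p * (f2 - f1) + q * (f1 - f0)) ** 2
    v_d = (p * q * (f2 - 2 * f1 + f0)) ** 2
    return 1.0 + (0.5 * v_a + 0.25 * v_d) / (k * k)

# Hypothetical variant: 10% allele frequency, multiplicative risk of 1.8 per allele
lambda_s = sibling_relative_risk(0.10, 1.8, 1.8 ** 2)
print(f"locus-specific sibling relative risk: {lambda_s:.3f}")
```

Common variants with modest relative risks give values only slightly above 1, which is why affected-sibling-pair linkage has little power to detect them.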

Results

Genetic regions showing at least nominally significant evidence for linkage (P-value<0.05) in the weighted and unweighted meta-analyses of 30 cM bins are shown in Table 2, with weighted (wts1 and wts2) and unweighted results illustrated in Figure 1. No evidence for linkage was detected on chromosome X. As expected, the most significant results were obtained at the HLA region (bin 6_2), with a strong effect also seen in flanking bins (6_1, 6_3, and to some extent in bins 6_4 and 6_5). This linkage evidence in 6q may be a carry-over effect from the HLA region, as multipoint linkage scores can be elevated across 30–50 cM, but we cannot exclude the possibility that it indicates a novel susceptibility locus on chromosome 6q. Although there is considerable variation across diseases in associated alleles/haplotypes and risks conferred,69 strong linkage to HLA in this GSMA was observed, with maximum ranks obtained at bin 6_2 for all diseases, except VIT (showing a rank above the 90th percentile) and AITD (∼75th percentile).

Table 2 Bins showing nominally significant evidence for linkage for all diseases and the PTPN22- and CTLA4-associated disease clusters, using bins of 30 cM widths
Figure 1

Evidence for linkage in meta-analysis of 42 genome-wide linkage studies by chromosome (X-axis). Y-axis shows –log10 (P-value) for unweighted and weighted (wts1 and wts2) summed ranks against bin location (30 cM bins), with a single point plotted for each bin, omitting the HLA region, which shows strong linkage. Thresholds for nominal, suggestive and genome-wide significant evidence for linkage are shown.

On chromosome 16, suggestive evidence for linkage was consistently detected in bins 16_1 and 16_2 for weighting factors wts1, wts2 and for unweighted analysis, with nominal significance for wts3. Further analysis using different bin widths showed suggestive evidence for linkage (Figure 2). The strongest result was obtained with a 20 cM bin (18.1–38.5 cM and 10.0–19.8 Mb); the original 30 cM analysis splits this bin, with a consequent slight reduction in significance for bins 16_1 and 16_2.

Figure 2

Linkage to chromosome 16 using different bin widths (20 cM, 30 cM and 40 cM) and shifted 30 cM bins for the weighted GSMA (wts2) on 42 studies. The Y-axis is –log10 (P-value) multiplied by the total number of bins, to give consistent thresholds for genome-wide and suggestive evidence for linkage for all bin widths.

On chromosome 3, bin 3_4 attained suggestive significance in the weighted analysis when controlling for the study size (wts1), but only nominal significance in other analyses. We also analysed the studies excluding chromosome 6, to ensure that the assignment of high study ranks to this region was not obscuring linkage elsewhere in the genome. No additional regions reached suggestive evidence for linkage, and only nine bins achieved nominal significance (a non-significant increase compared to 5.6 bins expected by chance). The analyses of only European ancestry studies showed similar evidence for linkage on chromosomes 6, 16 and 3, and failed to identify any novel linked regions.
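The comparison of nine observed bins with 5.6 expected can be checked with a one-sided binomial tail, as sketched below. Two assumptions are made here that the text does not state: that 112 bins remain after excluding chromosome 6 (inferred from 5.6/0.05), and that bins behave independently under the null. The authors may have used a different test.

```python
from math import comb

def binomial_tail(k, n, p):
    """Probability of observing k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_bins = 112        # assumed: 118 autosomal bins minus those on chromosome 6
alpha = 0.05        # nominal significance level
expected = alpha * n_bins
p_excess = binomial_tail(9, n_bins, alpha)
print(f"expected nominally significant bins: {expected:.1f}")
print(f"P(at least 9 bins by chance): {p_excess:.3f}")
```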

The contribution of each disease to the chromosome 16-linked region in Figure 3 shows that 7 of the 11 diseases attained ranks above the 75th percentile of ranks across the genome, and three diseases (CD, JRA and VIT) had contributions below the 50th percentile, suggesting that this region may harbour a pleiotropic gene (or genes) conferring risk for several, but not all, autoimmune disorders.

Figure 3

Contribution of each autoimmune disease to the most strongly linked 20 cM bin on 16p13.2-p12.3, with the 50th and 75th percentiles of ranks for the whole genome shown.

Two clusters showed suggestive evidence of linkage to bin 16_2: the seronegative spondyloarthropathies (IBD, PS and AS), with P-value=0.0046 for weighting factor wts1, and the PTPN22-associated cluster (Table 2). In the seronegative spondyloarthropathies, suggestive evidence for linkage was also detected on chromosome 3 (bins 3_4 and 3_8, P-value=0.0025 and 0.0010, with wts1), and in the AITD+SLE+VIT cluster on chromosome 18 (bin 18_1, P-value=0.0056, with wts1; full results not shown).

Focusing on bins containing genes associated with autoimmune diseases, no evidence for linkage was detected in bin 1_6 (which contains PTPN22) or bin 2_7 (CTLA4), in either the full analysis of all diseases, or in the focused analysis of only those associated with these genes (Table 2). Similarly, IL23R (bin 1_4) would not have been detected, despite its potential role in IBD, MS, PS, CD and AS. The relative risks conferred by variants in these genes are modest, but differ across diseases (Table 3), and our study might lack power to detect linkage. The power to detect linkage in a study of affected sibling pairs with the same number of affected individuals as in the disease-associated clusters is 12 and 3% for PTPN22, and 1 and <0.1% for CTLA4, at suggestive and genome-wide levels of significance, respectively.

Table 3 Genotype relative risks and frequency of PTPN22 and CTLA4 risk variants, showing inferred sibling relative risk accounted for by each locus, the sample size in number of families from the GSMA study in each gene-associated cluster and the power to detect linkage at suggestive and genome-wide levels of significance

Discussion

This meta-analysis of all genome-wide linkage screens performed on autoimmune diseases is the first attempt to combine evidence from linkage studies to identify loci which confer susceptibility to autoimmunity independent of disease phenotype. With 42 genome-wide linkage searches included, this forms the largest GSMA study performed to date. We detected highly significant evidence for linkage in the HLA region and flanking bins, confirming that the most potent genetic influence on susceptibility to autoimmunity is the major histocompatibility complex (MHC).

Outside this region, suggestive evidence of linkage was consistently observed only on chromosome 16p. Further, we detected no excess of regions achieving nominally significant evidence for linkage, implying that no loci exist with large sibling relative risk (eg, λS>1.15) acting across a substantial subset of the diseases. Other large GSMA studies, such as BMI and obesity (37 studies)75 and schizophrenia (32 studies; MY Ng, personal communication), showed a similar pattern of results with few strongly linked regions. The failure of our study to detect PTPN22 and CTLA4, and the fact that strong GWA findings in complex diseases are rarely located in regions highlighted in linkage studies, illustrate the different profiles of risk variants that can be identified in association studies and in linkage studies. Most disease alleles identified through association are common and confer a modest increase in disease risk; in contrast, even large linkage studies, such as this meta-analysis with over 7300 families, lack power to detect these common low-risk genes. There is growing evidence that multiple rare variants and structural variations play a role in susceptibility to common diseases.76, 77 Some rare variants might be more easily detected in linkage studies, which can have higher power to detect rare, high penetrance variants or where allelic heterogeneity within genes exists.

The chromosome 16 region (16p13.2-p12.3) provided suggestive evidence for linkage across diseases and weighting functions, so may harbour a pleiotropic gene (or genes) conferring risk for several autoimmune diseases. This region has not been highlighted in individual linkage studies or previous meta-analyses of specific diseases, which may reflect the power of the GSMA method to identify novel linked chromosomal regions that were not detected in the contributing studies. Analysis of different bin widths delineated a minimal linkage region of ∼20 cM (18.1–38.5 cM; 10.0–19.8 Mb), which contains several candidate genes potentially involved in autoimmune diseases. In particular, TNFRSF17 (MIM 109545), the tumour necrosis factor receptor superfamily member 17, is a receptor preferentially expressed in mature B lymphocytes, and may be important for B-cell development and autoimmune response; CIITA (MIM 60005), the MHC class II transactivator, encodes a protein located in the nucleus that acts as a positive regulator of class II MHC gene transcription; and IGSF6 (MIM 60622) encodes immunoglobulin superfamily member 6. A recently published genome-wide association study identified the KIAA0350 gene on chromosome 16p13.13 (10.9–11.2 Mb) as a potential locus for T1D.8, 9, 10 Power calculations on the basis of the identified associated variant show that this gene is unlikely to account for the linkage signal (power to detect suggestive linkage was <0.1%), suggesting either the presence of other risk variants not yet identified at KIAA0350, other susceptibility genes in the region, or a coincidental result.

Although the pathogenic mechanisms responsible for the initiation of autoimmunity remain poorly understood, studies clearly demonstrate that genetic predisposition is a major factor in susceptibility to autoimmune diseases. It has been hypothesized that certain immunological pathways are common to multiple diseases, whereas other pathophysiological mechanisms are specific to a particular disease. Indeed, familial clustering of autoimmune diseases has been observed, and loci identified for a specific disease often overlap with loci implicated in other autoimmune diseases, suggesting the presence of pleiotropic genes. Many different approaches (including linkage, association and functional studies) will be needed to dissect the genetic contribution to autoimmune diseases.