Background
Schizophrenia (SCZ) is a devastating mental disorder that affects about 0.5–1% of the world’s population [1]. The main symptoms of SCZ include positive symptoms (hallucinations and delusions), negative symptoms (anhedonia, alogia, and avolition), and cognitive impairments (impaired working memory and executive function) [2]. Owing to its high mortality and considerable morbidity [3], SCZ imposes a substantial economic burden on society and is a major threat to global health [4]. The pathophysiology of SCZ remains largely unknown. Nevertheless, multiple lines of evidence indicate that SCZ has a strong genetic component. The heritability of SCZ has been estimated at around 0.80 [5], implying a major role for inherited variants in SCZ. Great efforts have been made to dissect the genetic basis of SCZ, and significant progress has been achieved. Low-frequency variants such as structural variants [6], copy number variations [7–10], rare variants [11], and de novo mutations [12–16] have been reported to be associated with SCZ. In addition, genome-wide association studies (GWASs) have identified over 200 risk loci that show robust associations with SCZ [17–27].
Although recent large-scale studies have provided important insights into the genetic etiology of SCZ, challenges remain in dissecting its genetic architecture. First, the majority of risk loci were identified in populations of European ancestry [21, 26]. Considering the substantial differences in allele frequency and linkage disequilibrium (LD) patterns among continental populations [28], performing GWAS in non-European populations will provide new insights into the genetic etiology of SCZ. Second, although a recent GWAS meta-analysis in populations of East Asian ancestry (EAS) revealed a comparable genetic architecture of SCZ between populations of European ancestry (EUR) and EAS (the genetic correlation between EAS and EUR is 0.98 ± 0.03), this study also showed population-specific associations [24]. For example, Lam et al. found that a large proportion of genome-wide significant variants identified in EAS showed dramatic differences in allele frequency between EAS and EUR [24], further indicating the importance of conducting GWAS in non-European populations. Third, accumulating data suggest that a large proportion of risk variants contribute to SCZ by modulating gene expression [29–31]. Therefore, it is important to pinpoint the potential target genes of the identified risk variants. To address these challenges, we first conducted a GWAS in a Han Chinese population (N = 8202). We then performed a large-scale meta-analysis (a total of 143,438 subjects) by combining our results with summary statistics from previous GWASs conducted in EAS and EUR (i.e., a summary statistics-based meta-analysis using a fixed-effect model) [24]. We also performed a transcriptome-wide association study (TWAS) to pinpoint the potential target genes of the identified risk variants and explored the tissues and cell types in which the identified risk variants and genes may exert their biological effects.
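The fixed-effect combination of per-cohort summary statistics can be illustrated with a minimal sketch. It assumes inverse-variance weighting of per-SNP effect sizes (log odds ratios) and their standard errors, which is the usual form of a fixed-effect model; the weighting scheme and software actually used are not stated in this section, and the function name and input values below are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    betas: per-cohort effect sizes (e.g., log odds ratios), same effect allele.
    ses:   per-cohort standard errors of those effect sizes.
    Returns (pooled beta, pooled standard error, z score, two-sided P value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se

    # Two-sided P value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical input: two cohorts with the same effect estimate and precision.
print(fixed_effect_meta([0.10, 0.10], [0.02, 0.02]))
```

In this toy input, neither cohort alone passes the conventional genome-wide significance threshold (P < 5 × 10⁻⁸), whereas the pooled estimate does, which illustrates why combining cohorts can reveal additional loci.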
Discussion
In this study, we first performed a GWAS for SCZ in Han Chinese samples. We then conducted meta-analyses by combining our results with published GWAS summary statistics from individuals of East Asian and European ancestry [24]. We identified 2 new genome-wide significant risk loci in our Han Chinese cohort. SNP rs7192086 is located in intron 2 of the SHISA9 gene. The SHISA9 protein (also known as CKAMP44) is a brain-specific type-I transmembrane protein that is highly expressed in the hippocampal dentate gyrus and cerebral cortex [61]. SHISA9 is enriched at postsynaptic sites, and its intracellular domain contains a PDZ domain interaction site which can physically interact with the AMPA-type glutamate receptor (AMPAR); thus, SHISA9 plays important roles in short-term synaptic plasticity [61]. Notably, the AMPAR shows abnormal forward trafficking in the frontal cortex of SCZ patients [62]. These lines of evidence suggest that SHISA9 may contribute to SCZ by affecting AMPAR function and synaptic transmission. However, further functional studies are warranted to reveal the role of SHISA9 in SCZ.
Another genome-wide significant risk variant identified in our sample is rs57016637. Intriguingly, we noticed that rs57016637 is fixed in other populations (Figure S8). Although the majority of risk variants have similar effects in EUR and EAS populations [23], population heterogeneity still exists. For example, rs374528934 was reported to be strongly associated with SCZ in EAS (P = 5 × 10⁻¹¹), yet the minor allele frequency (MAF) of rs374528934 in EUR is quite low (0.7%) [24]. Our data suggest that rs57016637 may be a Han Chinese-specific risk variant for SCZ. SNP rs117961127 (in LD with the lead SNP rs57016637 at this locus, r² = 0.32 in 1000 East Asian samples) is located in intron 2 of OSBP2, a gene that encodes a cholesterol-binding protein. Cholesterol levels were reported to be altered in SCZ cases compared with controls [63]. In addition, Krakowski et al. showed that cholesterol levels were strongly associated with cognition in SCZ [64]. These data suggest that OSBP2 may have a role in SCZ through regulating cholesterol levels. Further investigation of the role of OSBP2 in SCZ is needed.
The two novel GWAS loci reported in our analysis did not reach genome-wide significance (GWS) in our follow-up meta-analyses with EAS and EAS + EUR samples (Additional file 1: Table S1). Of note, previous GWASs of Han Chinese have observed similar results [19, 60]. For example, the top associations identified by Shi et al. (rs16887244, rs10489202) [19] and Yue et al. (rs1233710, rs1635, rs2142731, rs11038167, rs11038172, rs835784) [60] in Chinese populations did not reach genome-wide significance in a larger EAS meta-analysis reported by Lam et al. [24]. More work is needed to explore whether this observation is due to population-specific associations or to genetic heterogeneity between regional samples.
By meta-analyzing our results with GWAS associations from EAS and EUR [24], we identified 15 new risk loci, including 7p15.3 (the lead risk SNP is rs2106747, which was strongly associated with the expression level of FAM211A) and 12q13.12 (the lead risk SNP is rs7301566, which was an eQTL for several genes, including COX14, DIP2B, CERS5, RP4-605O3.4, SPATS2, ASIC1, and ATF1). These new risk loci provide valuable clues for further functional study. Functional investigation of these risk genes will also provide important insights into SCZ pathogenesis and help to develop potential therapeutic targets. Some of our newly identified GWS loci are located in genomic regions near previously reported loci. For example, rs319227 (P = 6.11 × 10⁻⁹, Table 1) and rs11958187 (P = 9.39 × 10⁻⁹, reported by Lam et al. [24]) are both located near PPP2R2B, but these two index SNPs are not in LD (r² < 0.1). In addition, rs2106747 (P = 3.36 × 10⁻⁸, Table 1) and rs112316332 (P = 3.04 × 10⁻⁸, reported by Lam et al. [24]) also showed similar results. These results suggest that these risk loci are genetically independent. However, more work is warranted to elucidate the functional mechanisms of these loci.
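The independence check described above rests on the pairwise LD statistic r². As an illustration only, the sketch below computes r² between two biallelic SNPs from phased haplotypes coded 0/1; the function name and toy haplotypes are hypothetical, and the reference panel and software used for the LD estimates in this study are not described in this section.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a, hap_b: allele codes (0 or 1) at SNP A and SNP B, one entry per
    phased haplotype, in the same haplotype order.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and of equal length")

    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                                  # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype

    d = p_ab - p_a * p_b                                  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical haplotypes: the two SNPs vary independently of each other.
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```

Index SNPs with r² below a chosen cutoff (0.1 in the comparison above) are treated as tagging separate association signals.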
We compared our results with the findings reported by PGC3 (preprint on medRxiv) [65]. We found that 6 loci (index SNPs rs115487049, rs6848123, rs319227, rs59761926, rs7301566, and rs6563592) reported in our study (Table 1) also show significant associations with SCZ in PGC3. This observation suggests that, although our sample size is relatively small, it can help to discover new associations. A meta-analysis with PGC3 will help to identify more new associations and to clarify the underlying genetic basis of SCZ.
Although SCZ risk associations are highly shared between EAS and EUR, conducting GWAS in EAS is still important because it can improve our understanding of the underlying biology of SCZ. First, Lam et al. showed that the genetic correlation between the EAS and EUR GWAS summary statistics is 0.98, indicating that the genetic basis of SCZ is highly shared between EAS and EUR. As EAS sample sizes increase, novel risk loci will continue to be identified, which will help us to better understand the genetic basis of SCZ. In addition, Lam et al. also reported EAS-specific associations, e.g., rs374528934 (P = 5 × 10⁻¹¹; minor allele frequency is 0.45 and 0.007 in EAS and EUR, respectively). These results demonstrate that EAS GWAS summary statistics can not only facilitate the discovery of genetic associations shared between EAS and EUR but also help to identify EAS-specific GWAS associations. Finally, genome-wide associations from different populations help to improve fine-mapping [24]. We believe that EAS GWAS summary statistics will provide important insights into the genetic architecture (both shared with EUR populations and EAS-specific) and the underlying biology of SCZ.
In the meta-analysis of EAS samples (our sample + EAS) [24], we found some novel loci (compared with EAS summary statistics alone). However, these loci also reached genome-wide significance in the EAS + PGC2 EUR summary statistics [24]. For example, rs12031518 reached genome-wide significance in our EAS meta-analysis (our sample + EAS) (P = 4.96 × 10⁻⁸). Of note, this SNP did not reach genome-wide significance in the EAS samples alone (P = 7.58 × 10⁻⁸). However, it showed a genome-wide significant association in the EAS + PGC2 EUR summary statistics (P = 6.43 × 10⁻¹¹) [24]. We calculated the genetic correlation between our ASA/GSA samples and the reported EAS samples (22,778 cases; 35,362 controls) [24]. Although these two sets of GWAS summary statistics are highly correlated (the genetic correlation is 0.71), the genetic correlation is not very close to 1. A possible reason is that the sample size of our study is relatively small (3493 cases and 4709 controls) compared with the reported EAS samples (22,778 cases and 35,362 controls).
Polygenic risk score (PRS) analysis revealed several interesting results. First, the EAS training set performed better overall than the EUR and CLOZUK+PGC2 training sets (even though the EAS sample size is smaller than those of the other two GWAS summary statistics), indicating that a similar ethnic background (that of the EAS summary statistics) helps to improve PRS prediction performance. Second, the CLOZUK+PGC2 training set performed better than EUR, indicating that a training set with a larger sample size performs better. Third, the EAS + EUR GWAS summary statistics performed best among all training sets. This result indicates that trans-ancestry meta-analysis improves prediction power.
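For readers unfamiliar with PRS, the core calculation can be sketched as a weighted sum of risk-allele dosages, with weights taken from a training GWAS. This sketch assumes per-SNP log odds ratios as weights and omits SNP selection steps such as LD clumping and P-value thresholding; the exact PRS procedure used here is not described in this section, and all SNP identifiers and values below are placeholders.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping SNP id -> effect-allele dosage (0 to 2).
    weights: dict mapping SNP id -> effect size from the training GWAS
             (assumed here to be a log odds ratio for the same effect allele).
    SNPs missing from either input are skipped.
    """
    shared_snps = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared_snps)


# Placeholder training weights and one individual's genotypes.
training_weights = {"snp_A": 0.10, "snp_B": -0.05}
individual = {"snp_A": 2, "snp_B": 1, "snp_C": 0}
print(polygenic_score(individual, training_weights))
```

Because the weights come from the training summary statistics, the ancestry and sample size of the training set directly affect how well the score transfers to the target sample, which is the comparison made above.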
Tissue and cell-type enrichment analysis revealed that SCZ associations showed significant enrichment in the cerebellum, suggesting a potential role of the cerebellum in SCZ. Of note, several previous studies have also suggested that the cerebellum may play an important role in SCZ [66–69]. These results suggest that the cerebellum may have a pivotal role in SCZ etiology.
Our study has several limitations. First, our sample size is relatively small compared with recent SCZ GWAS cohorts, such as PGC2 [26], CLOZUK [21], or the East Asian meta-analysis [24]. Additional SCZ risk loci will be found as sample sizes increase. Second, although we reported 17 novel risk loci, the causal variants and genes at these loci remain largely unknown. Further work, including pinpointing causal variants and genes, functionally characterizing risk genes, and exploring the role of risk genes in the developing and adult brain, will provide pivotal insights into SCZ pathophysiology. Third, we used eQTL data from Europeans to explore the associations between genome-wide significant SNPs and gene expression levels in the human brain. Considering that some novel risk loci were from our Chinese cohort, an ideal approach would be to examine the associations between the novel genetic variants and gene expression in both EAS and EUR populations. However, brain eQTL data from EAS are not publicly available so far. More work is needed to explore whether these genetic variants are also associated with gene expression in the Chinese population. Fourth, although including principal components (PCs) as covariates is a standard and useful way to correct for population stratification in GWAS, challenges remain in principal component analysis (PCA). For example, selecting the optimal number of PCs [70] remains an open question (i.e., the choice is relatively arbitrary, and different numbers of PCs have been reported in different studies). In addition, more work is needed to determine the optimal number of genetic markers for PC calculation.