Introduction

Wilson disease (WD/WND, NCBI MIM No. 277900) is an autosomal recessive inherited disease characterized by excess accumulation of intracellular hepatic copper with subsequent hepatic and neurologic abnormalities. It was first reported by Dr Samuel Alexander Kinnier Wilson in 1912, and since then a growing number of cases have been identified worldwide.1 The disease frequency is estimated to be about 1/5000 to 1/30 000, and the carrier frequency is even higher.2 WD patients have a very poor prognosis if not diagnosed at an early stage and treated appropriately. Lifelong treatment is required for WD patients with severe symptoms or an early age of onset, and early diagnosis and therapy remarkably improve quality of life, with a hope of complete symptomatic recovery.3, 4, 5 Therefore, quick and accurate genetic testing for WD patients and efficient screening for carriers in the population are urgently required.

WD is one of the most prevalent inherited diseases caused by abnormalities of a single gene, ATP7B, which encodes a copper-transporting ATPase 2 β protein.3, 6 The gene is located at 13q14.3, spans a genomic region of ∼78 kb and includes 21 exons and 20 introns. ATP7B and its homolog ATP7A are the two main copper-transporting proteins.7 Defects of ATP7A impair the entry of copper from enterocytes into the circulation and thereby cause copper deficiency, known as Menkes disease, an X-linked disorder.8 The role of ATP7B is quite different: it mainly functions in converting apoceruloplasmin into ceruloplasmin and in excreting copper into the biliary canaliculi. Defects of ATP7B reduce blood ceruloplasmin and are harmful to hepatocytes. Many other organs, such as the brain and kidney, can also be affected.9

To date, more than 500 distinct mutations in ATP7B have been reported and there are 1343 known single-nucleotide polymorphisms (SNPs), suggesting that ATP7B is susceptible to mutation.10 In this study, we identified 36 mutations in 114 individuals from the Han population of north China who were diagnosed with WD; 14 of these mutations have not been reported previously and 5 are described for the first time in Chinese patients. We also performed a bioinformatic analysis to predict the functional effects of these mutations, and predicted a group of amino-acid residues of the ATP7B protein to be mutation hot spots.

Materials and methods

Subjects

A total of 114 individuals from the Han population of north China who were diagnosed with WD were recruited in this study; 76 were male and 38 were female, with onset ages from 7 to 44 years and a median age of 25. One hundred and eleven unrelated individuals and three siblings were set as the control, and all of them were fully informed and gave signed consent. All the patients were diagnosed with WD at the Peking Union Medical College Hospital, generally based on the following criteria: (1) symptoms of liver or brain failure; (2) presence of a Kayser-Fleischer (K-F) ring in the cornea on slit-lamp examination; (3) reduced serum ceruloplasmin (<0.20 g l−1) and/or elevated 24-hour urinary copper excretion (>1.0 μmol per day) and/or hepatic copper content >250 μg per g of dry weight.11

Data source

Reference ATP7B mRNA (NM_000053.3) and protein sequences (NP_000044.2) were retrieved from the NCBI Entrez database, and SNPs were retrieved from the NCBI database (db) SNP (http://www.ncbi.nlm.nih.gov/snp/), including all SNPs in the ATP7B gene region except the clinical submission items.12

Multiple sources were used to obtain the known genetic mutations of ATP7B, including (1) the Wilson Disease Mutation Database at the University of Alberta, maintained by the Cox Lab (http://www.wilsondisease.med.ualberta.ca/database.asp);10 (2) the UniProt (Universal Protein Resource) entry for the ATP7B protein, accession P35670 (http://www.uniprot.org/uniprot/P35670); (3) the Human Gene Mutation Database (HGMD) Professional at the Institute of Medical Genetics in Cardiff, a product of BIOBASE (http://www.hgmd.org/).13 HGMD is a commercial, manually curated knowledge base, and its last update was on 30 September 2011. This database is thought to be the most precise and complete collection of ATP7B gene variants. All the missense/nonsense mutations with an accession in the above databases were merged into a new set (Supplementary Table 1); other forms of mutations, such as insertions, deletions, indels, synonymous and noncoding mutations, were mainly taken from the HGMD annotation.

DNA extraction, PCR and sequencing

Genomic DNA was extracted from the peripheral blood leukocytes of the patients with QIAamp blood kits (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Eleven exons (2, 5, 8–13, 16, 18 and 19) of the ATP7B gene and their flanking intron sequences, a total of 5687 bp in length, were amplified by PCR. These exons were selected because most ATP7B mutations in Chinese WD patients have been found in them.14, 15, 16, 17, 18 Primer sets were designed with MutScreener or retrieved from published papers, and the sequences and product lengths are described in Supplementary Table 1. Briefly, PCR was performed in a 25-μl reaction volume containing 20–100 ng genomic DNA and 10 μM of each primer. Initial denaturation was at 95 °C for 5 min, followed by amplification for 30 cycles with denaturation at 95 °C for 30 s, annealing at 60–65 °C for 30 s and extension at 72 °C for 45 s. PCR products were subsequently purified using PCR purification kits (Guangzhou Dongsheng Biotech Ltd, Guangzhou, China) and sequenced bidirectionally using a dye-terminator cycle sequencing ready reaction kit (ABI PRISM 3130, Applied Biosystems, Life Technologies, Foster City, CA, USA).

Sequencing results were aligned to the ATP7B reference sequence (NM_000053.3) to identify mutations. To reduce false-positive discovery, (1) a high-fidelity Taq enzyme was used for amplification and the PCR reaction conditions were optimized; (2) each sample was sequenced at least twice to validate the result.

Identification of novel mutations

A genetic variant was considered to be a novel mutation when it met all of the following conditions: (1) it is not recorded in the dbSNP database as a polymorphism, (2) it is not included in the merged data set established by us and (3) it has not been reported in the literature indexed in PubMed.
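The three conditions above amount to a simple set-membership filter. The following Python sketch illustrates the logic only; the function name, the lookup sets and the variant labels are hypothetical placeholders, and in practice the PubMed check requires a manual literature search rather than a set lookup.

```python
def is_novel(variant, dbsnp_variants, merged_set, pubmed_reported):
    """Return True only if the variant is absent from all three sources.

    All arguments are illustrative assumptions: each source is modelled
    as a plain set of variant labels.
    """
    return (
        variant not in dbsnp_variants      # (1) not a dbSNP polymorphism
        and variant not in merged_set      # (2) not in the merged data set
        and variant not in pubmed_reported # (3) not reported in the literature
    )


# Toy lookup sets with placeholder labels (not real variants).
dbsnp_variants = {"variant_A"}
merged_set = {"variant_B"}
pubmed_reported = {"variant_C"}

candidates = ["variant_A", "variant_B", "variant_C", "variant_D"]
novel = [v for v in candidates
         if is_novel(v, dbsnp_variants, merged_set, pubmed_reported)]
print(novel)
```

A variant is rejected as soon as it appears in any one source, so the order of the three checks does not affect the result.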

Allele frequency and heterozygosity for each mutation were calculated with a Perl script written in-house, in order to find prevalent mutations and hot spots. The allele frequency and heterozygosity of the SNPs identified in this study were also analyzed and subsequently compared with those obtained from the dbSNP database.
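The original Perl script is not reproduced here. As a minimal sketch of the calculation, written in Python rather than Perl, assume each individual's genotype at one site is encoded as the number of mutant alleles carried (0, 1 or 2). Because the text does not state which definition of heterozygosity was used, the sketch returns both the observed value (fraction of heterozygous individuals) and the expected value 2p(1 − p); the function name and the encoding are assumptions made for illustration.

```python
def allele_stats(genotypes):
    """Compute mutant-allele frequency and heterozygosity for one site.

    genotypes: list of mutant-allele counts per diploid individual (0, 1 or 2).
    Returns (allele_frequency, observed_heterozygosity, expected_heterozygosity).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    freq = sum(genotypes) / (2 * n)               # mutant alleles / total alleles
    observed_het = genotypes.count(1) / n         # fraction of heterozygotes
    expected_het = 2 * freq * (1 - freq)          # 2p(1 - p)
    return freq, observed_het, expected_het


# Toy cohort: 6 non-carriers, 3 heterozygotes, 1 homozygote (illustrative only).
toy_genotypes = [0] * 6 + [1] * 3 + [2]
freq, obs_het, exp_het = allele_stats(toy_genotypes)
print(freq, obs_het, exp_het)
```

With 114 diploid individuals, the denominator for the allele frequency would be 228 alleles.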

An in-house program was used for genotyping and for paired relevance analysis of the identified mutations.
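The in-house program itself is not described in detail. One plausible way to carry out a paired relevance analysis is to count, for every pair of mutations, the number of patients who carry both. The Python sketch below illustrates this under that assumption; the function name, the patient identifiers and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(patient_mutations):
    """Count carriers of each mutation and of each pair of mutations.

    patient_mutations: dict mapping a patient ID to the collection of
    mutations found in that patient.
    Returns (single_counts, pair_counts); pair keys are sorted tuples.
    """
    single = Counter()
    pairs = Counter()
    for mutations in patient_mutations.values():
        unique = sorted(set(mutations))
        single.update(unique)
        pairs.update(combinations(unique, 2))
    return single, pairs


# Toy genotypes (illustrative only, not the study data).
toy_patients = {
    "P1": {"p.L770L", "p.R778L", "p.P992L"},
    "P2": {"p.L770L", "p.R778L"},
    "P3": {"p.P992L"},
    "P4": set(),
}
single, pairs = pair_counts(toy_patients)
print(pairs[("p.L770L", "p.R778L")], single["p.L770L"])
```

A pair whose joint count equals the carrier count of one of its members, as for p.L770L and p.R778L in this toy example, is a candidate for linkage.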

Computational prediction of deleterious variants of ATP7B

To predict the deleterious effects of a mutation on the structure and function of the protein, we applied three popular programs: PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/index.shtml), PMut (http://mmb2.pcb.ub.es:8080/PMut/) and SNAP (http://rostlab.org/services/snap/).19, 20, 21, 22 To predict deleterious SNPs located in the introns and in the regions upstream and downstream of the coding region, the FastSNP program was used (http://fastsnp.ibms.sinica.edu.tw/pages/input_CandidateGeneSearch.jsp).17 All the programs were run with their default parameters with optimization.

Prediction of mutation hot spots for ATP7B

ATP7B protein sequences of 13 non-human organisms, including monkey (XP_001103242.2), rat (NP_036643.2), mouse (NP_031537.2), dog (NP_001020438.1), cattle (XP_002691840.1), sheep (NP_001009732.1), horse (XP_001488500.1), duck (XP_001378265.2), chicken (XP_417073.3), panda (XP_002917723.1), platypus (XP_001513328.2), lizard (XP_003215344.1) and zebrafish (XP_684415.4), were retrieved from the NCBI database. Multiple sequence alignment of these sequences together with the human ATP7B protein sequence was performed with the Clustal X software (http://www.clustal.org/) using default parameters.18 A conservation score ranging from 0 to 100 was calculated and output for each residue. An amino-acid residue with a conservation score >80 was considered to be a highly conserved site, and a run of more than five consecutive such sites was defined as a conserved segment.
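The segment definition above can be expressed as a run-length scan over the per-residue scores. The Python sketch below is an illustration under the stated thresholds (score >80, more than five consecutive sites, i.e. at least six); the function name and the toy score track are assumptions, and the scores would in practice come from the Clustal X output.

```python
def conserved_segments(scores, threshold=80, min_run=6):
    """Find runs of at least `min_run` consecutive residues scoring > threshold.

    scores: per-residue conservation scores (0 to 100), index 0 = residue 1.
    Returns a list of (start, end) residue positions, 1-based and inclusive.
    """
    segments = []
    run_start = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                segments.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(scores) - run_start >= min_run:
        segments.append((run_start + 1, len(scores)))
    return segments


# Toy score track (illustrative only): runs of 6, 5 and 7 conserved residues.
toy_scores = [90] * 6 + [50] + [95] * 5 + [10] + [100] * 7
segments = conserved_segments(toy_scores)
print(segments)
```

In the toy track the run of five residues is excluded because the definition requires more than five consecutive sites.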

To identify mutation hot spots for ATP7B, all ATP7B missense/nonsense mutations, including those identified in this study and those reported previously, were mapped onto the conservation track. We defined a site as a mutation hot spot when (1) more than one form of missense/nonsense mutation has been found at a highly conserved site, or (2) only one form of mutation has been found, but the site lies within a conserved segment.
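The two-part rule above can be written as a short filter over the mutated positions. The Python sketch below is an illustration only: the function name, the input layout (a dict from residue position to the set of mutant forms seen there) and the toy data are assumptions, and the conserved segments are passed in as precomputed (start, end) positions.

```python
def find_hot_spots(scores, segments, mutation_forms, threshold=80):
    """Apply the two hot-spot criteria to every mutated residue.

    scores: per-residue conservation scores, index 0 = residue 1.
    segments: list of (start, end) conserved segments, 1-based inclusive.
    mutation_forms: dict mapping residue position -> set of mutant forms.
    Returns the sorted list of hot-spot positions.
    """
    hot_spots = []
    for pos in sorted(mutation_forms):
        n_forms = len(mutation_forms[pos])
        if n_forms == 0:
            continue
        highly_conserved = scores[pos - 1] > threshold
        in_segment = any(start <= pos <= end for start, end in segments)
        rule_1 = highly_conserved and n_forms > 1  # several forms, conserved site
        rule_2 = n_forms == 1 and in_segment       # one form, conserved segment
        if rule_1 or rule_2:
            hot_spots.append(pos)
    return hot_spots


# Toy inputs (illustrative only, not ATP7B data).
toy_scores = [90] * 6 + [50, 95, 95, 10]
toy_segments = [(1, 6)]
toy_forms = {
    3: {"form_1"},            # one form inside a conserved segment -> hot spot
    7: {"form_1", "form_2"},  # two forms, but the site is not conserved
    8: {"form_1", "form_2"},  # two forms at a highly conserved site -> hot spot
    9: {"form_1"},            # one form, conserved site outside any segment
}
hot_spots = find_hot_spots(toy_scores, toy_segments, toy_forms)
print(hot_spots)
```

Position 9 in the toy data shows why the segment requirement matters: a single mutant form at an isolated conserved residue does not qualify.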

Results

Identification of novel and/or prevalent ATP7B genetic variants of north China patients with WD

By PCR, sequencing and bioinformatic analysis, we examined 11 exons (exons 2, 5, 8, 9, 10, 11, 12, 13, 16, 18 and 19) of the ATP7B gene in 114 WD patients of the Chinese Han population living in north China. In total, we identified 47 genetic variations, of which 36 were mutations and 11 were SNPs that can be found in the dbSNP database.

Among the 36 mutations, there were 26 missense mutations, one nonsense mutation, three synonymous mutations, four indels and two nucleotide changes in the flanking introns that may disturb splicing (Table 1). Exon 8 was the most frequently mutated exon, followed by exon 13 and exon 18. These exons seem to be more susceptible to mutation in WD patients of the Chinese Han population living in north China, consistent with previous studies of patients living in other regions of China.23, 24 Fourteen mutations were located in the region encoding the transmembrane segments and four in the copper-binding domain; notably, the mutation Mut-02 was located exactly at a copper-binding residue. Some other mutations were located in the ATP-binding or catalytic region. The mutant-allele frequencies were calculated; p.R778L and its linked p.L770L (Mut-13 and Mut-12) were the most prevalent mutations in this study and accounted for 38.6% of cases, which is consistent with previous studies in Asians, including the Chinese.24 p.A874V (Mut-17) was the third most prevalent mutation and p.P992L (Mut-28) the fourth. The others were rare mutations, each found in no more than three cases.

Table 1 Information of ATP7B mutations

We applied the criteria described in the Materials and methods to determine whether a mutation was novel. We found that 14 mutations identified in this study have not been reported previously, and 5 mutations were identified for the first time in Chinese patients (Table 1, and Supplementary Figure 1 for electropherograms of the mutations). A user database containing all the missense/nonsense mutations reported previously and those identified in this study is provided in Supplementary Table 2.

The minor allele frequency and heterozygosity of the SNPs identified in this study were compared with those in the NCBI dbSNP annotations (Supplementary Table 3). Most of the SNPs had a lower minor allele frequency and heterozygosity in the individuals recruited in this study than in the whole population. This may be due to differences in allele frequency among ethnic populations.

Genotyping the mutations of each WD patient and identification of the linked mutations

The genotypes of each WD patient are summarized in Supplementary Table 4. Each patient harbored zero to three mutations in the genomic region we examined. In detail, 24 patients harbored three mutations, and in these patients the combination of p.L770L, p.R778L and p.P992L was the most prevalent genotype. In all, 35 patients harbored two mutations, 21 harbored one mutation and 34 harbored no mutations. Homozygous mutations were found in only six patients: five were homozygous for p.L770L and p.R778L and one was homozygous for p.C980L.

To investigate potential linkage between mutations, we analyzed the pairwise co-occurrence of mutations with an allele count of at least two. We found that two pairs of mutations, p.L770L/p.R778L and p.A874V/p.I929V, were closely related (Figure 1). A full map showing the paired relevance of all the mutations found in this study is shown in Supplementary Figure 1.

Figure 1

Paired relevance of mutations. Grids in gray: patient count of the specified linked mutations; grids in blue: patient count of a single prevalent mutation; numbers in red: triple linked mutations. Note that all the observed p.L770L mutations are linked with p.R778L, and all the observed p.I929V mutations are linked with p.A874V.

Predicting the functional effects of mutations and SNPs

We applied three popular programs to predict the functional effects of these mutations and SNPs. PolyPhen-2 predicts the possible impact of an amino-acid substitution on the structure and function of a human protein by using straightforward physical and comparative considerations. SNAP is a neural-network-based method that uses in silico-derived protein information (for example, secondary structure, conservation and solvent accessibility) to make predictions regarding the functionality of mutated proteins. It returns a score for each substitution; these scores can then be translated into binary predictions of effect (present/absent) and reliability indices. PMut uses different kinds of sequence information to label mutations, and neural networks to process this information, to estimate whether a mutation is pathological (that is, it can lead to disease for the carrier) or non-pathological/neutral (no effect on the carrier’s health). The cross-validated performance of PMut is an 84% overall success rate and a 67% improvement over random. The combined prediction results are shown in Table 2. Note that nonsense mutations that substantially truncate the protein sequence are commonly considered to be the most deleterious, and this applies to the one identified in our study, p.Q388X.

Table 2 Prediction of the functional effects of the mutations

In addition, a comprehensive analysis to predict the functional effects of the SNPs was run with the FastSNP program, and 28 SNPs were predicted to be functional or deleterious (Supplementary Table 5).

Prediction of mutation hot spots for ATP7B

Multiple sequence alignment of the ATP7B protein sequences of human and 13 non-human species was performed, and the conservation of each amino-acid residue was quantitatively scored from 0 to 100 (Figure 2a and Supplementary Table 6). All ATP7B missense/nonsense mutations, including those reported previously and those identified in this study, were mapped onto this conservation track to show the number of mutant forms at each site (Figure 2b).

Figure 2

Tracks presenting the conservation and mutation hot spots for ATP7B. (a) Conservation score along the full length of ATP7B, calculated from the multiple sequence alignment of 14 species (range 0 to 100). (b) All currently known missense/nonsense mutations of ATP7B, together with our findings, mapped onto the protein sequence. The vertical axis indicates the number of variant forms at each amino-acid residue. (c) Merged tracks indicating the mutation hot spots for ATP7B in WD patients.

In total, we identified 34 mutation hot spots, which are shown schematically in Figure 2c and listed in Supplementary Table 7. Mutations at these hot spots should be more deleterious to protein structure and function than those at other sites, and therefore should be preferentially selected for genetic testing of WD. The reported mutation hot spots p.R778L in East Asians and p.H1069Q were both included in this group, supporting our strategy.25

Discussion

WD is known to be caused by abnormalities of ATP7B, the gene encoding a copper-transporting protein. Nonsense mutations of ATP7B can result in truncated proteins with severe defects in copper transport, and ultimately lead to severe symptoms and an early age of onset.26 Missense mutations influence protein structure and can be found in several functional domains, including those that mediate the interaction between ATP7B and its chaperone proteins.27 Splicing and transcriptional regulation can also be disturbed by mutations located in the introns and the promoter region.28, 29 All these kinds of mutations were identified in this study. It is interesting that some synonymous mutations are also linked to WD, such as c.2310C>G, p.L770L, which is almost completely linked with the most prevalent mutation identified in this study, p.R778L, and has also been reported in WD patients among the Hong Kong Chinese.24 The p.L770L mutation was rare in the normal population but appeared frequently in WD patients, being found in more than one-third of the patients we examined. However, this silent mutation seems to be a polymorphism in the European population. A comparable case is p.I1148T, which was found to be a polymorphism in our study but has been found only in WD patients in France.30 Further expression analysis will be helpful to elucidate the functions of these mutations.

A growing number of ATP7B mutations have been identified in WD patients, and some mutation hot spots have been identified in particular ethnic groups and regions. For example, p.R778L was the most prevalent mutation identified in East Asians with WD, with a frequency of up to 60%.31, 32, 33 In this study, we found that p.R778L (21.5%) and p.P992L (6.1%) were the most frequent mutations, which is consistent with several previous reports in Chinese WD patients.14, 15, 16, 17, 18 The p.A874V (7.5%) and p.N1270S (1.8%) mutations were also among the most frequent in this study; they were also reported in two groups of patients from Korea33 and Shanghai, China,14 suggesting that the ATP7B mutation spectrum is similar between these regions and ours.14, 34 Also consistent with previous reports, we did not find the H1069Q mutation in this study; it is a common WD mutation in Europeans and North Americans,35, 36, 37 and has not been reported in Chinese WD patients.14, 15, 16, 17, 18 Only six patients with homozygous mutations were found in our study, five of whom were homozygous for p.L770L and p.R778L, which are considered to be less deleterious but lead to early onset of the disease. The other homozygote carried p.C980L, a novel mutation identified in this study. It was predicted to be deleterious by all three programs, and it will be interesting to investigate the functional effects of this mutation in the future. It should be noted that the proportion of patients with no mutations in this study is relatively high (34/114, 29.8%), which may be because we sequenced only 11 of the 21 exons of the ATP7B gene. More mutations are expected to be found when more exons, as well as the regulatory regions, are sequenced in the future.

A panel of 28 frequently observed mutations has been suggested for rapid diagnosis of WD.38 In this study, we applied a different strategy to identify a group of 34 mutation hot spots. The two groups overlap at 5 sites, and 29 sites in our group are not included in the previous panel. Thus, our prediction should be a useful supplement for clinical genetic testing of WD. Recently, the ATP7B gene was found to be associated with sporadic Alzheimer's disease and some cancers, and thus genetic testing of ATP7B should also benefit studies of these diseases.39, 40, 41, 42, 43