Introduction
Mucopolysaccharidosis (MPS) IIIA, also known as Sanfilippo syndrome type A, is a neurodegenerative lysosomal storage disorder caused by a deficiency in the enzyme N-sulfoglucosamine sulfohydrolase (SGSH, EC:3.10.1.1), which is involved in the degradation of heparan sulfate. There are four different subtypes of MPS type III (type A – OMIM #252900, type B – OMIM #252920, type C – OMIM #252930, and type D – OMIM #252940) based on the enzyme deficiencies of SGSH, NAGLU, HGSNAT, and GNS, respectively. Each of the MPS III types is inherited in an autosomal recessive pattern with variations in the severity of phenotypes (Neufeld and Muenzer
1995). The genes encoding these four different enzymes have been characterized, and several mutations associated with these genes have been reported. The signs and symptoms of all four types are similar. Degeneration of the central nervous system, which results in mental retardation and hyperactivity, is the primary characteristic of MPS III, which commences in childhood (Fedele
2015). Other symptoms associated with MPS III include delayed speech, behavioral problems, progressive dementia, macrocephaly, inguinal hernia, seizures, movement disorders, hearing loss, and sleep disturbances (Buhrman et al.
2014). The initial symptoms of the disease generally appear in the first to the sixth year of life, and death usually occurs in the early twenties (Valstar et al.
2010). The incidences of these subtypes are unevenly distributed. The estimated combined frequency of all four types varies between 0.28 and 4.1 per 100,000 live births. The incidence of MPS IIIA ranges from 0.68 per 100,000 to 1.21 per 100,000 in European countries (Baehner et al.
2005; Héron et al.
2011). MPS IIIA and MPS IIIB are more common than MPS IIIC and MPS IIID (Valstar et al.
2008), whereas MPS IIIA is more severe than MPS IIIB (Buhrman et al.
2013).
The gene encoding sulfamidase (SGSH), which was identified in 1995, is localized on chromosome 17q25.3. The 502-amino-acid sulfamidase protein contains five potential N-glycosylation sites (Scott et al. 1995). The gene spans 11 kb and contains eight exons (Karageorgos et al. 1996). To date, 115 mutations, including missense/nonsense, deletion, insertion, and splicing mutations, have been recorded for SGSH according to the HGMD database (http://www.hgmd.cf.ac.uk/ac/all.php).
Proteins play a vital role in the regulation of various cellular functions, depending on their proper conformation in the cellular environment (Dill and MacCallum
2012). DNA variants known as single nucleotide polymorphisms (SNPs) have been known to introduce changes in the function of a gene (Cargill et al.
1999). A distinct class of such SNPs, known as nonsynonymous single nucleotide polymorphisms (nsSNPs), is present in coding regions and leads to amino acid changes that may alter protein function and account for vulnerability to disease. SNPs that do not affect the function of the protein are known as tolerated SNPs. Therefore, it is essential to distinguish the deleterious nsSNPs from the tolerated nsSNPs to understand the molecular genetic basis of human disease as well as to assess and understand the pathogenesis of the disease (Wang et al.
2009). Alterations and misfolding in protein structures due to nsSNPs lead to severe impairments that cause various diseases in humans (Chandrasekaran and Rajasekaran
2016; Thirumal Kumar et al.
2018a; Thirumal Kumar et al.
2018b; Valastyan and Lindquist
2014). Although most genetic variations in protein sequences are predicted to have very little or no effect on the function of the protein, some nsSNPs are known to be associated with the disease. These disease-related nsSNPs have adverse effects on the catalytic activity, stability, and interactions of the protein with other molecules. Thus, the identification of disease-associated nsSNPs is essential, and it will facilitate the elucidation of molecular mechanisms underlying a given disease (Sneha et al.
2017a; Zaki et al.
2017a). In subsequent years, the field of computational biology has emerged with advancements in automated methods to analyze the biological impact of nsSNPs based on the available information from modeled protein structures or structures derived from phylogenetic studies and comparative genomics (Chasman and Adams
2001; Sunyaev et al.
1999; Ng and Henikoff
2001). Analyzing the likely impact of nsSNPs on protein function experimentally, and establishing the association between these nsSNPs and disease, would be highly time-consuming (Zhernakova et al. 2009). Bioinformatics-based approaches that draw on the protein sequence and structure, as well as the biochemical severity of the amino acid substitution, facilitate phenotypic prediction. In recent years, various computational approaches have been developed that predict the effect of nsSNPs using machine learning algorithms, such as hidden Markov models (Shihab et al.
2013), naïve Bayes classifier (Adzhubei et al.
2010), support vector machines (Acharya and Nagarajaram
2012; Capriotti et al.
2008), and neural networks (Bromberg and Rost 2007). In the present study, we performed an in silico analysis using various computational algorithms to explore the possible relationships between genetic mutations and phenotypic variations, similar to our previous reports (Agrahari et al.
2018a; Agrahari et al.
2018b; Mosaeilhy et al.
2017a; Mosaeilhy et al.
2017b; Zaki et al.
2017b). To increase the prediction accuracy for disease-causing variants, we used the Meta-SNP server (Capriotti et al. 2013), which integrates four existing methods (PANTHER, SIFT, PhD-SNP, and SNAP) to classify a mutation as either disease-causing (affecting the protein function) or neutral (having no impact). Further, combining these in silico tools with molecular dynamics studies in mutational analysis has been shown to be a powerful approach for understanding macromolecular behavior and microscopic interactions, allowing insights into the impact of mutations (Agrahari et al.
2019; Ali et al.
2017a; Ali et al.
2017b; Nagarajan et al.
2015; Sneha et al.
2018a; Sneha and George Priya Doss
2016; Sneha et al.
2018b; Thirumal Kumar et al.
2019). Molecular dynamics (MD) simulations aid in understanding, at an atomic level, the significant changes in the macromolecular structures of proteins caused by mutations. Various studies have demonstrated the value of MD simulations (MDS) in analyzing the effects of nsSNPs on protein structure (George Priya Doss and NagaSundaram
2012; Nagasundaram and George Priya Doss
2013; Thirumal Kumar et al.
2018a; Thirumal Kumar et al.
2018b; Xu et al.
2018; George Priya Doss and Zayed
2017; Mosaeilhy et al.
2017a, b; Sneha et al.
2017b; John et al.
2013).
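The idea of integrating several predictors, as described above for PANTHER, SIFT, PhD-SNP, and SNAP, can be illustrated with a minimal sketch. The sketch below uses a plain majority vote, which is a deliberately simple stand-in and not the integration scheme of the Meta-SNP server. The `Prediction` class, the `consensus` function, and the example calls are hypothetical and are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    tool: str       # name of the predictor
    disease: bool   # True = predicted disease-causing, False = predicted neutral


def consensus(predictions):
    """Majority vote over binary predictions.

    Returns (label, votes_for_disease, total), where label is
    'Disease', 'Neutral', or 'Tie'.
    """
    votes = sum(p.disease for p in predictions)
    total = len(predictions)
    if 2 * votes > total:
        label = "Disease"
    elif 2 * votes < total:
        label = "Neutral"
    else:
        label = "Tie"
    return label, votes, total


# Hypothetical calls for a single variant (illustrative only, not study data).
example = [
    Prediction("PANTHER", True),
    Prediction("SIFT", True),
    Prediction("PhD-SNP", False),
    Prediction("SNAP", True),
]
print(consensus(example))
```

A vote of this kind makes disagreement between tools explicit, which is one reason a combined call may be more robust than any single predictor.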
Based on experimental studies (Esposito et al.
2000; Héron et al.
2011; Knottnerus et al.
2017; Muschol et al.
2004; Perkins et al.
1999; Sidhu et al.
2014; Trofimova et al.
2014; Weber et al.
1997), the missense mutations R74C, S66W, and R245H were selected and analyzed with prediction tools. The goal of this study was to understand the impact of these deleterious nsSNPs at the structural level. The models of the mutant proteins were generated based on the crystal structure of the SGSH protein. The native and mutant proteins were then subjected to MD simulation analysis using GROMACS to observe the structural changes. Therefore, the present study demonstrates the potential of using computational methods in resolving the effect of deleterious nsSNPs on protein structure.
Discussion
Prediction of the phenotypic consequences of nsSNPs using in silico algorithms might provide a significant understanding of the genetic differences in susceptibility to disease and response to drugs. Understanding the molecular basis of the disease at a structural level by experimental methods requires a large amount of effort and time. Since these methods have their limitations, there is a niche for in silico methods, which can analyze functional SNPs with greater accuracy and speed (Adzhubei et al.
2010; Calabrese et al.
2009; PS et al.
2017b). The combination of various structure- and sequence-based prediction methods, which use multiple algorithms, serves as a powerful tool and provides accurate and reliable predictions in identifying mutants as deleterious or neutral. Various pathogenicity prediction tools, such as PANTHER, SIFT, SNAP, PhD-SNP, and Meta-SNP, and stability prediction tools, such as I-Mutant 3.0, MUpro, and SDM, were used in our study to identify the deleterious nature of the variants (Table 1). Despite variations in the input and output of these methods and limitations in making predictions, the ultimate result is the differentiation of deleterious SNPs from neutral ones. Combining these techniques increases their overall predictive power. However, supportive evidence is necessary for validation of these prediction methods. Based on experimental studies (Esposito et al.
2000; Héron et al.
2011; Knottnerus et al.
2017; Muschol et al.
2004; Perkins et al.
1999; Sidhu et al.
2014; Trofimova et al.
2014; Weber et al.
1997), we selected three mutants, R74C, S66W, and R245H, for our prediction analysis. As predicted by the multiple sulfatase sequence alignment, R74 is the analogous residue in the SGSH protein. The residual activity levels of the mutant protein were found to be reduced to less than 1% of wild-type SGSH protein (Yogalingam and Hopwood 2001). The replacement of a basic, positively charged arginine residue with a non-polar cysteine residue would disturb the ionic interaction of the native protein. The mutant residue is smaller and more hydrophobic. This difference in size and hydrophobicity between the native and mutant residues would remove a stabilizing hydrogen bond, which is vital for hydrolysis of the sulfate ester present at the non-reducing end of the substrate. This mutation is therefore likely to abolish the enzyme function, reflecting its deleterious nature. The reduced specific activity and increased susceptibility to degradation may be due to the destabilization of the active site (Perkins et al.
1999). The evolutionary stability studies and mutational resistance of protein-coding genes have demonstrated that arginine, leucine, and serine are the primary amino acids affecting protein stability in the mutants (Prosdocimi Francisco
2007). Arginine is a hydrophilic amino acid and located in the exposed region, as shown in Fig.
1. Reports suggest that proteins have evolved to place arginine residues at their surfaces to help stabilize their structures (Strub et al.
2004). Arginine is considered the most favored amino acid due to its capacity to interact in different conformations, its side chain length, and its ability to produce good hydrogen-bonding geometries (Luscombe and Thornton
2002). Thus, the substitution of arginine with cysteine could cause adverse effects on the protein conformation and significantly change the structure and function of the active site of the SGSH protein.
The amino acid residue S66 is not conserved between SGSH protein and other sulfatases. It lies near the CSPSR motif and is therefore in the coordination sphere for the cysteine residue, which is post-translationally modified in the active site of eukaryotic sulfatases (Hopwood and Ballabio
2001; Schmidt et al.
1995). Reports have shown a rapid degradation and reduced activity of the S66W mutagenized form of SGSH. The substitution of the small polar serine with the non-polar bulkier tryptophan might distort the active site, resulting in lower specific enzyme activity and stability of the protein (Weber et al.
1997). Based on sequence comparison with arylsulfatase B following superimposition, amino acid residue R245 has been hypothesized to lie near the surface of the protein on α-helix 7 of arylsulfatase B, away from the coordination sphere forming the active site. The R245H mutation will, therefore, possibly affect the stability of sulfamidase without changing the specific activity of the protein. The size difference between the native and mutant residues may prevent the mutant residue from forming the same hydrogen bonds as the native residue, destabilizing the local structure and packing. The difference in charge will disturb the ionic interactions of the native protein, leading to a possible loss of interactions with other molecules (Perkins et al.
1999; Perkins et al.
2001).
To validate the accuracy of our prediction tools, the mutants were then subjected to studies of the behavior of the protein. In silico analysis techniques in our study, including stability changes, pathogenic effects, and evolutionary conservation analysis, predicted that these three mutations (R74C, S66W, and R245H) had stability and functional impacts on the protein. Evolutionary analysis provides essential features for predicting the impact of nsSNPs. The role of functional SNPs within the evolutionarily conserved regions has been validated in various studies. Deleterious mutations are more likely to occur in protein sequences that are evolutionarily conserved due to their functional importance (Aly et al.
2006; Doniger et al.
2008; Tavtigian et al.
2008). Consequently, in our study, arginine at positions 74 and 245 and serine at position 66 were predicted to be highly conserved, functional, and exposed residues with a score of 9 on the conservation scale of the ConSurf server, illustrating the deleterious nature of mutations that affect protein function (Fig.
1). The number of salt bridges formed was also compared between the native and mutant structures. Since salt bridges are dynamic and mostly exposed to the surface, they experience large thermal fluctuations and continuously break and reform. The formation of salt bridges governs the flexibility of the protein, and these salt bridge interactions are considered an essential factor in the stability of the protein (Jelesarov and Karshikoff
2009). We observed 33 salt bridges in both the native protein and mutant S66W, whereas mutants R74C and R245H had 30 and 32 salt bridges, respectively. The reduction in salt bridge formation in mutants R74C and R245H thus indicates a deleterious impact on protein structure and function.
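The salt bridge comparison above can be illustrated with a minimal counting sketch. It assumes a simple geometric criterion: a pair of residues is counted once if any side-chain oxygen of Asp/Glu lies within 4.0 Å of any side-chain nitrogen of Lys/Arg/His. The cutoff, the inclusion of histidine, the input tuple layout, and the toy coordinates are all assumptions made for illustration; they are not the criteria or data used in this study.

```python
import math

# Charged side-chain atoms (PDB atom names) used here to define a salt bridge.
# Treating His as basic is an assumption; tools differ on this point.
ACIDIC = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}
BASIC = {"LYS": ("NZ",), "ARG": ("NE", "NH1", "NH2"), "HIS": ("ND1", "NE2")}


def count_salt_bridges(atoms, cutoff=4.0):
    """Count acidic/basic residue pairs with charged atoms within `cutoff` angstroms.

    atoms: iterable of (chain, resnum, resname, atomname, (x, y, z)).
    Each residue pair is counted once, however many atom pairs satisfy the cutoff.
    """
    atoms = list(atoms)
    acid = [(c, n, xyz) for c, n, r, a, xyz in atoms if a in ACIDIC.get(r, ())]
    base = [(c, n, xyz) for c, n, r, a, xyz in atoms if a in BASIC.get(r, ())]
    pairs = set()
    for chain_a, num_a, pos_a in acid:
        for chain_b, num_b, pos_b in base:
            if math.dist(pos_a, pos_b) <= cutoff:
                pairs.add(((chain_a, num_a), (chain_b, num_b)))
    return len(pairs)


# Toy coordinates (hypothetical, not taken from the SGSH structure).
toy_atoms = [
    ("A", 10, "ASP", "CA", (1.0, 1.0, 1.0)),    # backbone atom, ignored
    ("A", 10, "ASP", "OD1", (0.0, 0.0, 0.0)),
    ("A", 20, "ARG", "NH1", (3.0, 0.0, 0.0)),   # 3.0 from OD1: within cutoff
    ("A", 20, "ARG", "NH2", (3.5, 0.0, 0.0)),   # same residue pair, not recounted
    ("A", 30, "LYS", "NZ", (10.0, 0.0, 0.0)),   # too far from OD1
]
print(count_salt_bridges(toy_atoms))
```

Applying the same function to the native and mutant structures under one fixed criterion is what makes the resulting counts comparable.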
Serine is a hydrophilic amino acid with hydrogen-bonding potential. It actively participates in hydrogen bond formation. The decrease in hydrogen bonds in mutant S66W could have been due to its substitution with a hydrophobic amino acid, tryptophan, with different physicochemical properties. Polar amino acids are commonly located in exposed regions of the protein, and any mutation in this region interferes with the functionality of the protein (Sudhakar et al.
2016). As S66 is present in the exposed region (Fig.
1), its contribution to solvent accessibility was reduced due to its substitution with tryptophan. Mutant S66W showed less solvent accessibility than the other two mutants, R74C and R245H, thus losing its contact with the surrounding solvent, as evidenced in the SASA analysis. Similarly, in the case of mutations R74C and R245H, arginine is a hydrophilic amino acid and is located in the exposed region of the protein (Fig.
1). Reports suggest that the replacement of hydrophobic residues with arginine at protein surfaces stabilizes the protein (Strub et al.
2004). Arginine interacts with the solvent and increases stability. Thus, the substitution of arginine with the hydrophobic amino acid cysteine might decrease stability and lead to a destabilization of the protein, consistent with the results obtained in the RMSD, hydrogen bond, and Rg analyses. Arginine, which has a positive charge, is larger than cysteine, which is neutral. This difference in size and charge between the native and mutant residues might disrupt interactions with the bound metal ion (CA), as observed in the surrounding amino acids where the interaction with CA was lost. The difference in charge would also alter ionic interactions of the native protein, as validated by salt bridge analysis where three salt bridges were lost (Table
3). In mutant R245H, histidine is smaller than arginine. There was a decrease in the number of hydrogen bonds formed in all the mutants, as evidenced in the hydrogen bond analysis of the MDS (Fig.
4). The stability difference caused by the mutations was further studied by analyzing the changes in secondary structural elements between the native and mutant proteins using the PDBsum database. The mutational positions R74, S66, and R245 in SGSH protein were initially located. Position S66 contributed to the formation of beta turns, whereas R74 and R245 were present in the alpha-helical region of the protein (Fig.
7). The mutational positions in the secondary structure of the proteins play an essential role in identifying structural alterations in the protein (Mosaeilhy et al.
2017a; Mosaeilhy et al.
2017b; Sneha et al.
2018a; Sneha et al.
2018b; Thirumal Kumar et al.
2016; Yagawa et al.
2010; Zaki et al.
2017a). Alpha helices and beta strands are stabilized by hydrogen bonds (Schneider and Kelly
1995). Mutations that occur in alpha helix regions and beta sheets of the protein create a deleterious impact on the protein (Sneha et al.
2017a;
2017b; Mosaeilhy et al.
2017a; Mosaeilhy et al.
2017b), whereas mutations in turns or loops have minimal effects on the structural integrity of the protein (Yagawa et al.
2010). Thus, these mutations in alpha helices and beta turns could affect hydrogen bond formation and exert a deleterious impact on the protein, as validated by the hydrogen bond analysis.
Stability is a fundamental criterion that strengthens the biomolecular functions, regulation, and activity of the protein (Chen and Shen
2009). Deleterious nsSNPs can alter the normal function of a protein by changing the geometric constraints and hydrophobicity and disrupting hydrogen bonds and salt bridges (Rose and Wolfenden
1993; Shirley et al.
1992). To understand the stability and dynamic behavioral changes at an atomistic level, MDS analysis was carried out to study the behavior of the native protein and mutants R74C, S66W, and R245H. Different parameters, such as RMSD, RMSF, hydrogen bond numbers, the radius of gyration, and SASA, were calculated from the simulation trajectory. Molecular stability and flexibility changes were observed based on the RMSD and RMSF analyses, respectively. The results of the SGSH protein stability analysis indicated that all the mutants (R74C, S66W, and R245H) exhibited different RMSD values when compared to the native protein. Higher deviations were observed in all mutants in comparison to the native protein. A higher or lower deviation indicates a decrease or increase, respectively, in the stability of the molecule (Yun and Guy
2011). Since higher deviations led to an increase in protein rigidity, the stability analysis revealed that the mutant structures resulted in increased rigidity of the protein due to the substitution of deleterious amino acids, which was also correlated with the reduced number of hydrogen bonds in all mutants (Fig.
2a and b). Mutant S66W showed the greatest fluctuations followed by mutants R74C and R245H, thus increasing the rigidity of the protein. Thus, consistent with the RMSD analysis, the flexibility changes observed by RMSF revealed that the native protein had minimum fluctuations. As hydrogen bonds are responsible for stabilizing the structure of the protein, the determination of hydrogen bonds provides a robust and reliable indicator of the stability of the protein (Gerlt et al.
1997). Thus, the mutants showed a loss of stability, forming fewer hydrogen bonds than the native structure, which showed the largest number of hydrogen bonds. The reduction in the number of hydrogen bonds in the mutant structures might be due to the loss of interactions with surrounding amino acids. In the case of S66W, serine, a polar amino acid, participates in hydrogen bond formation. Its substitution with tryptophan results in fewer hydrogen bonds, thus leading to reduced stability of the protein. Furthermore, the compactness of the protein was studied using Rg. The Rg analysis showed that the native protein was more compact than the mutant proteins, consistent with the RMSD stability analysis. The loss of interactions with surrounding amino acids in the mutants could have been a reason for this loss of compactness. The SASA values were also calculated for the native and mutant structures. The observed changes in SASA values indicated the repositioning of amino acid residues from buried to accessible or from accessible to buried regions. S66W had lower solvent accessibility than mutants R74C and R245H, which indicated a potentially reduced chance of its interaction with other molecules. Thus, the SASA analysis suggested how the incorporation of deleterious amino acids introduced changes in hydrophilic and hydrophobic regions of the protein. Furthermore, based on our PCA, the mutants had greater flexibility than the native protein. Greater motional changes make a protein less stable. The PCA results indicated lower stability in all mutant structures compared with the native protein, which is consistent with the results of the RMSD and hydrogen bond analyses. These motional changes indicate a loss of stability of the mutant proteins, including changes in their physicochemical properties. Therefore, the present results correlate well with experimental studies of the severity of the disease.
The overall results indicated a loss of stability and functionality of the protein due to the deleterious impact of the amino acid substitutions, which might adversely affect the enzymatic activity of the protein and lead to neurodegeneration.
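For readers less familiar with the trajectory metrics discussed above, the following minimal sketch shows how RMSD, RMSF, and Rg are defined for a set of atomic coordinates. It is not the GROMACS workflow used in this study. It assumes that the frames have already been superimposed on the reference and that all atoms have equal mass; the function names and the toy two-atom trajectory are hypothetical.

```python
import math


def rmsd(frame, ref):
    """Root-mean-square deviation between two pre-aligned frames.

    frame, ref: lists of (x, y, z) tuples with the same atom order.
    """
    total = sum(math.dist(a, b) ** 2 for a, b in zip(frame, ref))
    return math.sqrt(total / len(ref))


def rmsf(traj):
    """Per-atom root-mean-square fluctuation about each atom's mean position.

    traj: list of frames, each a list of (x, y, z) tuples.
    """
    n_frames = len(traj)
    result = []
    for positions in zip(*traj):  # positions of one atom across all frames
        mean = tuple(sum(axis) / n_frames for axis in zip(*positions))
        msf = sum(math.dist(p, mean) ** 2 for p in positions) / n_frames
        result.append(math.sqrt(msf))
    return result


def radius_of_gyration(frame):
    """Radius of gyration, assuming equal atomic masses."""
    n = len(frame)
    center = tuple(sum(axis) / n for axis in zip(*frame))
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in frame) / n)


# Toy two-atom, two-frame trajectory (hypothetical coordinates).
frame0 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(frame1, frame0))        # deviation of frame1 from the reference
print(rmsf([frame0, frame1]))      # fluctuation of each atom
print(radius_of_gyration(frame0))  # compactness of the reference frame
```

In these definitions, RMSD tracks the overall drift from a reference structure, RMSF resolves fluctuation per atom or residue, and Rg summarizes compactness, which is why the three are read together in the analysis above.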