Background
Atrial fibrillation (AF) is the most common cardiac arrhythmia and is a leading cause of stroke, heart failure, and dementia [
1]. AF currently affects over 30 million individuals worldwide [
2], and this number is projected to grow dramatically over the next 20 years [
3]. Despite > 100 years of basic and clinical research, the fundamental mechanisms of AF remain poorly understood.
Microarray expression analysis of atrial tissues can provide a global unbiased framework to characterize the transcriptional changes associated with AF. Advancement of high-throughput microarray technology is producing a large number of gene expression data, which are powerful tools for discovering and studying novel biomarkers for AF. Nonetheless, analysis based on high throughput data may face the dreaded ‘curse of dimensionality’. This refers to the phenomenon that the amount of sample size is relatively small while the number of features increases greatly, which will increase the probability of making statistical errors [
4].
Recently, integrated transcriptomic and quantitative proteomic analyses have been widely used to promote a better understanding of the molecular mechanisms driving biological processes in cells and tissues [
5]. Advances in mass-spectrometry (MS) provide an unprecedented opportunity for antibody-independent proteome profiling with approximately 80% of all proteins in major human tissues quantifiable by this technique [
6]. By integrating the transcriptomic and proteomic data, the ‘curse of dimensionality’ can be solved through cross-validation in the two levels. Besides, combining datasets from different origins by meta-analysis to extend the sample size and using some machine learning algorithms to select and reduce features could also help solve the ‘curse’ [
7].
Due to the difficulty in obtaining atrial tissue from healthy populations, the majority of atrial transcriptomic and proteomic studies of AF used atrial tissue from patients undergoing open-heart surgery with or without AF [
8,
9]. By controlling other variables such as the comorbidity, severity of mitral valve disease, age, and sex, analyzing differentially expressed genes (DEGs) or differentially expressed proteins (DEPs) could also help explain the associations between genes expression and this complex disease phenotype. Another commonly applied method is to use samples that are more available in healthy people such as peripheral blood. However, the expression profiles from different cells and tissues could be quite different due to cell/tissue-specific epigenetic regulation mechanism [
10]. Hence, we propose to identify feature genes from local atrial tissue as it can directly depict the altered gene expression profiles of atria, and so able to identify the atrial remodeling process of AF.
Here, our objective was to elucidate a more complete understanding of molecular mechanisms underlying AF and to find potential diagnostic and therapeutic targets. The integration of multi-omics data, along with the application of the machine learning approach, vouched for the identification of key pathways and feature genes in AF, which may help to investigate the underlying mechanism of AF and to discover potential diagnostic and therapeutic targets.
Discussion
To our knowledge, this is the first integrated transcriptomic and proteomic analysis of human AF atrial tissue, and the first to identify feature genes of AF using machine learning approach. Previous transcriptomic studies have provided insights into the pathogenesis of AF [
21,
22]. However, these experiments are generally analyzed through a single data source or restricted to a fewer sample which can lead to biological and technical biases. Thus, the microarray meta-analysis was used in this study to integrate four microarray data sets of AF from GEO which led to the identification of 863 DEGs. To elucidate a more complete understanding of AF pathogenesis, we also conducted a proteomic study of local atrial tissue which identified 482 DEPs.
Pathway enrichment analysis can help to characterize physiological and functional changes associated with the changes in mRNA and protein expression in AF atrial tissues. For the upregulated mRNAs or proteins, the top 19 scoring items were enriched in both transcriptomic and proteomic levels, which vouched for the importance and significance of these pathways. Some of the items, such as ‘PDGFRB PATHWAY’, ‘activation of immune response’, ‘muscle structure development’, ‘regulation of actin cytoskeleton’, and ‘leukocyte degranulation’, have been proved to play key roles in AF progression [
3,
23]. For the downregulated mRNAs or proteins, the top 19 scoring items were only enriched in the proteomic level, and these pathways were mainly involved in metabolism regulation, such as ‘mitochondrial respiratory chain complex assembly’, ‘TP53 regulates metabolic genes’, and ‘response to oxidative stress’. Besides, the ‘Metabolism of lipids’ pathway was enriched in two levels. These are in accord with the recent studies which highlighted the role of metabolic remodeling in AF [
24‐
26]. The reason why these pathways are only identified in the protein level may be caused by some post-transcriptional and translational regulations.
After cross-validation between the two omics data. We identified 30 genes or proteins with the same trends between two levels. To make the selected features more significant and informative, the machine learning CFS feature selection method was adopted in the training set which led to the final 10 features, wherein 8 are upregulated (CD44, CHGB, FHL2, GGT5, IGFBP2, NRAP, SEPTIN6, YWHAQ) and 2 are downregulated (TNNI1, TRDN). The NB classifier base on the expression values of these features in the training set can classify AF and SR samples with a precision of 87.5% and AUC of 0.995 in the independent test set.
Some of these feature genes have been reported to be associated with AF or its related pathogenesis. The CD44 related pathways including CD44/STAT3 and CD44/NOX4 signaling pathways can lead to atrial fibrosis [
27] and Ca
2+-handling abnormalities [
28] during AF. Secretogranin-1 (CHGB) presents in the secretory granules in atrial myoendocrine cells and is co-localized with atrial natriuretic peptide (ANP) while CHGB genetic variation results in oxidative stress [
29] and hypertension [
30]. The four and a half LIM domains protein 2 (FHL2) is a component of the hypertrophic response and is found to be protective in cardiac hypertrophic through inhibiting MAPK/ERK signaling [
31]. MAPK has been proved to function in AF context by mediating oxidative stress [
32,
33], epicardial adipose tissue remodeling [
34], atrial fibrosis [
35], load-induced hypertrophic response [
36], and ionic channel remodeling [
37]. Gamma-glutamyltransferase-5 (GGT5) is confirmed to be closely associated with immune cell activation [
38] and oxidative stress [
39,
40] and can be a potential biomarker of myocardial infarction [
41]. Insulin-like growth factor-binding protein 2 (IGFBP2) belongs to the insulin-like growth factor-binding protein (IGFBP) family. Two recent studies observed a higher hazard of incident AF associated with higher mean levels of plasma IGFBP1 protein [
42] and IGFBP3 protein [
43]. Nebulin related anchoring protein (NRAP) is present in myofibril precursors during myofibrillogenesis and thought to be involved in myofibril assembly [
44], and its genetic variance is associated with cariomyopathy [
45]. Septin-6 (SEPTIN6) is invovled in extracellular matrix remodeling [
46]. 14–3-3 protein theta (YWHAQ) is a gene in the P53 network and has been shown to promote apoptosis directly upon genotoxic stress [
47]. Another proteomic also identified YWHAQ as an important biomarker in AF [
47]. TNNI1 encodes a troponin-I protein that is the dominant form of troponin-I expressed in the fetal/neonatal/infant heart, and its participants in AF remains unknown. Triadin (TRDN) is a stable subunit of the ryanodine receptor 2 (RyR2) and is involved in the regulation of Ca
2+ release [
48]. The loss or dysfunction of RyR2 stable subunits was demonstrated to cause the occurrence of spontaneous calcium elevation in AF atrial cells [
49]. Our present study further proved and emphasized the importance of these markers.
There are some limitations to the current study. Firstly, the number of samples included in the microarray meta-analysis remains relatively small (n = 130), which is caused by the limited number of available studies in the GEO database. Secondly, there is no corresponding clinical information of the samples, we were not able to make a prognostic analysis of these biomarkers. Third, the samples used in the transcriptomic and proteomic studies came from patients with valvular heart disease. This is due to the difficulty in acquiring atrial samples from healthy cohorts. The psychophysiology of AF in patients with valvular heart disease may have some differences from those with non-valvular AF. We recommend further study to identify gene expression profiles using atrial samples from non-valvular AF patients and healthy donors. Finally, the transcriptomic and proteomic can only indicate the potential causes for a phenotypic response, but they cannot predict what will happen at the next level. Thus, one should consider the metabolomic that provides a functional view of an organism as determined by the sum of its genes, RNA, proteins, and environmental factors [
50]. Nonetheless, the integrated analysis of multi-omics data along with the machine learning method makes sure the selected genes as important features for AF. Further studies are needed to clarify their functions in AF pathogenesis.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.