Background
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was identified in Wuhan, China, in late 2019 and has caused an ongoing global pandemic of the resulting illness, COVID-19. COVID-19 has a broad spectrum of clinical manifestations, with most infected subjects being asymptomatic or showing only mild symptoms [1, 2]. The patients with the highest mortality are those with severe respiratory failure associated with acute respiratory distress syndrome (ARDS) and interstitial pneumonia; these high-risk patients require early and prolonged mechanical ventilation to compensate for their respiratory failure [3, 4]. The reasons for the heterogeneous clinical presentation of COVID-19 are largely unknown. Only three risk factors have been consistently associated with life-threatening COVID-19-associated respiratory failure: male sex, older age, and concomitant medical conditions such as diabetes, obesity, hypertension, and cardiovascular disease [5, 6]. However, even after accounting for these factors, there is substantial inter-individual variability within each demographic and epidemiological group. Given the immense impact of COVID-19 on morbidity and mortality, there is therefore an unmet medical need for endogenous cellular and molecular biomarkers that predict the clinical course of the disease.
To address this critical gap in knowledge of the molecular mechanisms underlying the response to SARS-CoV-2 infection, recent epigenome-wide association studies (EWAS) have explored a particular layer of biological information: the contribution of epigenetic variation, in particular DNA methylation, to a severe clinical course of COVID-19 [7–11]. These studies indicated that epigenetics plays a central role in the progression of COVID-19 by identifying specific methylation signatures associated with severe clinical evolution of the infection. However, their results overlap only partially, and this dissimilarity reflects differences in study design, mainly in the number of samples analyzed, the inclusion criteria used to enroll patients, and the bioinformatics methods and strategies adopted to analyze the data. Bernardes and colleagues [8] used a longitudinal multi-omics approach but profiled DNA methylation in a limited number of patients. Zhou et al. also analyzed a small cohort, stratifying participants into three groups: patients with mild COVID-19, patients with severe COVID-19, and healthy subjects. Balnis and colleagues [7] increased the number of patients and compared mild (non-ICU) and severe (ICU-admitted) patients with healthy subjects, obtaining a signature of 77 differentially methylated positions associated with the degree of COVID-19 severity. Konigsberg et al. [10] compared the methylation status of three groups: SARS-CoV-2-positive patients, SARS-CoV-2-negative patients, and subjects with other respiratory infections. Castro de Moura et al. [9] enrolled only young patients without comorbidities. These studies shed light on different epigenetic aspects of the response to SARS-CoV-2 infection, but they mostly did not target high-risk patients. The role of epigenetics in determining severe outcomes in this specific patient group therefore remains poorly understood.
Based on these considerations, we conducted a genome-wide DNA methylation study using the Illumina 850K BeadChip on 190 blood samples from Italian COVID-19 patients at high risk owing to comorbidities and other clinical factors. The goal of this study was to identify epigenetic biomarkers that could predict the clinical prognosis of these patients and to provide insights into the role of epigenetic mechanisms in the evolution of COVID-19 severity.
Discussion
In this study, we investigated epigenetic differences that may contribute to the development of severe COVID-19 in a group of high-risk Italian patients (i.e., with a high prevalence of comorbidities). We identified a set of 21 epigenetic markers able to predict the risk of severe outcomes, such as death or the need for mechanical ventilation in the intensive care unit. To confirm the validity of our findings, we also analyzed two publicly available methylation datasets from other COVID-19 patient groups, GSE167202 [10] and GSE174818 [7], which had a similar research design and enough clinical information to classify patients as mild or severe using our method.
The group-level differential methylation analysis accounted for cellular heterogeneity and other confounders and highlighted 880 differentially methylated CpG sites, equally divided between hyper- and hypomethylation. Functional annotation of the 880 CpG sites (Additional File 3) pointed to several potentially relevant genes involved in immune-response processes and pathways that have already been implicated in the response to COVID-19.
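The section does not state which software produced the 880 CpG sites. As an illustration of what a group-level analysis "accounting for cellular heterogeneity" usually means, the sketch below fits one linear model per CpG, regressing M-values on a severity indicator plus covariates such as estimated cell-type proportions. This is a minimal sketch under stated assumptions: the function names are hypothetical, dedicated EWAS tools typically add variance moderation and multiple-testing correction, and this code does neither.

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """Convert methylation beta values (0..1) to M-values: log2(beta / (1 - beta))."""
    b = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)
    return np.log2(b / (1 - b))

def per_cpg_group_effect(m_values, group, covariates):
    """Fit M ~ intercept + group + covariates independently for every CpG.

    m_values:   (n_samples, n_cpgs) M-values
    group:      (n_samples,) 0 = mild, 1 = severe
    covariates: (n_samples, k) e.g. estimated cell proportions, age, sex.
                Leave out one cell type so the proportions are not
                collinear with the intercept.
    Returns the group coefficient and its ordinary t-statistic per CpG.
    """
    n = m_values.shape[0]
    X = np.column_stack([np.ones(n), group, covariates])
    # One least-squares problem per column (CpG) of m_values, solved jointly.
    coef, _, rank, _ = np.linalg.lstsq(X, m_values, rcond=None)
    resid = m_values - X @ coef
    dof = n - rank
    sigma2 = (resid ** 2).sum(axis=0) / dof          # residual variance per CpG
    xtx_inv = np.linalg.inv(X.T @ X)
    se_group = np.sqrt(sigma2 * xtx_inv[1, 1])       # SE of the group coefficient
    return coef[1], coef[1] / se_group
```

M-values are used here because they are less bounded than beta values, which suits ordinary least squares; whether the authors modeled beta or M-values is not stated.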
Among the hypermethylated genes, SAMHD1 (SAM And HD Domain Containing Deoxynucleoside Triphosphate Triphosphohydrolase 1) is worth mentioning, as it plays a role in regulating the innate immune response. In response to SARS-CoV-2 infection, the encoded protein is upregulated in asymptomatic subjects compared with severe COVID-19 cases [19]. This molecule has also been reported to be involved in the molecular mechanisms underlying COVID-19-related neurological complications [20]. Another interesting gene is SETD2 (SET domain containing 2, histone lysine methyltransferase), which enhances the expression of some interferon-stimulated genes (ISGs) by depositing H3K36me3 on their promoters [21, 22].
IRF2 (interferon regulatory factor 2) encodes a member of the interferon regulatory transcription factor (IRF) family and has been identified as a potential candidate gene for sex differences in susceptibility to SARS-CoV-2 [23]. IL12B (interleukin 12B) encodes a subunit of interleukin 12, a cytokine that acts on T and natural killer cells by enhancing their lytic activity and stimulating IFN-gamma production. It has been hypothesized that death from COVID-19 may be associated with immunogenetic markers including IL12B, along with HLA-B, IL6, and IL10 [24]. TRIM8 (tripartite motif-containing 8) is thought to function as an E3 ubiquitin-protein ligase involved in multiple biological processes, including the innate (IFN-mediated) immune response [25–27]. Other genes annotated in the KEGG COVID-19 pathway include NLRP3 (NLR family pyrin domain containing 3), MAPK10 (mitogen-activated protein kinase 10), and PIK3CD (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit delta). Among the hypomethylated genes, several loci are worth mentioning: ARID5A (AT-rich interaction domain 5A), which encodes a nucleic acid binding protein involved in immune regulation and cellular homeostasis [28]; CD226 (CD226 molecule), encoding a glycoprotein expressed on the surface of immune cells that has been linked to tissue infiltration and organ dysfunction in severe COVID-19 [29]; CD244, encoding a transmembrane cell surface receptor on immune cells that has been linked to decreased serum levels of cytotoxic effector molecules in severe COVID-19 [30]; IL1R1 (interleukin 1 receptor type 1), encoding a cytokine receptor involved in cytokine-induced immune and inflammatory responses that has been linked to the cytokine storm and to the risk of venous thrombosis among COVID-19 complications [31, 32]; and STAT6 (signal transducer and activator of transcription 6), encoding a nuclear transcription factor that plays a role in IL4-mediated biological responses and has been found at increased levels in the lungs of COVID-19 patients [33].
Furthermore, gene over-representation analysis and gene set enrichment analysis, conducted on the 100 top-ranked differentially methylated genes, supported the functional annotation results, revealing epigenetic impairment of biological processes related to immune system regulation and interferon-response pathways (Additional File 4). Published genome-wide epigenetic studies of COVID-19 [7, 9, 10] reported similar, consistent results. After unsupervised clustering, the 880-CpG signature did not separate all severe COVID-19 patients from mild cases (Fig. 3), but a subgroup (G4) strongly enriched in severe patients clearly emerged. The clinical data of the severe patients in group 4 showed no significant association with clinical variables that could explain this clustering. Further analysis of this cluster identified 21 specific CpG sites that, together with their first principal component (PC1), distinguished between clinical outcomes (Fig. 4). The PC1 scores were then used to estimate the risk of a severe outcome, and logistic regression showed a significant association between PC1 scores and severe outcomes [OR = 2.55 (95% CI 1.9–3.5)]. The regression model was adjusted for important covariates, namely clinical factors and pharmacological treatments. Clinical factors included host characteristics (such as sex, age, and smoking status), comorbidities (such as obesity, hypertension, diabetes, cardiovascular disease, and pre-existing cancer), and biochemical parameters (such as elevated creatinine, reduced albumin, elevated aspartate aminotransferase, elevated LDH, elevated C-reactive protein, elevated d-dimer, elevated leukocyte count, and elevated LDL levels). Pharmacological treatments were divided into chronic treatments for comorbidities and therapies administered in hospital. Neither pharmacological treatments nor clinical variables affected the association between the epi-signature and disease outcome. To verify whether the epi-signature improved on models based only on clinical information, we further compared three models for predicting severe outcomes, built respectively on host factors, comorbidities, and biochemical parameters. In each case, we evaluated the model with and without the epi-signature using a likelihood ratio test and McFadden's pseudo-R². The models that also included the epi-signature were always significantly more predictive, explaining a higher proportion of the variation in severe outcomes.
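The nested-model comparison described above (a clinical model with and without the epi-signature, judged by a likelihood ratio test and McFadden's pseudo-R², with the signature's effect reported as an odds ratio) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes plain maximum-likelihood logistic regression, Wald 95% confidence intervals, and that the epi-signature enters as a single PC1 score, so the two models differ by one parameter (df = 1). All function names and the synthetic data are hypothetical.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: (n, p) design matrix that already includes an intercept column.
    y: (n,) outcome coded 1 = severe, 0 = mild.
    Returns coefficients, Wald standard errors and the maximised log-likelihood.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                             # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])      # Fisher information
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, se, loglik

def odds_ratio_ci(beta, se, z=1.959964):
    """Odds ratio and Wald 95% CI for one log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def likelihood_ratio_test(ll_reduced, ll_full):
    """LRT for nested models differing by ONE parameter (df = 1).

    For df = 1 the chi-square survival function equals erfc(sqrt(stat / 2)).
    """
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R^2 = 1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - ll_model / ll_null

# Hypothetical usage on synthetic data (not study data).
rng = np.random.default_rng(1)
n = 300
clinical = rng.normal(size=n)   # stand-in for a clinical covariate
pc1 = rng.normal(size=n)        # stand-in for the 21-CpG PC1 score
logit = -0.5 + 0.4 * clinical + 0.9 * pc1
severe = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
ones = np.ones(n)
_, _, ll_null = fit_logistic(ones[:, None], severe)
_, _, ll_clin = fit_logistic(np.column_stack([ones, clinical]), severe)
b, se, ll_epi = fit_logistic(np.column_stack([ones, clinical, pc1]), severe)
print("OR (95% CI) for PC1:", odds_ratio_ci(b[2], se[2]))
print("LRT stat, p:", likelihood_ratio_test(ll_clin, ll_epi))
print("McFadden R2 without/with PC1:", mcfadden_r2(ll_clin, ll_null), mcfadden_r2(ll_epi, ll_null))
```

Note that McFadden's pseudo-R² compares log-likelihoods against an intercept-only model rather than partitioning variance as an ordinary R² does, so "proportion of variance" should be read loosely for logistic models.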
One explanation for this result may lie in the type of patients enrolled in the study. The presence of comorbidities is known to play a significant role in determining the risk of developing a severe form of the disease. However, not all subjects with comorbidities or risk factors experience adverse events. The cohort selected for this study consists mainly of frail subjects with comorbidities, considered at high risk. Indeed, risk factors were so widespread that the analysis of clinical data revealed no differences between the two groups examined, with the exception of diabetes. These results suggest that epigenetics could be an additional tool for discriminating and predicting severe outcomes of the disease among frail subjects. These findings are further supported by the higher DNAm GrimAge observed in patients with a more severe prognosis: GrimAge, an epigenetic clock built on DNA methylation surrogates of important mortality risk factors, has been shown to predict survival better than the risk factors themselves.
Moreover, to evaluate the effectiveness of the 21-CpG epi-signature, we compared patients with severe outcomes and those with mild outcomes with a cohort of COVID-19-negative individuals. The epi-signature distinguished the group with severe COVID-19 outcomes from the COVID-negative group [OR = 2.3 (95% CI 1.78–3.17)], but it did not differentiate the group with mild COVID-19 outcomes from the COVID-negative group. This result suggests that the epi-signature may be specifically related to severe COVID-19 outcomes rather than to the presence of SARS-CoV-2 infection itself. Finally, the predictive capacity was confirmed by the signature's ability to discriminate cases from controls in two external validation datasets generated from raw data downloaded from a public repository (GEO), which gave consistent odds ratios: GSE167202, OR = 1.4 (95% CI 1.19–1.68); GSE174818, OR = 1.69 (95% CI 1.34–2.23). We can therefore hypothesize that the 21-CpG signature may be a valid predictor of outcome.
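The epi-signature is described as the PC1 score of the 21 CpG sites, and it was also applied to two external GEO datasets. How the external datasets were scored is not stated. One option, assumed in the sketch below, is to learn the per-CpG means and the PC1 loading vector in the discovery cohort and project new samples onto them, so that a score means the same thing in every cohort; recomputing PCA separately in each dataset would be the alternative. The function names and the sign convention are hypothetical.

```python
import numpy as np

def fit_pc1(beta_discovery):
    """Learn PC1 of a (samples x CpGs) matrix restricted to the signature CpGs.

    Returns the per-CpG means and the unit-length PC1 loading vector.
    """
    X = np.asarray(beta_discovery, dtype=float)
    center = X.mean(axis=0)
    # The first right singular vector of the centred matrix is the PC1 loading.
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    loading = vt[0]
    # SVD signs are arbitrary; flip so the loadings sum to a positive value
    # (an illustrative convention, not necessarily the authors').
    if loading.sum() < 0:
        loading = -loading
    return center, loading

def pc1_scores(beta_values, center, loading):
    """Project samples (e.g. an external validation dataset) onto the discovery PC1."""
    return (np.asarray(beta_values, dtype=float) - center) @ loading
```

The resulting scores can then enter a logistic model as the epi-signature covariate, in the same way in the discovery and validation cohorts.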
Evidence of epigenetic perturbation following SARS-CoV-2 infection also emerges from the analyses conducted after the classical differential methylation analysis.
When evaluating epigenetic markers used to estimate biological age [12], we observed significantly increased epigenetic age acceleration (DNAm GrimAge) in severe COVID-19 cases compared with mild cases. A significant DNAm GrimAge increase was also present in the two independent GEO validation datasets (GSE167202 and GSE174818), confirming the robustness of the results. This increase occurred even though patients' chronological age did not differ appreciably between the two groups. Literature data also point to a strong relationship between accelerated aging and COVID-19. For example, Corley et al. [34] correlated the severity of the COVID-19 phenotype (with a higher risk of mortality) with a significant increase in DNAm age, proposing epigenetic clock estimates (both Steve Horvath’s DNAmAge and GrimAge) as the main predictors of disease evolution. Ying et al. [35] investigated the causal relationship between aging and COVID-19 by analyzing biological age-correlated measurements, suggesting accelerated aging as the cause of enhanced susceptibility to SARS-CoV-2 infection and to severe forms of the disease. These results support the hypothesis that epigenetic impairment plays a role in the evolution of COVID-19.
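Epigenetic age acceleration is commonly computed as the residual of a linear regression of the clock estimate (here DNAm GrimAge) on chronological age; this is why an acceleration difference can appear between groups whose chronological ages are similar. The sketch below assumes that residual definition and an ordinary least-squares fit over the whole cohort. The section does not state exactly how acceleration was derived or which test compared the groups, so the function name and approach are illustrative.

```python
import numpy as np

def age_acceleration(dnam_age, chrono_age):
    """Residuals of DNAm age regressed (OLS) on chronological age.

    Positive values mean the epigenetic clock runs ahead of calendar age.
    """
    dnam_age = np.asarray(dnam_age, dtype=float)
    chrono_age = np.asarray(chrono_age, dtype=float)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(chrono_age, dnam_age, deg=1)
    return dnam_age - (intercept + slope * chrono_age)
```

Given per-patient accelerations, the severe and mild groups can then be compared with any two-sample test; because the residuals are centred on the cohort-wide age trend, a group difference reflects epigenetic aging beyond what chronological age predicts.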
Further evidence of epigenetic involvement after SARS-CoV-2 infection comes from the analysis of stochastic epigenetic mutations (SEMs), which are a robust biomarker of possible epigenetic drift and an effective indicator of the accumulation of DNA damage related to environmental exposure [15]. In our study, severe COVID-19 patients showed a higher SEM burden than mild patients, especially for hypomethylated SEMs.
This analysis highlights another important aspect of epigenetics in COVID-19 patients, and the result can be considered robust because it was confirmed in other cohorts.
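SEM burden is usually obtained by counting, for each sample, the CpG sites whose methylation value is extreme relative to a reference population. The sketch below assumes the definition commonly used in the SEM literature: a value above Q3 + 3×IQR of that CpG is a hypermethylated SEM, and a value below Q1 − 3×IQR is a hypomethylated SEM. The reference set (here the whole input matrix), the multiplier k = 3, and the function name are assumptions, not details stated in this section.

```python
import numpy as np

def count_sems(beta, k=3.0):
    """Count stochastic epigenetic mutations (SEMs) per sample.

    beta: (n_samples, n_cpgs) methylation beta values; quartiles are computed
          per CpG across all rows, which act as the reference population.
    Returns (hyper_counts, hypo_counts), one entry per sample.
    """
    beta = np.asarray(beta, dtype=float)
    q1, q3 = np.percentile(beta, [25, 75], axis=0)  # per-CpG quartiles
    iqr = q3 - q1
    hyper = beta > q3 + k * iqr   # extreme gain of methylation
    hypo = beta < q1 - k * iqr    # extreme loss of methylation
    return hyper.sum(axis=1), hypo.sum(axis=1)
```

Comparing the hyper and hypo counts between severe and mild patients gives the kind of burden comparison reported above, including the separate tally of hypomethylated SEMs.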
Our EWAS approach has both limitations and strengths. Regarding limitations, we restricted the analysis to whole blood as representative of the methylation status of the disease, and additional studies in other tissues may be necessary to confirm the results. In addition, missing clinical data may have reduced statistical power in some analyses; to mitigate the effect of missing data and optimize statistical power, we adopted pairwise deletion. Another limitation concerns the information on medications and treatments. This information was treated as a potential confounder in the regression models and is reported in the Additional materials; however, no conclusions could be drawn about the effect of medications on survival because the study was not designed for this purpose. Treatments were chosen according to patients' clinical conditions, so some associations may be biased.
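Pairwise deletion, mentioned above, keeps every patient who has data for the variables involved in a given analysis, instead of discarding any patient with any missing value (listwise deletion). The sketch below, with hypothetical function names, counts the usable samples under each strategy for a matrix where NaN marks a missing clinical value. The trade-off is that different analyses then rest on different subsets of patients, and estimates are unbiased only if values are missing completely at random.

```python
import numpy as np

def pairwise_complete_counts(data):
    """Usable samples for each pair of variables under pairwise deletion.

    data: (n_samples, n_vars) array with NaN marking missing values.
    Entry (i, j) counts samples where both variable i and variable j are observed;
    the diagonal counts samples where each single variable is observed.
    """
    observed = (~np.isnan(np.asarray(data, dtype=float))).astype(int)
    return observed.T @ observed

def listwise_complete_count(data):
    """Samples with no missing value at all (listwise deletion)."""
    return int((~np.isnan(np.asarray(data, dtype=float))).all(axis=1).sum())
```

With scattered missingness the pairwise counts can stay close to the full sample size while the listwise count shrinks quickly, which is the power argument made in the text.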
Regarding strengths, the study focused on patients with high-risk clinical factors, enabling the evaluation of an additional layer of epigenetic involvement and improving the ability to explain disease outcome. The study also investigated innovative aspects such as epigenetic drift. Finally, the results were successfully replicated in validation cohorts.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.