Background
Coronaviruses (CoV) are a group of single-stranded RNA viruses that are pathogens of the human respiratory system. CoV infection can result in lethal respiratory diseases, including severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS) and coronavirus disease 2019 (COVID-19). SARS, induced by SARS-related coronavirus (SARS-CoV), affected 8096 patients from 2002 to 2003 with a fatality rate of 9.6% worldwide [1]. MERS-related coronavirus (MERS-CoV) has caused 2519 cases with a high fatality rate of 34.4% [2]. As of 24th May 2020, COVID-19, induced by a new strain of CoV, SARS-CoV-2, has led to over 5.2 million cases in 188 countries and more than 337,000 deaths, and the numbers increase substantially every day [3]. COVID-19 has become a public health emergency of international concern and has been designated a pandemic by the WHO [3]. The lack of a deep understanding of SARS-CoV-2 is hampering vaccine development.
The most severe sequela of SARS is lung fibrosis: up to 45% of SARS patients develop lung fibrosis after 3–6 months, and this potentially sets an important context for COVID-19 [4–7]. Lung fibrosis is characterised by excessive deposition of extracellular matrix (ECM) proteins, such as fibronectin (FN). This results in impaired lung function and reduced gas exchange. The transforming growth factor beta (TGF-β)-associated signalling pathway plays important roles in lung fibrosis, but the role of this pathway in COVID-19 is unclear. A recent study shows that COVID-19 patients have a high risk of lung fibrosis [8]. A growing number of studies show that COVID-19-induced acute respiratory distress syndrome (ARDS) results in diffuse alveolar damage in the lungs, and cases of long-term ARDS leading to lung fibrosis are starting to be reported [9–13]. However, the links between SARS-CoV-2 and lung fibrosis remain unclear.
SARS-CoV and SARS-CoV-2 share approximately 76% amino acid sequence homology, which leads to similarities in their biological properties [14]. The spike (S) protein is a key structural component of CoV that binds to host cellular receptors and facilitates viral entry into target cells [15]. Angiotensin-converting enzyme 2 (ACE2) has been identified as a receptor of SARS-CoV-2 [4] and is cleaved by type II transmembrane serine protease (TMPRSS2) to augment virus entry into host cells [15]. ACE2 is also cleaved by a disintegrin and metallopeptidase domain (ADAM)17 of the host, which facilitates shedding of ACE2 into the extracellular space to bind with CoV [16]. However, it remains unclear how SARS-CoV-2 infection induces lung fibrosis.
In this study, we examined SARS-CoV-2 entry into target cells by binding with ACE2 after TMPRSS2 and ADAM17 cleavage. We found that human alveolar epithelial cells, rather than airway bronchial epithelial cells, are the main target cells of SARS-CoV-2. SARS-CoV-2 infection alters the expression of genes including tissue inhibitor of metalloproteinase (TIMP)3, angiotensinogen (AGT), TGFB1, connective tissue growth factor (CTGF), vascular endothelial growth factor (VEGF) A and FN1, and these changes are also observed in lung tissues from patients with lung fibrosis. SARS-CoV-2 infection likely activates TGF-β signalling, increases FN expression and results in lung fibrosis.
Materials and methods
Predicted SARS-CoV protein and ACE2 binding
Previous studies showed a conserved evolutionary relationship between SARS-CoV and SARS-CoV-2 [14]. The S protein of SARS-CoV-2 and its predicted receptor, ACE2, were identified from a public database using P-HIPSTer as previously described [17].
The interaction network of ACE2 genes and proteins
Predicted gene/protein interactions were obtained from online databases using bioinformatics analysis. We used GeneMANIA (University of Toronto) to generate an interaction network of ACE2 and related proteins [18]. Previous studies showed that TMPRSS2 cleaves ACE2 [19], and both genes were entered as a multiple-gene query in humans to search the gene network of these two molecules. The predicted genes that interacted with ACE2 and TMPRSS2 were listed using Cytoscape analysis (GeneMANIA Cytoscape plugin).
A connective network of the ACE2 protein and its functional interactions was obtained using STRING version 11.0 (ELIXIR Infrastructure) as previously described [20]. Briefly, ACE2 and TMPRSS2 were entered as the list of names for the search and the organism was set to Homo sapiens. We selected textmining, experiments, databases and co-expression as active interaction sources. High confidence was used as the minimum interaction score, and disconnected nodes in the network were hidden to simplify the display.
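The STRING query described above was run through the web interface. As a rough, programmatic illustration only, the sketch below builds a comparable request URL for the STRING REST interface. The endpoint path and the parameter names (`identifiers`, `species`, `required_score`) are assumptions that should be checked against the current STRING API documentation; 9606 is the NCBI taxonomy identifier for Homo sapiens and 700 corresponds to the "high confidence" cut-off of 0.700. The sketch does not restrict the active interaction sources and would query the current STRING release rather than version 11.0.

```python
from urllib.parse import urlencode

# Assumed endpoint of the STRING REST interface (tab-separated network output).
STRING_API = "https://string-db.org/api/tsv/network"


def string_network_url(genes, species=9606, required_score=700):
    """Build a STRING network query URL for a list of gene/protein names.

    species=9606 is Homo sapiens; required_score=700 is the
    'high confidence' threshold (0.700 on STRING's 0-1 scale).
    """
    params = {
        # Multiple identifiers are separated by a carriage return.
        "identifiers": "\r".join(genes),
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API}?{urlencode(params)}"


# The two query proteins named in the text.
url = string_network_url(["ACE2", "TMPRSS2"])
print(url)
```

The function only constructs the URL, so it can be inspected before any request is sent.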
Genotype-tissue expression (GTEx) pilot analysis
The GTEx project is an RNA-sequencing database of gene expression in different tissues [21]. It links regulatory expression quantitative trait loci (eQTL) variants, which affect gene expression, to tissues. A network of the ACE2, TMPRSS2, ADAM17, TIMP3, AGT, TGFB1, VEGFA, CTGF and FN1 genes across all human tissues was generated using the GTEx Portal as previously described [22].
Protein detection in lungs
Representative images of ACE2, TMPRSS2, ADAM17, AGT, TGFB1, VEGFA, CTGF and FN proteins in human lung tissues were obtained from the Human Protein Atlas database as previously described [23]. The Tissue Atlas and Pathology Atlas databases (version 19.3) were mined for the expression and localisation of these proteins in lung tissues by immunohistochemistry, and representative images were taken to show the localisation of the target proteins [24].
Single cell analysis of human lung datasets
We analysed the expression of ACE2, TMPRSS2, ADAM17, TIMP3, AGT, TGFB1, VEGFA, CTGF and FN1 in different lung cell populations using previously published human single cell RNA-sequencing datasets. All datasets were explored in the UCSC Cell Browser to identify the cellular sources of these genes in the airways or lung tissues.
In the first dataset [25], human bronchial epithelial cells (HBECs) were obtained from endobronchial lining fluid by invasive bronchoscopic microsampling (n = 4), and lung samples (n = 12) were obtained by surgical intervention. Endobronchial lining fluid was collected from a non-involved segment of the contralateral lung of patients with lung cancer, and HBECs were isolated and grown in culture media [25]. Lung tissues were obtained from lung cancer patients, and normal lung tissue was taken distant from the tumour area [25]. These samples were snap-frozen in liquid nitrogen without direct contact and stored at −80 °C. RNA-sequencing was performed using the 10X Genomics Chromium platform and an Illumina HiSeq 4000. In the second dataset [26], single cells were isolated from cryobiopsy samples from one idiopathic pulmonary fibrosis (IPF) patient. In the third dataset, single cell samples were obtained from lung biopsies from donors with healthy lungs who had died of other diseases or accidents (n = 8), including stroke (one patient), intracranial haemorrhage (three patients), anoxic brain injury (three patients) and head trauma from a gunshot wound (one patient), and from patients with pulmonary fibrosis (n = 8), including IPF (four patients), interstitial lung diseases (ILD, three patients) and hypersensitivity pneumonitis (one patient) [26].
Cells were clustered using a graph-based shared nearest neighbour clustering approach, and the clusters were visualised on a t-distributed stochastic neighbour embedding (tSNE) plot to identify the main cellular sources of these genes in the airways or lungs.
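To make the idea of graph-based shared nearest neighbour (SNN) clustering concrete, the following is a minimal sketch and not the pipeline used to produce the published datasets. It assumes hypothetical two-dimensional coordinates standing in for cells in a reduced expression space, links two cells when the Jaccard overlap of their k-nearest-neighbour sets reaches an arbitrary threshold, and takes connected components as clusters. Single cell toolkits typically apply a community detection step to the SNN graph instead, and the tSNE embedding used for display is not reproduced here.

```python
import math
from itertools import combinations


def knn_sets(points, k):
    """For each point, return the set holding itself and its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(others[:k]) | {i})
    return neighbours


def snn_clusters(points, k=3, min_jaccard=0.5):
    """Cluster points by linking pairs whose neighbour sets overlap strongly.

    Simplification (assumption): clusters are the connected components of the
    SNN graph, found here with a small union-find structure.
    """
    nn = knn_sets(points, k)
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        jaccard = len(nn[i] & nn[j]) / len(nn[i] | nn[j])
        if jaccard >= min_jaccard:
            parent[find(i)] = find(j)  # merge the two components
    return [find(i) for i in range(len(points))]


# Hypothetical cells: two well-separated groups of four points each.
cells = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = snn_clusters(cells)
print(labels)
```

With these toy coordinates the first four cells share one label and the last four share another, because neighbour sets never overlap between the two groups.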
Gene expressions in human epithelial cells treated with SARS-CoV-2
The gene expression data for ACE2, TMPRSS2, ADAM17, TIMP3, AGT, TGFB1, VEGFA, CTGF and FN1 were taken from an existing RNA-sequencing dataset [27] available through the Gene Expression Omnibus (GEO) database. The data were analysed using Bioconductor in R as previously described [28–30]. Briefly, in the GSE147507 dataset [27], human adenocarcinoma alveolar basal epithelial (A549, 1 × 10⁶) cells and HBECs (1 × 10⁵) were infected with SARS-CoV-2 (deposited by the Centers for Disease Control and Prevention and obtained through BEI Resources) or treated with media controls for 24 h, and total RNA was extracted with TRIzol Reagent (ThermoFisher). RNA-seq libraries of polyadenylated RNA were prepared using the TruSeq RNA Library Prep Kit v2 (Illumina) and RNA-seq libraries for total ribosomal RNA-depleted RNA were prepared using the TruSeq Stranded Total RNA Library Prep Gold (Illumina).
Gene expression in human lung fibrosis datasets
We analysed the gene expression of ACE2, TMPRSS2, ADAM17, TIMP3, AGT, TGFB1, VEGFA, CTGF and FN1 in lung samples from pre-existing gene microarray datasets.
In the GSE2052 dataset [31–33], lung tissues were obtained from healthy controls (n = 11) and IPF patients (n = 13). DNA was isolated from lung histology samples for gene array analysis, and the data were profiled using an Amersham Biosciences Codelink uniset human bioarray.
In the GSE10667 dataset [34–36], lung tissues were from healthy controls (n = 15), ILD patients with a usual interstitial pneumonia (UIP) histopathologic pattern but not IPF (other ILD, n = 23) or IPF patients (n = 8). Samples were obtained from surgical remnants of biopsies or lungs explanted from patients with IPF who underwent lung transplant. Control normal lung tissues were obtained from the disease-free margins, with normal histology, of lung cancer resection specimens. Gene expression was profiled by Agilent-014850 Whole Human Genome Microarray 4x44K G4112F.
The Benjamini-Hochberg method for adjusted P value/false discovery rate (FDR) was used to analyse differences between groups. Statistical significance was set at FDR < 0.05. Target gene expression was calculated as log₂ intensity robust multi-array average signals (log₂-transformed intensity values) [37].
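For readers unfamiliar with the Benjamini-Hochberg adjustment, the sketch below shows how raw P values are converted to FDR-adjusted values and filtered at FDR < 0.05. It is a plain Python illustration and not the Bioconductor code used in the study; the gene labels and P values are hypothetical placeholders.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per gene (illustrative only).
raw = {
    "GENE_A": 0.001,
    "GENE_B": 0.008,
    "GENE_C": 0.039,
    "GENE_D": 0.041,
    "GENE_E": 0.60,
}
fdr = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Keep genes that pass the FDR < 0.05 threshold stated in the text.
significant = [gene for gene, q in fdr.items() if q < 0.05]
print(significant)
```

In this toy example GENE_C and GENE_D have raw P values below 0.05 but adjusted values slightly above it, which illustrates why the adjusted value rather than the raw value is compared against the threshold.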
Statistical analysis
Results are presented as mean ± standard error of the mean (SEM). Unpaired Student's t-tests were used to compare two groups in existing dataset analysis. A one-way analysis of variance (ANOVA) with Bonferroni comparisons was used to compare multiple groups [38]. All statistical analyses were performed using GraphPad Prism Software (San Diego, CA, USA) as previously described [39].
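The quantities named in this section can be illustrated with a short sketch. It is not GraphPad Prism's implementation; it assumes hypothetical expression values and uses only the Python standard library to compute the mean ± SEM, the unpaired Student's t statistic with pooled variance, and a Bonferroni adjustment of a list of P values. Converting the t statistic to a P value requires the t distribution, which the standard library does not provide, so that step is left out.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def student_t(a, b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


# Hypothetical log2 expression values for two groups (illustrative only).
control = [1.0, 2.0, 3.0]
infected = [4.0, 5.0, 6.0]

ctrl_mean, ctrl_sem = mean_sem(control)
t_stat, dof = student_t(infected, control)
adjusted = bonferroni([0.01, 0.04, 0.5])
print(ctrl_mean, ctrl_sem, t_stat, dof, adjusted)
```

The Bonferroni step is shown on its own because, in a one-way ANOVA, it is applied to the P values of the pairwise comparisons made after the overall test.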
Discussion
COVID-19 is a pandemic disease that is induced by SARS-CoV-2. As of 24th May 2020, it has affected more than 5.2 million people across the world, causing more than 337,000 deaths. Studies demonstrate that SARS-CoV-2 infection may have similar effects to SARS-CoV infection because of the similarity of their sequences [14]. One of the major consequences of SARS is that patients develop lung fibrosis as a major sequela. A growing number of studies show that COVID-19 patients have lung fibrosis [9–12]; however, it remains unknown how SARS-CoV-2 infection induces this. ACE2 is a cellular receptor of SARS-CoV-2, and we have confirmed the potential binding relationship of ACE2 and SARS-CoV-2 using bioinformatic analysis in the current study. In addition, we show that SARS-CoV-2 infection is associated with increased transcription of fibrosis-related genes, which induces lung fibrosis.
The baseline level of ACE2 mRNA expression is very low in lungs compared to other organs. It is increased in alveolar epithelial cells after SARS-CoV-2 infection, indicating a positive correlation between ACE2 and SARS-CoV-2 infection. In this study, we found that ACE2 mRNA is mainly expressed in the gastrointestinal (GI) tract and that the small intestine has the highest ACE2 level of all organs. Diarrhea is one of the major symptoms of COVID-19, and high numbers of ACE2-positive small intestine cells occur in COVID-19 patients [43]. This indicates that SARS-CoV-2 may also affect the GI tract through the ACE2 receptor. It remains unclear how SARS-CoV-2 reaches the GI tract in COVID-19 patients. Possible routes are through infected food [44] or transmission from the lung to the GI tract via the lung-gut axis [45–47]. Live SARS-CoV-2 was detected in stool samples from patients who had respiratory issues but not diarrhea [48], suggesting that SARS-CoV-2 infection occurs through the lung-gut axis. On the other hand, a recent study showed that three children had positive SARS-CoV-2 tests in their stools but negative results in their throat swab samples, indicating that the virus entered these patients via oral infection [49]. The infection may also transmit from the gut to the lung [50], causing a secondary infection [39]. The respiratory and digestive systems are the two major pathways by which SARS-CoV-2 enters the body. Thus, it has been recommended that routine stool testing should be performed in potential COVID-19 patients even after viral RNA clearance in their respiratory system [51].
There is a high chance that COVID-19 patients develop lung fibrosis, but how infection leads to fibrosis remains unclear. TGF-β is a cytokine that promotes the development of fibrosis. Active TGF-β regulates the level of ECM proteins, which are major factors involved in tissue remodelling and fibrosis [42]. CTGF is another cytokine involved in the remodelling process and the induction of lung fibrosis [52]. We find that TGFB1 and CTGF mRNA transcripts are significantly increased in alveolar epithelial cells after SARS-CoV-2 infection. FN is a major ECM protein that has critical roles in tissue remodelling and fibrosis [53]. Our previous studies showed that increased FN deposition is linked with lung fibrosis [42], and we show in the current study that increased FN1 mRNA transcripts are present in lung tissues from lung fibrosis patients. Inhibiting a main functional domain of the FN1 gene inhibits fibrosis features in an in vivo model of lung fibrosis [54]. In this study, we found that SARS-CoV-2 infection induced FN1 gene expression in alveolar epithelial cells, indicating an early induction of fibrotic processes and suggesting how the virus may drive fibrosis.
ACE2 is cleaved by ADAM17 and/or TMPRSS2 before SARS-CoV-2 binds, and the cleavage of the receptor facilitates virus entry into host cells [16]. These events may be self-promoting, and the mRNA expression of TMPRSS2 and ADAM17 is increased in alveolar epithelial cells after SARS-CoV-2 infection. The enzyme activity of ADAM17 is inhibited and regulated by TIMP3 [55], but SARS-CoV-2 reduces TIMP3 mRNA expression in alveolar epithelial cells, which likely promotes greater ADAM17 activity in COVID-19 patients. TMPRSS2 and ADAM17 may compete for ACE2 cleavage, and processing by TMPRSS2 promotes more virus entry than processing by ADAM17 [19]. Thus, increased activity of these enzymes after SARS-CoV-2 infection may contribute to lung fibrosis, but this needs to be proven in clinical and experimental studies.
Bronchial epithelial cells mount the initial response to SARS-CoV-2; however, we show that ACE2 mRNA levels are not changed in HBECs after infection compared to sham-infected controls. HBECs mount little response to infection compared to alveolar epithelial cells, and the infection induces pneumonia, suggesting that SARS-CoV-2 infection directly induces disorders in the parenchyma, including lung fibrosis. HBECs may respond to a higher inoculum of SARS-CoV-2 or within a shorter timeframe, which requires further experiments.
Abnormal tissue remodelling results in lung fibrosis and this process is currently irreversible [56]. Pulmonary fibrosis patients survive on average only 2–3 years after diagnosis of this lethal disease [57]. Many lung fibrosis patients do not have major or previous symptoms, but have late-stage lung fibrosis upon diagnosis. Thus, the most responsive treatment window may be missed. Early diagnosis is now considered critical but is a major challenge. Since COVID-19 patients may develop lung fibrosis [9–12], early prevention and intervention may significantly reduce the number of patients with lung fibrosis induced by SARS-CoV-2 infection.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.