Background
Coronavirus disease 2019 (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a highly infectious respiratory virus responsible for the ongoing global pandemic. COVID-19 usually presents as an asymptomatic or mild to moderate respiratory infection in previously healthy individuals with symptoms that include fever, cough, headache, fatigue, myalgia, diarrhoea, and anosmia [
1,
2]. However, in older individuals or in those with prior co-morbidities such as obesity or cardiovascular disease, COVID-19 can quickly develop into a severe and life-threatening disease requiring urgent intensive care support. While the death toll from COVID-19 has been devastating (> 4.8 million as of 5 October 2021 according to the Johns Hopkins University Coronavirus Resource Center [
3]), the vast majority of those infected fortunately do recover, with case fatality rates in most countries falling below 3%. It is now increasingly clear, however, that recovered individuals, even those who had mild COVID-19, can suffer from persistent symptoms for many months after infection [
4], which is commonly referred to as long COVID. For example, a cohort study of COVID-19 patients (median age 57) discharged from hospital in Wuhan, China, 6 months prior, reported that 63% of patients presented with fatigue or muscle weakness, 23% sleep difficulties, and 23% anxiety or depression [
5]. Individuals who were previously severely ill during their hospital stay have ongoing impaired pulmonary function and abnormal chest imaging. Similar reports continue to emerge from around the world [
6‐
11]. While the majority of these reports involve patients who were hospitalised with COVID-19, persistent, albeit milder and less-frequent, symptoms have also been reported in non-hospitalised individuals months after recovery [
12]. These reports resemble similar post-infectious syndromes after other infections, such as Ebola [
13] and SARS-CoV-1 [
14], and suggest that there may be a long-lasting dysregulation of the immune response in individuals recovering from COVID-19.
Flow cytometric analysis of peripheral blood samples collected from convalescents in the USA (median 29 days post-infection) has revealed altered frequencies of innate and adaptive immune cell populations including CD4+ and CD8+ T cell activation and exhaustion marker expression in recovered individuals [15]. A similar study in Singapore (median 34 days post-infection) found increased levels of circulating endothelial cells and effector T cells in those recovering from active disease [16]. Single-cell RNA sequencing (scRNA-Seq) of peripheral blood mononuclear cells (PBMC) from a small (n = 10) cohort of patients who were 7–14 days post-recovery also found an increased ratio of classical CD14+ monocytes with high inflammatory gene expression, decreased CD4+ and CD8+ T cells, and significantly increased plasma B cells [17]. scRNA-Seq profiling of PBMC gene expression in a larger cohort of recovering individuals (n = 95) found those with severe disease (n = 36) had decreased plasmacytoid dendritic cells (pDCs) and increased levels of proliferative effector memory CD8+ T cells, relative to healthy controls [18]. A potential limitation of this study, however, was that samples from recovered individuals were not collected at uniform timepoints during recovery; instead, samples were collected between 9 and 126 days post-infection (on average 44.5 days). Longitudinal profiling of the transcriptome of PBMC collected from individuals (n = 18) during treatment, convalescence, and recovery phases of infection (up to 10 weeks post-infection) revealed that, relative to acute disease, recovery from COVID-19 was marked by decreased expression of genes involved in the interferon response and humoral immunity, and by increased signatures indicative of T cell activation and differentiation [19]. However, these responses were not compared with healthy controls. Another recent study longitudinally profiled immune cell populations and the blood transcriptome in > 200 SARS-CoV-2-infected patients over 12 weeks from symptom onset to recovery [20]. They compared the blood transcriptome in two time bins (0–24 and 25–48 days from symptom onset) and found substantial changes relative to uninfected controls in immune cell populations and increased expression of genes involved in immunometabolism and inflammation, which persisted after infection.
Here, we have performed anti-receptor-binding domain (RBD) and anti-Spike serology, comprehensive multi-parameter immunophenotyping, and transcriptome-wide RNA sequencing on blood collected from individuals recovering from mild/moderate or severe/critical COVID-19 at 12, 16, and 24 weeks after their first positive SARS-CoV-2 PCR test, as well as from age-matched healthy controls (HCs). Our analyses reveal robust but heterogeneous humoral immunity in convalescents until at least 6 months post-infection. Deep immunophenotyping highlighted profound changes in immune cell populations in COVID-19 convalescents compared with HCs, particularly at 12 and 16 weeks post-infection (wpi). Furthermore, RNA sequencing revealed significant changes in whole blood gene expression for up to 24 wpi, even in individuals who had mild disease without hospitalisation. Significant differences in gene expression were also identified at 24 wpi in convalescent individuals who were referred to a long COVID clinic compared to those who were not. These data suggest that SARS-CoV-2 infection leads to persistent changes to the peripheral immune system long after the infection is cleared, which has important potential implications for understanding symptoms associated with long COVID. These changes to the peripheral immune system could have implications for how individuals recovering from infection respond to vaccination or other challenges encountered in this period, and persistent immune activation may also exacerbate other chronic conditions.
Discussion
Recovery from SARS-CoV-2 infection is frequently associated with persistent symptoms months after infection including fatigue, muscle weakness, sleep impairment, and anxiety or depression [
4,
5,
64]. These data suggest ongoing immune dysregulation in COVID-19 convalescents which has been supported by several recent studies profiling the immune system in individuals recovering from COVID-19 using multi-parameter flow cytometry, bulk and single-cell transcriptomics, and other approaches [
15,
16,
65‐
70]. Our study extends these recently published studies, which have mostly assessed immune responses at 2–12 weeks post-infection. Here, we report an integrated analysis of immune responses at a transcriptional, cellular, and serological level, in individuals recovering from mild/moderate or severe/critical COVID-19 at 12, 16, and 24 weeks post-infection, in comparison to age-matched HCs.
Anti-Spike and anti-RBD serology data demonstrated heterogeneity of antibody responses to SARS-CoV-2 consistent with previously published reports showing long-lasting IgG and IgG1 antibody responses to at least 6 months post-infection which were correlated with disease severity [
36,
71,
72]. Our cohort is particularly well-suited to the assessment of the durability of antibody responses due to the negligible risks of re-infection in South Australia where, due to strict border restrictions and public health measures, community transmission was eliminated during the sample collection period. Despite the anticipated decay in IgA and IgM [
73‐
75], a large percentage of convalescents remained seropositive for both RBD- and Spike-specific Ig (all isotypes) for the duration of the study. This decay was less pronounced at 24 wpi in the severe COVID-19 convalescents compared to the mild cohort, with significant differences in RBD-specific IgM and IgG3 isotypes between the two groups. Recently, declining levels of SARS-CoV-2 Spike-specific IgM in mild COVID-19 convalescents were found to strongly correlate with serum virus neutralisation activity [
76], findings that were further confirmed in experiments with purified IgM fractions and IgM-depleted sera from similar patients [
40,
77]. In COVID-19 convalescents, IgM, similarly to IgG1, preferentially targets the S1 domain of the Spike protein [
78], the region that contains the RBD and N-terminal domains, which are the targets of most neutralising antibodies and are of high interest for developing passive immunotherapies against new SARS-CoV-2 variants of concern [
79]. Conversely, less abundant SARS-CoV-2-specific IgG3 targets the S2 domain more efficiently [
78], which suggests that its ability to neutralise the virus is, by comparison, reduced. Yet, S2 contains the sequences that allow fusion of the SARS-CoV-2 membrane with the host cell membrane, a key step in virus entry [
2]. In fact, the ability of antibodies targeting S2 regions involved in membrane fusion to block Spike protein-mediated cell-cell fusion has been confirmed experimentally [
80]. In the future, it will be necessary to elucidate the particular roles of IgM and IgG3 not only in neutralising SARS-CoV-2 but perhaps also in blocking virus infection by other mechanisms, such as blockade of membrane fusogenic regions of the Spike protein. This will provide further insights into the overall importance of specific Ig isotypes in determining disease severity and outcomes.
In addition to our serological analysis of COVID-19 convalescents, we extensively and longitudinally profiled immune cell populations in the same individuals using a multi-panel approach that enabled the identification and enumeration of ~ 130 different sub-populations including deep phenotyping of the CD4 and CD8 compartments. Differences in immune cell populations compared with HCs were most strongly evident at 12 wpi, but some populations were still significantly different at 24 wpi. CD56
++ NK cells, granulocytes, LD neutrophils, and tissue-homing CXCR3
+ monocytes were significantly increased in convalescents at 12 wpi. Many of these changes persisted until at least 16 or 24 weeks. Consistent with our data, increased NK cells [
65] and granulocytes [
68] have been reported in other cohorts of convalescents and scRNA-Seq has revealed that increased non-classical monocytes are associated with more severe disease during active infection [
81]. In contrast to our study, a study of 109 Austrian convalescents at 10 weeks post-infection did not find neutrophils, monocytes, CD3
+ T cells, CD56
+ NK cells, or CD19
+ B cells to be significantly different in convalescents [
68]. Other studies have also reported significant decreases in the frequencies of invariant NKT and NKT-like cells [
66], which we and others [
20] did not observe.
Several previous studies have reported that T and B cell activation/exhaustion markers remain elevated following SARS-CoV-2 infection [
15]. Furthermore, CD4
+ and CD8
+ EM T cells have been reported to be significantly higher in convalescents at 10 wpi [
68]. Consistent with reports in active infection and convalescence [
15], convalescent individuals in our study had lymphopenia until at least 16 wpi; however, CD3
+ T cells were significantly increased at 12 wpi. We also observed significantly increased CD19
+ B cells at 12 and 16 wpi and CD38
+CD27
+ memory B cells at 16 wpi in convalescents. Recent studies have shown that increased activation and exhaustion of memory B cells observed during COVID-19 correlates with CD4
+ T cell functions [
82], and consistent with this, we observed reduced CD4
+ EM cell proportions in COVID-19 convalescents at 12 wpi. We were particularly interested in the role of regulatory T cells (Tregs) in COVID-19, as there have been conflicting reports of Tregs being either increased or decreased in convalescents. Significantly increased Foxp3
+ Tregs were observed in 49 convalescents from Wuhan at ~ 112 days post-recovery [
66]; however, another study observed that CD25
+Foxp3
+ Tregs were significantly reduced 10 weeks after COVID-19 [
68]. A more recent study has also reported that Tregs in severe COVID-19 patients have a distinct transcriptional signature with similarities to tumour-infiltrating Tregs, which persist in convalescent patients [
83]. We observed no significant difference in the total (CD4
+CD25
+CD127
low) Treg pool at any timepoint, but when we interrogated Tregs for their memory/maturation status, we observed that the naïve and TEMRA Treg proportions were significantly increased at 12 and 16 wpi, while EM and CM Tregs were significantly reduced, mirroring a similar reduction in the proportion of CD4
+ EM and CM pools at 12 and 16 wpi. Interestingly, a number of the Th lineage subsets including Th2, Th22, Th2/22, and Th17 had increased proportions of CM vs EM, revealing subtle skewing of the Th memory formation. The expansion of naïve Tregs could be an attempt to restore the balance in the Treg pool in the face of both inflammation and tissue damage, which is supported by emerging evidence of a dual role for Tregs in suppressing immune responses and promoting tissue repair [
84]. Increased TEMRA Tregs, which are often associated with exhaustion, but are in fact a poly-functional effector Treg population with characteristics of cytotoxic cells, migratory T cells, and tissue repair cells [
85,
86], further suggest a competition between classical immune suppression and tissue repair by these cells in response to tissue damage in COVID-19 convalescents.
Each Th subset has a paired regulatory subset [
41], and this includes Tfh subsets, as B cell help in germinal centres also requires regulation in the steady state [
87]. In a stereotypical antiviral immune response, Th1 cells migrate to sites of viral infection to establish an adaptive response, and regulatory cells co-migrate to limit chronic inflammation once the pathogen levels decline; however, there is an emerging function of tissue-resident Treg cells in tissue repair [
84,
88]. We did not observe increased Th1 cells, but we did observe a reduction of Th9 cells, potentially suggesting a diversion of Th9 cells to other sites. We also observed that the maturation of Th pools was enhanced in both Th17 and Th22 subsets, where CM marker proportions were increased at all timepoints post-infection. This may suggest that epithelial homing and tissue damage trigger activation and form part of the COVID-19 T cell recall response. It is intriguing that the Treg partners of these lineages, including ThR2, ThR22, and ThR2/22, were all significantly reduced over the same time course post-infection, suggesting that the signals recruiting Th cells to tissue locations are persistent long after SARS-CoV-2 infection. A similar imbalance in follicular help vs follicular regulation was also observed, whereby Tfh1 and Tfh2/22 cells were significantly elevated post COVID-19, but total TfhR, TfhR2, TfhR22, and TfhR2/22 cells were reduced. Other studies have demonstrated that CXCR5
+ Tfh populations are significantly elevated in individuals recovering from COVID-19 and correlate with robust humoral immunity [
89]; however, that study did not analyse the regulatory arm in this compartment. Another study reported a decline in Tfh cells at 4 months post-infection [
74]. Interestingly, it has also been suggested that germinal centre formation is impaired in acute infection [
90]. That study was based on the analysis of post-mortem lymph nodes and spleen in patients who succumbed to SARS-CoV-2 infection, whereas, in our study, we have assessed antibody responses in convalescent survivors, who clearly have strong humoral responses. Our data suggest that germinal centre formation is sufficient in convalescents.
In addition to immunophenotyping by flow cytometry, we performed RNA sequencing of total RNA from 138 blood samples collected from convalescent individuals at 12, 16, and 24 wpi, as well as HCs. To our knowledge, no other study has profiled transcriptome-wide changes in COVID-19 convalescents for such a long period post-infection. We found that the blood transcriptome of convalescents was significantly perturbed compared to HCs, with the largest numbers of DEGs being identified at 12 wpi. Transcriptional dysregulation persisted until at least 24 weeks. There was a very strong enrichment for pathways and BTMs related to transcription, translation, and ribosome biosynthesis among genes upregulated in recovering individuals, at all 3 timepoints. Many viruses upregulate rRNA synthesis during infection [
42,
43], but why rRNA gene expression remains upregulated months after infection is currently unknown. Other statistically enriched pathways among upregulated genes included neutrophil degranulation, antimicrobial peptides, immune system, and pathways related to other viral infections. These data suggest ongoing inflammatory responses and immune dysregulation in COVID-19 convalescents weeks-to-months after infection. Consistent with these data, neutrophil degranulation has been reported to be significantly upregulated in active infection [
91,
92], suggesting that certain signatures of active infection persist well into convalescence. We also found evidence for dysregulated expression of genes involved in oxidative phosphorylation, a signature which has also been identified in one other recent study of convalescents to occur irrespective of whether elevated inflammatory markers persist or not [
20], but whose functional significance is currently unknown. Interestingly, mitochondrial dysfunction in PBMC has previously been associated with cognitive impairment in other contexts [
52]. This warrants further investigation given the frequent reports of cognitive issues in long COVID sufferers.
While some changes in gene expression were associated with variation in specific immune cell populations between individuals, differences in gene expression were not solely explained by changes in the frequency of any single immune cell population. A patient-specific analysis of the gene expression activity of pre-annotated BTMs enabled a more thorough assessment of the variation in gene expression responses. There was a broad spectrum in the recovery of gene expression responses in both mild/moderate and severe/critical convalescents. Variation in the rate of recovery from infection at a cellular and transcriptional level may explain the persistence of symptoms, such as fatigue, associated with long COVID in some convalescent individuals. We observed a strong association between our transcriptional signature of convalescence and referral to a dedicated long COVID clinic. While the majority of convalescent individuals in this cohort returned to a transcriptional baseline by 24 wpi, those referred to a long COVID clinic did not. More than 400 genes were identified to be differentially expressed in those convalescent individuals referred to a long COVID clinic compared to those convalescents who were not. Interestingly, these differences were only evident at 24 wpi, suggesting that while transcriptional dysregulation in many convalescents begins to resolve around 6 months post-infection, it persists in those individuals suffering from long COVID symptoms.
Of particular interest given known associations with symptoms such as fatigue, we identified several transcriptional signatures among long COVID convalescents that suggested a mild thrombocytopenia. There was, for example, a very strong enrichment for platelet-related pathways among downregulated genes and cell type enrichment analysis revealed a strong downregulation of platelet and megakaryocyte gene sets among individuals referred to a long COVID clinic. Consistent with our data, there are reports of thrombocytopenia in COVID-19 patients [
57‐
59]. Furthermore, SARS-CoV-2 infection has also been shown to induce changes in platelet gene expression and function [
93,
94]. Unfortunately, we did not measure platelet levels in these individuals, so this is something that requires further assessment in future studies. Interestingly, a link between gene expression in peripheral blood and fatigue following infectious mononucleosis has been previously reported [
95], with at least some of the same genes differentially expressed in COVID-19 convalescents. These data may point towards common mechanisms regulating long COVID and post-viral infection fatigue more generally. Finally, we also uncovered significant inverse correlations between dysregulated BTMs and anti-Spike and anti-RBD antibody responses suggesting that prolonged transcriptional dysregulation may be associated with reduced antibody responses with potential consequences for the durability of protective immunity. Further work is now needed to assess whether dysregulated immunity following COVID-19 has implications for responses to other infections, vaccination, or in the management of chronic diseases.
While our study provides a high-resolution, multi-level insight into the immune dysregulation experienced post COVID-19, we recognise that our study also has some important limitations. While comparable to or larger than most other studies to date, the sample size is still relatively limited, particularly in the case of patients with more severe disease. This is particularly important given the apparently highly heterogeneous recovery in immune dysregulation over time. Further, larger studies will be needed to more fully assess differences due to disease severity, treatment, and other confounders and validate that the observed transcriptional changes are reflected at the protein level. Other single-cell approaches may also provide further resolution of the immune dysregulation experienced by convalescents and the transcriptional signatures we find associated with long COVID. We chose to perform the high-resolution immunophenotyping on freshly isolated PBMC in order to enrich for the rare lymphocyte subsets that are functionally important but not found in large numbers in whole blood, and this has some limitations when calculating the proportion of all cells found in the blood. Similarly, there are known biases introduced due to the removal of mature granulocytes from whole blood. Importantly, this is an approach that has also been successfully applied to other published COVID-19 cohort studies [
20,
65]. It is important to acknowledge the limitations associated with examination of cell proportions versus absolute cell counts, specifically that a lowered proportion does not always equate to a lowered absolute cell count. With regard to presenting the data as absolute cell counts or proportion of a reference cell pool, we selected a proportion analysis to reflect changes in the balance between multiple rare but clinically important lymphocyte subsets using parameters such as maturation status or homing potential. In addition, we have normalised the staining protocols to a fixed PBMC count (5 × 10^5) at input for each sample, to minimise batch effects or donor cell count differences, ensuring the data are comparable between multiple donors over multiple time points.
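The distinction between proportions and absolute counts can be illustrated with a small worked example. The sketch below is illustrative only: the function name `absolute_count`, the variable names, and all numbers are hypothetical and are not data or methods from this study. It simply shows that a subset can fall as a proportion of its parent pool while its absolute count rises, if the parent pool itself expands.

```python
def absolute_count(proportion: float, parent_count: float) -> float:
    """Absolute count of a subset = its proportion of the parent pool
    multiplied by the absolute count of that parent pool."""
    return proportion * parent_count


# Hypothetical values for illustration only (NOT measurements from this study).
healthy_parent_per_ul = 2000.0       # parent pool, cells per microlitre
healthy_subset_fraction = 0.10       # subset is 10% of the parent pool

convalescent_parent_per_ul = 3000.0  # parent pool has expanded
convalescent_subset_fraction = 0.08  # subset is now only 8% of the parent pool

healthy_abs = absolute_count(healthy_subset_fraction, healthy_parent_per_ul)
convalescent_abs = absolute_count(
    convalescent_subset_fraction, convalescent_parent_per_ul
)

# The proportion is lower in the hypothetical convalescent sample,
# yet the absolute count of the subset is higher.
proportion_decreased = convalescent_subset_fraction < healthy_subset_fraction
absolute_increased = convalescent_abs > healthy_abs

print(f"healthy: {healthy_abs:.0f} cells/uL, convalescent: {convalescent_abs:.0f} cells/uL")
```

Under these assumed numbers, the subset proportion falls from 10% to 8% while the absolute count rises, which is why proportion data alone should not be read as a change in absolute cell numbers.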
While our flow cytometry analyses enabled the assessment of ~ 130 parameters, they did not include markers for dendritic cells (DC), which have been found to be altered in COVID-19 convalescents in previous studies [
96]. Our BTM analysis, however, supports the dysregulation of DC populations in convalescents. Finally, while we assessed the relationships between immune dysregulation and anti-Spike and anti-RBD antibody responses, we did not assess T cell immunity in our study [
97,
98]. Further studies should also assess the effects of SARS-CoV-2 variants on long-term immune dysregulation in convalescents, and comparative studies of post-infectious immune dysregulation following SARS-CoV-2 infection and other infections would be highly beneficial. Due to the global impact of the pandemic, multiple protocols for separating and analysing the immune compartment have been used across studies. We acknowledge that directly comparing data between cohorts would require an international clinical protocol with standardised cohort inclusion criteria, standardised cell isolation and flow cytometry protocols, and standardised data analysis.