Background
Respiratory viral infections are associated with a robust immune response. Initial activation of the innate immune response leads to the release of cytokines and chemokines. Subsequent activation of the adaptive immune response results in the production of cytotoxic T-cells directed toward virus-infected cells and B-cells that produce virus-specific antibodies. Following the resolution of the infection, virus-specific antibodies and cytotoxic T-cells persist, but the acute immune response resolves within days or weeks after the virus is cleared [1–3]. However, for chronic viral infections, the immune response persists, and T-cells can develop an exhausted phenotype.
Individuals infected with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) often experience severe respiratory complications and other sequelae. SARS-CoV-2 infection results in dysregulation of the innate and adaptive immune response [4, 5]. Acute infection is associated with T-cell depletion and exhaustion, which contributes to SARS-CoV-2 persistence. More severe clinical disease is associated with greater lymphopenia [6], and recovery of lymphocyte counts precedes clinical recovery [4]. Compared to other respiratory viral infections, the immune response to SARS-CoV-2 is characterized by robust production of proinflammatory cytokines but diminished interferon Type I and III responses [7, 8]. Molecular and cellular immune features of 31 patients aged > 70 years with severe COVID-19 pneumonia suggested that inflammation, coupled with the inability to mount a proper anti-viral response, could aggravate disease severity and worsen clinical outcomes [9]. Comparative host transcriptome analysis across distant coronavirus genera showed that 23 pathways and 21 differentially expressed genes (DEGs) across ten immune response-associated pathways were shared by these viruses, and that these DEGs could be utilized as specific targets for novel coronavirus treatments [10].
Studies involving the convalescent period following acute viral/bacterial infections offer significant insights into disease pathophysiology, duration of immunity, host characteristics facilitating recovery, as well as susceptibility for recurrence/reinfection. In a prospective study evaluating transcriptomics of 1610 healthy subjects, 142 of whom developed an acute viral respiratory illness (influenza A, B, rhinovirus, or other) over a 2-year period, the infective phase (days 1–2) demonstrated a spike in interferon and innate immunity pathways, followed by a recovery phase characterized by transcripts implicated in cell proliferation and repair (days 4–6). By day 21, gene expression was indistinguishable from baseline in this study [
1]. In another study of patients who had recovered from Ebola virus infections, a small panel of genes identified via transcriptomics were predictors of outcomes and survival, independent of viral load [
11]. In a third study of convalescence, global transcriptome analysis identified diagnostic signatures for resolution and symptom persistence in Lyme disease [
12].
The post-convalescent period of SARS-CoV-2 infections is an area of active interest. In a recent cohort study of COVID-19 infected subjects, 90% of whom had mild illness or were asymptomatic, 30% eventually reported symptoms such as fatigue, loss of taste or smell, or brain fog, and an overall decrease in health-related quality of life measures up to 6 months after the acute phase [
13]. Several groups have investigated the course of the antibody response of patients recovering from SARS-CoV-2 infections [
14], but little is known about the recovery of transcriptomic changes in this rather protracted post-acute period in large cohorts.
We profiled peripheral blood leukocyte gene expression in people who had been infected with SARS-CoV-2 and who had recovered and were donating COVID-19 convalescent plasma. Gene expression was analyzed using the nCounter platform, a robust tool to detect the expression of 800 genes in a single reaction with high sensitivity and linearity across a broad range of expression levels. This methodology bridges the gap between genome-wide (microarrays or RNA sequencing) and targeted (real-time quantitative PCR) expression profiling [
15]. Gene signatures identified on this platform have demonstrated clinical applicability in diagnostics [
16] and in understanding and predicting responses to therapeutic interventions [
17,
18]. Recently, the platform was utilized successfully to risk-stratify patients with active COVID-19 infections based on data from a small study [
19]. We sought to investigate the transcriptome of peripheral blood post-COVID-19 in the context of other demographic, clinical, and laboratory parameters. In this study, we evaluated the immune response in COVID-19 convalescent donors (CCD). Towards this goal, using the nCounter, we analyzed and compared the transcriptomes of 162 CCD and healthy donors (HD).
Methods
Human subjects and eligibility criteria
Between April and December 2020 (i.e., before COVID-19 vaccination), 162 CCD and 40 healthy donor controls were enrolled prospectively in an IRB-approved protocol (Clinical Trials Number: NCT04360278) and provided written informed consent to participate in the study. Of the 162 CCD subjects, 93 donated blood once, 46 donated twice, 12 donated three times, 6 donated four times, and 5 donated five times.
Eligibility criteria for CCD included (1) routine blood donor criteria, (2) molecular or serologic laboratory evidence of past COVID-19 infection, and (3) complete recovery from COVID-19, with no symptoms other than residual loss of taste or smell for ≥ 28 days, or for ≥ 14 days with a negative molecular test after recovery. The visit at which these criteria were met was considered the first visit post convalescence. We collected donor demographic and biometric data, including age, race, sex, ABO blood type, body mass index, and complete blood counts at the first visit for each subject in the early convalescent period. For each CCD, clinical severity of past COVID-19 infection was categorized as asymptomatic, mild (self-limiting course, symptomatic management at home), moderate (emergency room management or hospitalization), or severe (ICU admission). In all cases, anti-SARS-CoV-2 testing was performed. The minimum interval between plasma donations was 28 days; shorter intervals were acceptable between sample draw visits. Routine plasma donor testing was performed, including standard infectious disease testing, blood group assessment, and human leukocyte antigen antibody testing in female donors. Healthy donor control samples were obtained from research donors (protocol 99-CC-0168) who had previously provided consent for the collection of research blood samples and had self-reported no SARS-CoV-2 exposure.
Anti-SARS-CoV-2 testing was performed using the Ortho-Clinical Diagnostics VITROS® Total (IgA/G/M) and IgG COVID-19 Antibody tests, as well as the SARS-CoV-2 neutralizing assay (NIH/National Institute of Allergy and Infectious Diseases (NIAID) Integrated Research Facility at Fort Detrick, Maryland, USA) as previously described [
20].
RNA isolation
Five to ten milliliters of human whole blood were collected in EDTA-anticoagulated tubes (BD) and centrifuged at 2500 RPM for 15 min. The supernatant plasma was separated for the antibody and multiplex immunoassays. ACK lysis buffer (Quality Biological) was added to the remaining pellet at a 1:9 ratio, mixed several times, and incubated at room temperature for 15 min. Subsequently, the tubes were centrifuged at 1500 RPM for 10 min, and the supernatant was discarded. The pellet was washed twice with 1× PBS (KD Medical). QIAzol lysis reagent (700 µL; Qiagen) was added to the pellet with mixing, and the lysate was stored at −80 °C. RNA was isolated using the RNeasy Mini Kit (Qiagen) and eluted in 40 µL of Milli-Q water. Following quality (Agilent 2100 Bioanalyzer) and quantity (Nanodrop One, Thermo Scientific) checks, the RNA was stored at −80 °C for further transcriptomic profiling.
Nanostring nCounter transcriptomic profiling
Nanostring transcriptomic profiling was performed using the nCounter® Human Host Response (Additional file 1: Table S1) and the nCounter® Human TCR diversity panels (Additional file 2: Table S2). Whole blood total RNA (100 ng) was hybridized to reporter and capture probes at 65 °C for 16 h using a thermal cycler (Veriti Applied Biosystems). These hybridized samples were loaded onto the nCounter cartridge, and the post-hybridization step and scanning were performed on the nCounter Prep Station and Digital Analyzer.
Multiplex immunoassay
In a subset of CCD samples with highly perturbed gene expression and in healthy donor controls, we performed cytokine analysis. According to the manufacturer's instructions, a multiplex biometric immunoassay was performed to assess 48 cytokine and chemokine cell signaling molecules (Bio-Plex Human Cytokine Assay; Bio-Rad Inc., Hercules, CA, USA) [
21]. The quantified cytokines included interleukins (IL-1α, IL-1β, IL-1Ra, IL-2, IL-2Rα, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-12p40, IL-12p70, IL-13, IL-15, IL-16, IL-17A, and IL-18), interferons (IFN-α2 and IFN-γ), tumor necrosis factors (TNF-α, TNF-β, and TRAIL), growth factors (SCF, FGF, β-NGF, HGF, LIF, PDGF-BB, VEGF, SCGF-β, G-CSF, M-CSF, and GM-CSF), and chemokines (CCL27/CTACK, eotaxin, GRO-α, CXCL10/IP-10, CCL2/MCP-1, CCL7/MCP-3, MIF, MIG, CCL3/MIP-1α, CCL4/MIP-1β, CCL5/RANTES, and SDF-1α). A multiplex array reader from the Luminex™ Instrumentation System (Bio-Plex 200 system) was used to determine the cytokine levels. The Bio-Plex Manager Software was used to calculate the cytokine concentrations.
Data processing for nCounter host response panel
All statistical analyses were performed using R (Version 4.1.1). Raw counts were normalized by dividing each sample’s counts by the geometric mean of that sample’s counts for the panel’s 12 housekeeping genes. Of the 270 samples from CCDs, we removed 2 samples with low signal strength, defined as low outlier values of the housekeeper geometric mean. The normalized data were then log2-transformed. Healthy donor samples were compared to CCD across 4 time windows: 26–89 days, 90–119 days, 120–149 days, and 150–241 days post-symptom onset. Within each window, a linear mixed model was fit to each gene’s normalized log2-transformed expression. The model treated CCD/healthy donor status, age, sex, and race as fixed effects and patient ID as a random effect. No patients had multiple samples within the 120–149 day window, so in this window, a linear model was fit with no random effects. The R package lmerTest was used to fit mixed models, and the R function lm was used to fit linear models. For each window, all genes’ p-values were converted to false discovery rates with the Benjamini–Hochberg procedure, using the R function p.adjust.
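The normalization and multiple-testing steps described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the function names and the dictionary-based data layout are assumptions made for the example, and counts are assumed to be strictly positive. The mixed-model fitting step is not reproduced here.

```python
import math


def housekeeping_normalize(counts, housekeeping):
    """Scale one sample by the geometric mean of its housekeeping genes,
    then log2-transform.

    counts: dict mapping gene name -> raw count (assumed > 0).
    housekeeping: list of housekeeping gene names present in `counts`.
    """
    logs = [math.log(counts[g]) for g in housekeeping]
    geo_mean = math.exp(sum(logs) / len(logs))
    return {g: math.log2(c / geo_mean) for g, c in counts.items()}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so the adjusted values are monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, each time window would be handled separately: the per-gene p-values from that window's models are passed to `benjamini_hochberg` as one list.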
Classification of highly perturbed samples
To calculate perturbation scores, we began by standardizing the data so that each gene had mean 0 and standard deviation 1 within the healthy donor samples. We then defined a perturbation score as each sample’s Euclidean distance from the mean healthy donor sample in this standardized expression data. To define “highly perturbed” samples, we used the R package mclust to cluster perturbation scores into two clusters, one high and one low. The highly perturbed samples were clustered into groups P1 and P2 by applying the R functions hclust and cutree to their log2-transformed normalized expression data.
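The perturbation score defined above can be illustrated with a short sketch. It is written in Python for illustration only; the function name and list-of-lists layout are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption, and every gene is assumed to have non-zero variance among healthy donors. The subsequent two-cluster mixture model and hierarchical clustering steps are not reproduced.

```python
import math
from statistics import mean, stdev


def perturbation_scores(samples, healthy):
    """Euclidean distance of each sample from the mean healthy donor sample,
    after standardizing each gene to mean 0 and SD 1 among healthy donors.

    samples, healthy: lists of expression vectors (log2-normalized values),
    all with the same gene order.
    """
    n_genes = len(healthy[0])
    mu = [mean([h[j] for h in healthy]) for j in range(n_genes)]
    sd = [stdev([h[j] for h in healthy]) for j in range(n_genes)]
    scores = []
    for s in samples:
        # After standardization the healthy donor mean is the origin,
        # so the distance is the norm of the z-score vector.
        z = [(s[j] - mu[j]) / sd[j] for j in range(n_genes)]
        scores.append(math.sqrt(sum(v * v for v in z)))
    return scores
```

Because every gene is rescaled by its healthy donor variability, a sample scores highly only when many genes, or a few genes by a large margin, fall outside the healthy range.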
Analysis of nCounter TCR diversity panel
TCR diversity scores were calculated using the Rosalind nCounter TCR Diversity Report, a software tool designed specifically for the nCounter TCR diversity panel. The software calculates the Shannon diversity index for each sample’s TCR gene counts. Gene expression values are first normalized to a “panel standard” reference sample to remove variability due to batch effects. TCR diversity scores were analyzed using the same statistical models applied to gene expression values.
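For readers unfamiliar with the Shannon diversity index, a minimal sketch of its calculation is given below. It is not the Rosalind implementation: the function name is hypothetical, the natural logarithm is an assumption (the log base used by the software is not stated here), and the panel-standard normalization step is omitted.

```python
import math


def shannon_diversity(counts):
    """Shannon diversity index H = -sum(p_i * ln(p_i)) over TCR gene counts.

    counts: list of non-negative counts, one per TCR gene; zero counts
    contribute nothing to the sum.
    """
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)
```

The index is zero when all counts fall on a single gene and reaches its maximum, the logarithm of the number of genes, when counts are spread evenly.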
Discussion
We evaluated the transcriptome of peripheral blood leukocytes from people who had recovered from COVID-19 and donated convalescent plasma. At the time of the donation, the CCD had no COVID-19 symptoms, tested negative for SARS-CoV-2, and were considered healthy because they passed a blood donor health history questionnaire. The CCD differed from healthy donors in several respects. When compared to healthy donors, CCD had significantly higher leukocyte, lymphocyte, and monocyte counts early in convalescence, as noted previously with other viral infections [
45,
46]. More importantly, CCD demonstrated significant differences in peripheral blood leukocyte transcriptomes. In a subset of CCD with highly perturbed transcriptomics, cytokine levels were also abnormal in PBMC samples collected months after convalescence. These results suggest that the immune dysregulation occurring during acute infection in COVID-19 persists for several months post-infection.
Our study is unique in that we analyzed convalescent donors over a long period. Some studies have serially evaluated people with SARS-CoV-2 infection and found persistent changes in cellular immunity, but only studied patients for 6 to 10 weeks following resolution of COVID-19 [
47,
48]. Our longitudinal assessment of PBMC samples from CCD identified unique transcriptomic trends. The CCD samples were collected at various time intervals following the diagnosis of COVID-19. The samples were collected from a few weeks to more than 6 months post-symptom resolution. While we found some differences in gene expression among CCD at all time intervals, the nature of transcriptomes varied with time. Interestingly, when compared to healthy subjects, the number of differentially expressed genes increased over time, peaked at about 120 to 150 days post-symptom resolution, and then fell during the remainder of the study period.
The function of the differentially expressed genes also changed with time. Initially, less than 90 days post-symptom resolution, genes in interferon signaling, TNF signaling, and cell exhaustion pathways were expressed at high levels in CCD. Later, as the expression of CTLA-4, an inhibitor of T-cell function and marker of cell exhaustion, fell in CCD leukocytes, the expression of genes in TGF-β signaling, TNF signaling, IL-6 signaling, and myeloid activation increased in CCD leukocytes. After 120 to 149 days post-symptom resolution, the number of differentially expressed genes fell, but the proinflammatory genes OSM, PTGS2, and IL1B remained up-regulated. The expression of the immunological checkpoint inhibitor CTLA4 is enhanced on the surface of T-cells due to induction of IFN-γ production by neutrophils and monocytes, which are abundant in the peripheral blood of people with COVID-19 [
49]. An earlier analysis of publicly available transcriptomic databases found that the number and intensities of these inhibitory receptors were higher in SARS-CoV-2 infections compared to SARS-CoV-1, influenza, and respiratory syncytial virus infections [
50]. Besides CTLA4, an increase in activated CXCR4+ T cells homing to the lungs is associated with fatal COVID-19 [51]. Hou et al. identified a significant enhancement of the expression of inhibitory receptors, including CTLA-4, on SARS-CoV-2–specific CD4+ T cells (suggesting an exhausted phenotype), even though the quantity of SARS-CoV-2–specific CD4+ T cells in convalescent COVID-19 patients was maintained after a year of recovery [52]. In convalescent subjects with mild/moderate symptoms, 27–47 days after symptom onset, the T-cell differentiation regulation and memory T cell-related gene CXCR4 was upregulated along with FOS, JUN, CD69, and CD83 [53]. Hence, altered CTLA4 and CXCR4 expression levels may both play a critical role in the severity and fatality of the SARS-CoV-2 infection, as well as during convalescence.
OSM, CXCL2, and CCL3/CCL3L1/CCL3L3 (jointly measured with a single probe) initially have wide expression ranges spanning from the normal range of healthy donor samples to greater than 16-fold increases from the healthy donor mean. By 200 days, these extreme over-expression values are no longer observed, and these genes’ mean expression returns to the healthy donor mean. The OSM gene encodes the protein Oncostatin M, a pleiotropic cytokine that stimulates IL-6. Circulating OSM positively correlates with COVID-19 severity. IL-6, a proinflammatory cytokine, drives immune dysregulation and respiratory failure leading to higher mortality [54, 55]. The chemokine CXCL2 is critical for macrophage, monocyte, and neutrophil migration and is also known to facilitate the clearance of SARS-CoV-2 in the absence of CD4+ and CD8+ T cells or neutralizing antibodies beyond 12 days of infection [56]. The CCL3L3 gene encodes the CCL3 protein (MIP-1α), a member of the C–C motif chemokine family, whose members have diverse functions. CCL3 is a neutrophil chemotactic protein that acts as a ligand for CCR1, CCR3, and CCR5. Neutrophils play a significant role in COVID-19 severity, and CCL3 is upregulated in severe COVID-19 [57, 58].
IL1B shows a similar pattern, but it does not return to the healthy donor average by 200 days. Interleukin (IL)-1β, a potent proinflammatory cytokine, plays a significant role in the host defense response to infection and injury. Studies have shown elevated levels of IL-1β during COVID-19 infection [59]. Additionally, the IL-1 family of cytokines plays a key role in inducing cytokine storm in poorly controlled COVID-19 infection. Furthermore, in a recent clinical study of 88 hospitalized subjects with SARS-CoV-2 infection, blockade of IL-1β with canakinumab was associated with better clinical outcomes [59].
IFNA6 and HERC5 both show consistent down-regulation relative to healthy donors, but with high outliers more than fourfold above the healthy donor mean. These genes remain suppressed below the healthy donor average beyond 200 days. Type I interferon subtype IFNA6 has been reported in patients infected with COVID-19 in the context of platelet degranulation and B cell maturation [60–62]. The E3 ligase HECT and RCC1-containing protein 5 (HERC5) regulates interferon-stimulated gene 15 (ISG15) signaling in response to SARS-CoV-2 and other viral infections [63].
A subset of “highly perturbed” CCD had more marked changes in gene expression. These gene expression changes in the perturbed CCD seemed transient. Of the 21 patients with a “highly perturbed” sample, 11 had multiple timepoints collected. Among these 11 individuals, only 1 was highly perturbed at multiple timepoints, transitioning from cluster P2 at 88 days to cluster P1 at 117 days. A subgroup of these perturbed donors had gene expression changes showing interferon production and innate immune system activation, lower levels of anti-COVID antibodies and increased TCR diversity.
It is unclear why immune changes were found in CCD up to 6 months post-symptom resolution. Indeed, it has proven difficult to find the immunological "bridge" that connects acute COVID-19 and post-COVID-19 syndrome [
64,
65]. Careful annotation of the clinical symptomatology is a crucial step in understanding the pathophysiology of the post-COVID syndrome. It may be possible to separate disease drivers by separating residual symptoms of the acute disease site from new symptoms that may develop after the acute disease recovery. Moreover, confounding factors may also include post-traumatic stress disorder (PTSD)-related elements, which can make it difficult for patients to accurately assess their own clinical symptoms and necessitate comprehensive neuropsychiatric assessments [
65]. Additionally, SARS-CoV-2 has been detected by RT-PCR in respiratory specimens for approximately 2 to 3 weeks post-infection, and in some cases for 4 to 8 weeks [
66]. SARS-CoV-2 can be detected in feces for a longer period than in respiratory specimens [
66,
67]. One study found that SARS-CoV-2 could be detected in respiratory samples for a median of 14 days and in feces for a median of 19 days [
68]. Another study found that it could be detected in feces for 10 weeks [
67]. The persistent shedding of SARS-CoV-2 is not thought to be due to reinfection but is more likely the result of release of sequestered virus or mutation of the original virus. It is also possible that latent virus is reactivated. However, persistent SARS-CoV-2 has not been detected 6 months post-symptom resolution.
The presence of prolonged changes in immune cell transcriptomes post-COVID-19 is consistent with other studies reporting prolonged symptoms in people who have recovered from acute infections. Many people experience post-acute sequelae of COVID-19 (PASC), also known as long COVID or long-hauler syndrome. These people experience fatigue, dyspnea, chest pain, joint pain, and perceived cognitive impairment. One study found that 93% of people hospitalized for COVID-19 experienced PASC [
69]. In the same study, among people who had visited a clinic, 55% had at least one of these persistent symptoms 25 to 89 days post-diagnosis, and 67% had at least one persistent symptom 90 to 174 days post-diagnosis. After 175 days, 64% of people experienced symptoms [
69]. It is possible that our study included CCD with these symptoms. All the CCD were required to pass a blood donor health history screen and to have had a normal body temperature to donate. However, the blood donor history screen is somewhat generic, and it is possible that some donors had the somewhat non-specific symptoms of PASC, which were not captured during the health screen.
Post-acute sequelae of COVID-19 have some similarities to chronic fatigue syndrome, which is characterized by fatigue, depression, memory loss, and discomfort. Inflammatory reactions and elevated cytokine levels likely contribute to some of these symptoms. Cytokines found elevated in some chronic fatigue syndrome patients include interferon-γ, IL-6, IL-1, IL-2, and TGF-β [70]. Our study found that people who had recovered from COVID-19 were afebrile and relatively healthy but still had elevated cytokine and chemokine gene expression levels, as well as increased cytokine levels, throughout the 6-month study period, which suggests that immune dysregulation and immune system activation may be responsible for PASC. Consistent with our findings was another recently published report of persistent immunological dysfunction characterized by elevated proinflammatory cytokine (IFN-β, IFN-λ1, IFN-γ, CXCL9, CXCL10, IL-8, and sTIM-3) levels up to 8 months after mild-moderate COVID-19 infection. Furthermore, these were elevated in individuals with or without clinically identifiable long COVID syndrome when compared to individuals who were infected with other (non-COVID) prevalent coronaviruses or to unexposed healthy control groups [
71].