Background
Seasonal influenza vaccination is widely used but has low effectiveness despite annual revaccination [1]. When the main circulating influenza strains have drifted away from those included in that season’s vaccine, effectiveness can be very low [2]. Live attenuated influenza virus vaccines have become the preferred vaccine for use in children due to the ability to induce T cell responses to the influenza virus as well as humoral responses to the external glycoproteins, but recent findings of low vaccine effectiveness in the US [3, 4] have resulted in the recommendation to use inactivated vaccines only in the US this season. Seasonal influenza vaccines cannot provide protection against influenza pandemics caused by novel subtypes, and much research effort has been directed towards producing a ‘universal’ influenza A vaccine that will protect against any subtype of influenza A. This has resulted in a number of different approaches to vaccination against influenza which have entered early phase clinical development [5–8]. Seasonal influenza vaccines are licensed based on their ability to induce a haemagglutination inhibition (HI) titre of 1:40, but for novel vaccines that are designed to act through alternative immune mechanisms it will be necessary to demonstrate efficacy of the vaccine in preventing disease caused by influenza viruses, in humans.
Influenza challenge trials provide one means of testing vaccine efficacy, and whilst they are not inexpensive to conduct, are considerably less costly than phase III clinical trials involving many tens of thousands of subjects, and may be conducted more rapidly to give a preliminary assessment of the protective efficacy of a novel vaccine against seasonal influenza viruses. However, conducting efficacy studies in a cohort of humans in a quarantine unit presents challenges in the collection of sufficient objective data points for analysis. Shedding of virus collected in nasal swabs or washes can only be measured once per day, up to 7 days after influenza challenge. Temperature measurements are taken at least twice per day but fevers are rare in healthy young subjects after influenza virus challenge. The majority of the data that is collected to determine the severity of illness consists of self-reported symptom scores. Subjects record whether each of a list of possible influenza symptoms is absent, mild, moderate or severe in a twice daily questionnaire. Symptom scoring may be influenced by many factors such as the mood of the volunteer, the time elapsed since waking, or drinking, or the perception that the volunteer should be feeling better in the later part of the quarantine period.
Earlier time-course transcriptomics analyses of data from human influenza challenge trials [9–13] have primarily focused on identifying genes and transcriptional dynamics that differ between clinically symptomatic and asymptomatic individuals at various time points post challenge. The definition of symptomatic and asymptomatic varies between studies but is typically based on total symptom score in the first few days and may include a secondary restriction on virus shedding status.
To our knowledge, no study has attempted to predict per-individual symptom scores from gene expression data using a minimal set of biomarkers. Such a panel would provide a more objective measure of symptom severity in future influenza challenge trials and thus improve comparability between challenge studies, especially those testing different candidate vaccines.
Methods
Influenza challenge study (discovery cohort)
Samples were obtained from a clinical trial (ClinicalTrials.gov Identifier: NCT02014870) conducted to determine the appropriate dose level of live, wild-type A/California/2009 H1N1 virus stock for future influenza challenge studies [14]. Healthy volunteers aged 18–45 with no detectable HAI titre to the challenge strain underwent intranasal administration of virus whilst housed in a quarantine unit, and were monitored for symptoms of influenza disease and virus shedding. Volunteers in a semi-recumbent position were intranasally challenged with 0.5 mL (0.25 mL in each nostril) of either 1:100 or 1:10 dilution of the neat virus (concentration ~7.0 × 10^7 TCID50/mL).
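The delivered challenge dose follows directly from the stock concentration, dilution and volume stated above. The following is a minimal worked sketch of that arithmetic; the function and variable names are illustrative and not taken from the study protocol.

```python
# Values as stated in the text: approximate neat stock titre and total volume.
STOCK_TCID50_PER_ML = 7.0e7   # ~7.0 x 10^7 TCID50/mL (approximate)
VOLUME_ML = 0.5               # 0.25 mL in each nostril

def challenge_dose(dilution_factor: float) -> float:
    """Return the TCID50 delivered for a 1:dilution_factor dilution of the neat stock."""
    return STOCK_TCID50_PER_ML / dilution_factor * VOLUME_ML

# Approximate dose for each of the two dilutions used in the trial.
doses = {f"1:{d}": challenge_dose(d) for d in (10, 100)}
for label, dose in doses.items():
    print(f"{label} dilution: ~{dose:.1e} TCID50")
```

Under these figures the 1:10 and 1:100 dilutions correspond to roughly 3.5 × 10^6 and 3.5 × 10^5 TCID50 per volunteer, respectively.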
Volunteers were quarantined for 9 days after challenge and self-reported twice daily on 16 signs and symptoms of influenza. Symptoms were recorded on a modified Jackson score of 0–3: not noticeable, just noticeable, bothersome but can still do daily activities, or bothersome and cannot do daily activities. Nasal swabs were taken daily to determine the live virus shedding load. In consenting subjects, blood samples were collected in PAXgene® tubes before the challenge and 24, 48, 72 and 96 h post challenge for transcriptomics analysis.
RNA extraction and quality control
Whole blood was collected in PAXgene Blood RNA tubes (PreAnalytiX) and processed according to the manufacturer’s protocol. RNA quantity and quality were assessed using a NanoDrop spectrophotometer and Agilent’s 2100 Bioanalyzer.
High throughput qPCR
Real-time quantitative PCR was performed in a Fluidigm system consisting of a BioMark HD instrument, IFC HX controller and 96 × 96 dynamic array, as described in the manufacturer’s user guide PN 68000088 K1 [15] (“Real-Time PCR Analysis”, appendixes A and D) and document PN 100-2638 D1 [16] (“Gene Expression with the 96.96 IFC Using Fast TaqMan Assays”). Appendix A in guide PN 68000088 K1 was used for the preparation of cDNA through reverse transcription and appendix D contains the protocol for gene expression analysis using TaqMan assays, including a preamplification step. Raw RNA samples were normalised to 10 ng/μL for the cDNA synthesis and the expression assay involved three technical replicates.
Gene expression microarrays
Blood samples in PAXgene® tubes were thawed over 2 h at room temperature and total intracellular RNA was extracted using the Blood RNA Kit (Qiagen) according to the manufacturer’s instructions. The purity and quantity of the isolated total RNA were assessed using an Agilent 2100 Bioanalyzer prior to storage at −20 °C until required. Globin mRNA was subsequently depleted using the GLOBINclear Kit (Ambion). Depleted RNA was then amplified and biotin-labelled using the TotalPrep RNA Amplification Kit (Illumina), purified, and its quality assessed using the Agilent 2100 Bioanalyzer. Biotinylated cRNA was hybridised to Illumina Human HT-12 v4 Expression Beadchips according to the manufacturer’s instructions. Beadchips were scanned with an Illumina iScan machine, and data were extracted using Illumina’s GenomeStudio 2011 software.
Microarray analysis
Raw probe-level summaries were exported from GenomeStudio 2011 and imported into R using the beadarray package [17]. Probes were background corrected using negative control probes followed by quantile normalization using the neqc command [18]. The analysis was restricted to probes with a detection p value <0.01 in at least 10% of the samples and to probes matching the transcript definitions of the following databases (in descending importance) with at most two mismatches, no insertions and a minimum mapping length of 40 bases: GENCODE version 23, RefSeq (refMrna.fa) and GenBank (mrna.fa), downloaded in August 2015 from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/. A linear model was fitted using limma [19] to determine differential expression adjusted for gender, age, challenge dose and batch effects. We used the duplicate correlation option [20] to account for intra-patient correlations and weighted the arrays by their quality scores [21]. Nominal p values were corrected for multiple hypothesis testing using the Benjamini–Hochberg procedure [22].
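For readers unfamiliar with the Benjamini–Hochberg adjustment, the following is a minimal Python sketch of the standard procedure. It is an illustration only and not the code used in the study, where the correction was applied within the R/limma workflow; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative nominal p values (not data from the study).
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)
```

A probe would then be called differentially expressed when its adjusted p value falls below the chosen false discovery rate threshold.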
High-throughput qPCR analysis
We performed TaqMan® Gene Expression assays to determine gene expression in the whole-blood RNA isolated from selected participants at baseline and at the time of maximal symptoms. A panel of 29 primers selected for symptom scoring and three control primers was measured in triplicate on a 96 × 96 Fluidigm plate. One sample and one primer did not meet the quality criteria and were removed from the analysis. The raw Ct values were then imported into R and normalized to the three endogenous control genes (RPL30, GAPDH, and PPIA) to give log-transformed deltaCt values using the HTqPCR package [23]. The limmaCtData command was used to extract fold changes with respect to DSS.
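The deltaCt normalization can be illustrated with a short Python sketch. It assumes that technical replicates are averaged per gene and that the reference is the mean Ct of the three endogenous controls; this mirrors the general deltaCt idea rather than reproducing the HTqPCR implementation, and the function name and Ct values are hypothetical.

```python
from statistics import mean

# Endogenous control genes named in the text.
CONTROL_GENES = ("RPL30", "GAPDH", "PPIA")

def delta_ct(sample_ct):
    """Compute deltaCt per target gene for one sample.

    sample_ct maps gene name -> list of technical-replicate Ct values.
    Assumption: replicates are averaged first, then the mean Ct of the
    control genes is subtracted from each target gene.
    """
    avg = {gene: mean(reps) for gene, reps in sample_ct.items()}
    reference = mean(avg[g] for g in CONTROL_GENES)
    return {g: ct - reference for g, ct in avg.items() if g not in CONTROL_GENES}

# Hypothetical Ct values for a single sample (not data from the study).
example = {
    "ISG15": [24.0, 24.5, 23.5],
    "RPL30": [18.0, 18.0, 18.0],
    "GAPDH": [20.0, 20.0, 20.0],
    "PPIA": [19.0, 19.0, 19.0],
}
normalized = delta_ct(example)
```

Because Ct values are already on a log2 scale, a lower deltaCt indicates higher expression relative to the controls.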
Regression analysis
For the prediction of DSS on external test sets, microarray intensities and deltaCt values were used to train a random forest model using the R package caret [24]. The microarray data for training were the residuals after removing the effect of gender, age, challenge dose and scan dates. Three technical replicates were averaged. For genes with multiple probes, we retained only the highest expressing probe. To estimate the training performance, training samples were randomly sampled and out-of-bag RMSE (root mean squared error) estimates were used to select the optimal tuning parameter mtry. The final hyperparameters chosen for the microarray data were mtry = 2, ntree = 500, sampsize = 105 and for the qRT-PCR panel mtry = 19, ntree = 500, sampsize = 33. Test sample DSS was then predicted using the trained model and prediction performance was evaluated using the percent change in RMSE relative to a best guess model predicting the overall mean. To select a minimal set of predictive features, the VSURF package was used with default settings in R [25].
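The evaluation metric, percent change in RMSE relative to a mean-only baseline, can be sketched as follows. The text does not state which samples the overall mean is computed from, so the baseline mean is passed in explicitly here; the function names and example scores are illustrative assumptions.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def rmse_reduction_percent(observed, predicted, baseline_mean):
    """Percent reduction in RMSE relative to always predicting baseline_mean.

    Positive values mean the model beats the mean-only baseline.
    """
    baseline = rmse(observed, [baseline_mean] * len(observed))
    return 100.0 * (baseline - rmse(observed, predicted)) / baseline

# Hypothetical symptom scores and predictions (not data from the study).
observed = [0, 2, 4, 6]
predicted = [1, 2, 4, 5]
reduction = rmse_reduction_percent(observed, predicted, baseline_mean=3.0)
```

A value of 0% corresponds to a model no better than guessing the mean, and 100% to perfect prediction.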
External validation dataset
We used two external datasets from influenza A challenge trials to validate our findings. The first is from Woods et al. [13], in which 24 volunteers were experimentally infected with H1N1 (A/Brisbane/59/2007) and 17 volunteers with H3N2 (A/Wisconsin/67/2005). Blood was collected approximately every 8 h for transcriptomics analysis but volunteers only self-reported twice daily. We downloaded the RMA processed data from the Gene Expression Omnibus website (GSE5428) and only used microarray data (~70%) that were collected within 3 h of symptom reporting. The self-reported scores covered ten signs and symptoms, a subset of those used to define DSS in the discovery cohort. The data were generated using the Affymetrix Human Genome U133A 2.0 Array. We selected the highest expressing Affymetrix probe for genes with multiple probes and matched these to the Illumina probes. The second dataset was used to confirm results obtained with the Fluidigm panel and contains samples from placebo-vaccinated subjects in an independent H1N1 challenge trial (ClinicalTrials.gov Identifier: NCT02071329) conducted by Immune Targeting Systems (ITS) Ltd using the same challenge strain and protocols as the discovery cohort. Blood samples were collected from 30 adult volunteers before challenge and 72 h post challenge in PAXgene® tubes, and RNA was extracted before profiling gene expression using the Fluidigm panel as described above.
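Collapsing multiple probes to a single probe per gene can be sketched as below. The text does not define "highest expressing" precisely, so this example assumes it means the highest mean intensity across samples; the function name, probe identifiers and intensities are hypothetical.

```python
from statistics import mean

def highest_expressing_probes(expression, probe_to_gene):
    """Pick one probe per gene: the probe with the highest mean intensity.

    expression maps probe id -> list of intensities across samples.
    probe_to_gene maps probe id -> gene symbol; unmapped probes are skipped.
    Assumption: "highest expressing" is taken as highest mean across samples.
    """
    best = {}
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        avg = mean(values)
        if gene not in best or avg > best[gene][1]:
            best[gene] = (probe, avg)
    return {gene: probe for gene, (probe, _) in best.items()}

# Hypothetical probe ids and log-intensities (not data from the study).
expression = {
    "probe_a": [7.0, 7.5, 8.0],
    "probe_b": [9.0, 9.5, 10.0],
    "probe_c": [6.0, 6.0, 6.0],
    "probe_d": [5.0, 5.0, 5.0],
}
probe_to_gene = {"probe_a": "ISG15", "probe_b": "ISG15", "probe_c": "CXCL10"}
selected = highest_expressing_probes(expression, probe_to_gene)
```

The resulting one-probe-per-gene table can then be joined on gene symbol to match features across the Affymetrix and Illumina platforms.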
Discussion
This study was designed to develop a precise, objective and cost-effective small-scale assay to act as a surrogate for symptom scoring in influenza challenge trials. We challenged 21 volunteers with wild-type A/California/2009 H1N1 virus, and 15 of the 21 volunteers (71%) reported at least one mild symptom. This is consistent with the previously reported proportion of 69% of H1N1 infected volunteers who develop clinical illness after experimental influenza virus infection [27]. We generated a whole transcriptome dataset on all 21 volunteers and correlated the longitudinal transcriptomics data with the self-reported symptoms. The genes most significantly associated with DSS were then used to establish a qRT-PCR based Fluidigm assay using commercially available primers.
In contrast to previous studies which try to distinguish symptomatic and asymptomatic individuals [9–11], we established a set of biomarkers optimised to detect and differentiate between different levels of symptoms. We showed that the continuous range of predicted symptom scores can be optionally categorized. Importantly, no baseline sample was misclassified when samples were categorized into symptomatic and asymptomatic in our qRT-PCR training and test sets. This suggests that the misclassified samples collected at 72 h post challenge might suffer from the subjective nature of self-symptom reporting rather than inaccurate prediction.
The 19-biomarker panel was identified using whole blood samples from H1N1 infected adult volunteers, which raises the question of generalizability to other influenza strains and cohorts. Zaas et al. [9] developed a 48-gene marker panel to classify H1N1 and H3N2 infected patients as either symptomatic or asymptomatic. Apart from 5 genes (CCL8, CD36, CXCL10, USP18 and SPATS2L), the remaining 14 genes of our 19-gene signature were also found in their panel, indicating the potential use of our panel across different strains. Similarly, 12 of our genes also appear in a published signature to detect respiratory infections such as RSV and rhinovirus [10]. Herberg et al. [28] compared whole blood samples from 19 children hospitalized due to H1N1 infection (age 1.6–7.5 years) with 33 control children. Interestingly, 12 genes in their top 15 were also present in our panel. These overlaps strongly suggest that our panel can also be used to predict symptom scores across different influenza strains and age groups as well as for binary classification of volunteers into asymptomatic and symptomatic individuals.
There is also evidence that our set of biomarkers could have potential cross-species application once the primers have been adapted for species differences. For example, Li et al. [29] identified ISG15, our most influential predictor, as highly up-regulated in lung samples from H1N1 infected swine using microarrays and qRT-PCR validation. Another highly influential gene from our panel, CXCL10, was shown to be consistently up-regulated in H1N1 infected mice, macaques and swine [30].
Interestingly, we observed a significant further 18% reduction in RMSE when scores were predicted from a model trained on qRT-PCR data rather than microarray data. One reason might be the well-known underestimation of expression changes measured by microarrays compared to qRT-PCR [31], which we also clearly observed in our screen (Additional file 8: Figure S5); another might be the higher sensitivity of qRT-PCR. However, we cannot directly compare both results because different cohorts were used for testing.
Although the 34% reduction in prediction error relative to the mean that we observed for our qRT-PCR based assay is a good result, we observed four samples with discordant predictions (Fig. 4a). The inaccuracies appear to be more pronounced when recorded DSS were low, in the range of one to three. This result is not surprising, since minor symptoms are expected to be more subjective than strong or multiple symptoms.
Selecting a very stringent threshold at the microarray level was a design choice deliberately made at a very early stage of the study for two reasons. First, we aimed at a cost-effective and small set of PCR primers. Second, we sought to restrict the primers to genes which were highly and significantly correlated with our phenotype after adjusting for differences in variables such as age, gender and batch. Alternatively, the choice of the genes included in the final signature can be left to a feature selection method or a regression algorithm that penalizes and removes less informative features. This, however, potentially leads to removal of biologically relevant genes and to overfitting to the training data at hand. When we applied a feature selection method to our dataset, we found good predictive ability with as few as two genes (ISG15 and CXCL10). This would enable a very cost-effective approach to symptom scoring. However, further testing in a larger cohort is required to determine whether such a small set of predictors can be a robust marker for symptom scoring.
Conclusions
Taken together, we provide here a comparatively small set of genes which can be used to replace self-reported symptom scores in influenza challenge studies with great accuracy. All primers described here to test expression levels of these genes are commercially available and can be readily used to replace or refine self-symptom reporting in influenza challenge trials. These markers can therefore be applied in future challenge studies, and the panel possibly refined once tested on larger cohorts.
Authors’ contributions
JF and CB provided the samples from the clinical trials. EP extracted the RNA, with support from AS, and processed the samples on microarrays and Fluidigm assays. MM and MN contributed the transcriptomics data and symptom scoring from a previously published manuscript. JM analysed the data, prepared the tables and figures, and wrote the initial manuscript. JM, AR and SCG contributed to discussion and revised the manuscript. All authors contributed to revising the manuscript. All authors read and approved the final manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.