Introduction

Microarray-based gene expression profiling is a frequently used and powerful tool in the search for molecular “fingerprints” of specific diseases and thus the development of clinically relevant biomarkers for those diseases. For example, seminal investigations have considered the use of large-scale gene expression analyses to classify disease states (Alizadeh et al. 2000; Dyrskjot et al. 2003; Golub et al. 1999), develop diagnostic and prognostic gene profiles (Mirnics et al. 2000; van’t Veer et al. 2002; Welsh et al. 2001), and characterize transcriptomic fingerprints of drug efficacy and toxicity (Gunther et al. 2003). While many of these landmark studies relied heavily on primary tissue samples, recent investigations have focused on peripheral blood, a more accessible tissue (Chao et al. 2008; Coppola et al. 2008; Glatt et al. 2005; Le-Niculescu et al. 2009; Miller et al. 2007; Wang et al. 2005). Aside from being relatively easy to obtain, many of the physiological characteristics of blood cells suggest that peripheral blood gene expression is a reasonable surrogate for specific primary tissue gene expression and hence can be used in the development of clinically meaningful expression-based biomarkers for diseases whose molecular “lesions” are associated with particular nonblood primary tissues (Fan and Hegde 2005; Liew et al. 2006; Mohr and Liew 2007).

One limitation of current studies designed to correlate blood gene expression patterns to primary tissue gene expression patterns is that blood and tissue samples are often not collected from the same set of individuals (Glatt et al. 2005; Liew et al. 2006; Solmi et al. 2006; Sullivan et al. 2006). The use of independent sources of blood and nonblood tissue gene expression information limits interpretability and generalizability of relevant studies and calls into question any putative blood-based gene expression biomarker panel for the diagnosis and treatment of a disease whose primary lesions are not in blood. Nonetheless, it is often too difficult or even impossible to obtain blood and primary tissue samples from the same living individuals. The mouse provides one possible solution to this dilemma.

Unlike in human studies, blood and primary tissue samples can be easily collected from the same living mouse under highly controlled conditions. Intra- and interindividual variation introduced by gender, age, time of day, genetic variation, and environment can be reduced in such studies (Cobb et al. 2005; Leonardson et al. 2010; Radich et al. 2004; Whitney et al. 2003), while clinically acceptable and highly standardized protocols for blood collection, RNA isolation, and globin reduction can be employed to lessen technical, assay-induced variation (Cobb et al. 2005; Debey et al. 2004, 2006). In addition, because many isogenic strains of mice exist and the mice within such a strain are clone-like, it is possible to sample expression patterns in different tissues from different individuals within a strain and test those patterns for consistency as though they were obtained from the same individual.

It is known that high levels of globin transcripts in the blood can confound the accurate assessment of the expression levels of genes in the blood (Wu et al. 2007), as globin mRNA represents up to 70% of the total expressed transcripts and consequently limits the ability to accurately detect genes expressed at low levels in the blood. Thus, globin reduction is often considered a necessary step in the evaluation of whole-blood gene expression profiles via microarrays. GLOBINclear™, a commercially available (Applied Biosystems/Ambion, Austin, TX) globin reduction protocol, has been shown to improve gene expression detection sensitivity, remove up to 95% of α- and β-globin mRNA, and diminish globin-specific expression patterns in human whole-blood samples (Field et al. 2007; Liu et al. 2006; Whitley et al. 2005; Wright et al. 2008). GLOBINclear™ is also advantageous for mouse studies in that it has been specifically developed for the mouse (Whitley et al. 2007).

In this article we describe a study designed to (1) evaluate and characterize the effects of globin reduction on whole-blood gene expression in different mouse strains, (2) determine which pathways are enriched for genes that appear to be heavily influenced by the confounding or masking effects of globin in the blood, and (3) assess the utility of globin-reduced mouse whole blood in the identification of potential biomarkers of human disease.

Materials and methods

Sample collection

All animal procedures were performed according to protocols approved by the University of California San Diego Institutional Animal Care and Use Committee. Seven-week-old male mice of six inbred strains (129S1/SvImJ, A/J, C57BL/6J, C3H/HeJ, DBA/2J, and FVB/NJ) were purchased from The Jackson Laboratory (Bar Harbor, ME) and individually housed for 1 week prior to blood collection. All mice were anesthetized using isoflurane in a fume hood, and whole blood was collected via cardiac puncture. The blood was transferred to an EDTA tube, and TRIzol® LS reagent (Invitrogen, Carlsbad, CA) was immediately added (3:1 TRIzol:blood). The resulting solution was stored in a 15-ml tube at -80°C for no more than 2 weeks.

Processing and globin reduction

Total RNA was extracted from the blood using the TRIzol reagent (Invitrogen) according to the manufacturer’s instructions. Total RNA quantity and quality were assessed by spectrophotometer and the RNA 6000 Nano Chip kit (Agilent, Santa Clara, CA). α- and β-globin mRNA were depleted from a portion of each total RNA sample using the GLOBINclear™ Mouse/Rat kit (Ambion) according to the manufacturer’s instructions, with the recommended start quantity of 10 μg of total RNA.

Sample amplification and microarray analysis

Gene expression analysis was performed on all whole-blood RNA and globin-depleted samples using Mouse 430 2.0 arrays (Affymetrix, Santa Clara, CA) containing 45,101 probe sets. Sample labeling, hybridization, and scanning were performed as previously described (Zapala et al. 2005). Three biological replicate samples from independent mice were prepared for each strain for a total of 18 mice. All raw data are available on the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo, GSE19282).

Data analysis

Data processing and analysis were performed using R (http://www.R-project.org) and Bioconductor (http://www.bioconductor.org) (Gentleman et al. 2004). Array images were visually inspected for artifacts, and quality control reports (affyQCReport) (Parman and Halling 2008) were assessed and determined to be acceptable under Affymetrix guidelines (Affymetrix GeneChip Expression Analysis Data Analysis Fundamentals). MAS5 detection calls, which indicate whether a particular probe set is detected above background, were obtained using the Bioconductor affy package (Gautier et al. 2004). Only probe sets called present (detection P < 0.05) were used for analysis. The associated false discovery rate (Benjamini and Hochberg 1995) for a detection P value of 0.05 was 28.9 ± 11.3% over all arrays. To account for false discoveries, probe sets were filtered for those present across all whole-blood or all globin-reduced samples prior to gene set enrichment analysis (McClintick and Edenberg 2006).
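The size of the reported false discovery rate can be understood with a simple fixed-threshold estimate in the spirit of Benjamini and Hochberg: the expected number of false calls (number of probe sets times the P value threshold) divided by the number of probe sets actually called present. The Python sketch below illustrates that arithmetic only. It assumes this estimator, which the text does not spell out, and it is not the authors' R code; the function name is hypothetical, and the present-call counts are the group averages given in the Results.

```python
def estimated_fdr(n_tests: int, n_called: int, alpha: float = 0.05) -> float:
    """Estimate the FDR among calls made at a fixed P value threshold.

    Assumed estimator: expected false positives (n_tests * alpha)
    divided by the number of calls made, capped at 1.
    """
    if n_called <= 0:
        raise ValueError("n_called must be positive")
    expected_false = n_tests * alpha
    return min(1.0, expected_false / n_called)


N_PROBE_SETS = 45_101  # probe sets on the Mouse 430 2.0 array

# Average present calls per array reported in the Results.
fdr_wb = estimated_fdr(N_PROBE_SETS, 5_840)   # whole-blood RNA
fdr_gr = estimated_fdr(N_PROBE_SETS, 12_411)  # globin-reduced RNA

print(f"whole blood: {fdr_wb:.1%}, globin reduced: {fdr_gr:.1%}")
```

Under this assumption the two group averages give roughly 39% and 18%, which bracket the 28.9% mean reported over all arrays and show why a further filter on consistent presence is useful.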

Gene set enrichment analysis

Probe sets that were present in all 18 samples were assigned to two categories (whole-blood RNA or globin-reduced RNA) (Supplementary Tables 1 and 2) and imported into MetaCore (http://www.genego.com) for enrichment analysis in GeneGo Pathway Maps, GeneGo Diseases by Biomarkers (GeneGo, St. Joseph, MI), GO Processes, and GO Molecular Functions. GeneGo Pathway Maps represent a set of genes participating in a consecutive set of metabolic signals, or metabolic transformations, confirmed as a whole by experimental data or by inferred relationships. GeneGo Diseases by Biomarkers are groups of genes implicated in certain diseases based on classifications in Medical Subject Headings (http://www.nlm.nih.gov/mesh/). The list of genes represented on the Affymetrix Mouse 430 2.0 array was used as a base gene list when calculating P values in the MetaCore enrichment procedures. MetaCore (GeneGo) uses a hypergeometric model to determine the significance of enrichment (Falcon and Gentleman 2007).
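For readers unfamiliar with the hypergeometric model, the following Python sketch shows the standard upper-tail calculation for a single gene set. It is a generic illustration using only the standard library, not MetaCore's implementation; the function and parameter names are hypothetical, and the numbers in the usage line are made up.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_in_set: int,
                           n_selected: int, n_overlap: int) -> float:
    """Return P(X >= n_overlap) under the hypergeometric model.

    n_background: genes in the base list (e.g. all genes on the array)
    n_in_set:     background genes annotated to the pathway or disease
    n_selected:   genes in the imported list (e.g. consistently present)
    n_overlap:    imported genes that are annotated to the pathway
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up example: 20 background genes, 5 in the pathway,
# 8 genes imported, 4 of them in the pathway.
print(hypergeom_enrichment_p(20, 5, 8, 4))
```

Using the array content as the base list, as described above, matters because the P value depends directly on the background size.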

Comparison to human whole-blood gene expression and tissue gene expression

For mouse and human whole-blood gene expression comparisons, raw human whole-blood gene expression data were downloaded from the Gene Expression Omnibus (GSE2888, GSE16728). MAS5 calls were obtained as outlined above and filtered for orthologous probe sets present in 80% or more of globin-reduced samples for each study individually. Orthologous genes and their associated Affymetrix probe sets were identified using Ensembl Biomart (http://www.ensembl.org/biomart). Filtered probe set lists (Supplementary Table 3) were imported into MetaCore and tested for enrichment in GeneGo Pathway Maps. The list of genes represented on the Affymetrix HG U133A array was used as a base gene list when calculating P values in the MetaCore enrichment procedures for GSE2888 and GSE16728.
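The presence filters used in this study (present in all samples for the mouse data, present in 80% or more of samples for the human data) can be sketched as a single function with an adjustable fraction. The Python example below is illustrative only: the function name, the input layout (a mapping from probe set ID to per-array detection P values), and the toy probe IDs and values are assumptions, not part of the original analysis.

```python
def consistently_present(detection_p: dict, min_fraction: float = 1.0,
                         alpha: float = 0.05) -> list:
    """Return probe set IDs called present in at least min_fraction of arrays.

    detection_p maps a probe set ID to its list of MAS5-style detection
    P values, one per array; a call is 'present' when P < alpha.
    """
    kept = []
    for probe_id, p_values in detection_p.items():
        if not p_values:
            continue  # no arrays for this probe set, nothing to evaluate
        n_present = sum(p < alpha for p in p_values)
        if n_present / len(p_values) >= min_fraction:
            kept.append(probe_id)
    return kept


# Toy data: three hypothetical probe sets measured on five arrays.
toy = {
    "probe_a": [0.001, 0.010, 0.020, 0.003, 0.040],  # present on 5 of 5
    "probe_b": [0.001, 0.010, 0.020, 0.003, 0.300],  # present on 4 of 5
    "probe_c": [0.200, 0.010, 0.500, 0.003, 0.300],  # present on 2 of 5
}

print(consistently_present(toy))                    # strict filter
print(consistently_present(toy, min_fraction=0.8))  # 80% filter
```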

Results

Probe detection and microarray sensitivity following globin reduction

In agreement with previously published results, reduction of globin mRNA in mouse whole blood consistently increased the number of probe sets detected and improved microarray sensitivity, particularly for low-abundance genes. The average number of present calls in globin-reduced samples was 12,411 ± 1904 compared to 5840 ± 944 in untreated samples (Fig. 1), while 5383 probe sets were present across all samples following globin reduction in contrast to 1791 present probe sets in whole-blood RNA. Of the probe sets present across all arrays of either type (n = 5400), 3609 probe sets were unique to globin-reduced RNA, 17 were unique to whole-blood RNA, and 1774 were common to both whole-blood and globin-reduced RNA. More than one third of all probe sets present only in globin-reduced RNA were ranked among the bottom 25% of all detectable probe sets, while less than 4% were among the top 25% (Table 1). In contrast, two thirds of the probe sets detected in both whole-blood and globin-reduced blood RNA were among the highest 25%. This suggests that globin reduction has a greater influence on the ability to detect genes expressed at low levels in whole blood and supports the idea that high levels of globin mRNA decrease detection sensitivity.
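The partition of probe sets described above is a set comparison, and the reported counts can be checked against one another. The Python sketch below shows the partition on toy data and then verifies the arithmetic of the published counts; the function name and the toy probe IDs are hypothetical.

```python
def partition_detected(whole_blood, globin_reduced) -> dict:
    """Split consistently present probe sets into three groups."""
    wb, gr = set(whole_blood), set(globin_reduced)
    return {
        "globin_reduced_only": gr - wb,
        "whole_blood_only": wb - gr,
        "both": wb & gr,
    }


# Toy example with hypothetical probe set IDs.
groups = partition_detected(
    whole_blood=["p1", "p2", "p3"],
    globin_reduced=["p2", "p3", "p4", "p5"],
)
print({name: sorted(ids) for name, ids in groups.items()})

# Consistency check on the counts reported in the text.
n_wb, n_gr, n_both = 1791, 5383, 1774
n_gr_only = n_gr - n_both          # unique to globin-reduced RNA
n_wb_only = n_wb - n_both          # unique to whole-blood RNA
n_total = n_wb + n_gr - n_both     # detectable in either sample type
print(n_gr_only, n_wb_only, n_total)
```

The check reproduces the 3609, 17, and 5400 stated in the text.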

Fig. 1

Boxplots of present calls in whole-blood RNA and globin-reduced blood RNA samples. The boxes represent the lower quartile through the upper quartile, while the whiskers extend to 1.5 times the interquartile range. Open circles denote outliers. A bold line denotes the median. WB whole-blood RNA, GR globin-reduced blood RNA

Table 1 Distribution of probe sets detected in (1) globin-reduced samples only, (2) whole-blood samples only, or (3) both whole-blood and globin-reduced samples across all detectable probes (n = 5400)

Gene set enrichment analysis of consistently present probe sets in whole and globin-reduced RNA

To evaluate the potential of expression profiles generated from globin-reduced mouse blood compared to whole mouse blood, probe sets detected as present across all samples in whole-blood RNA (n = 1791; Supplementary Table 1) and globin-reduced RNA (n = 5383; Supplementary Table 2) were imported into MetaCore for gene set enrichment analysis. Globin reduction increased the ability to detect genes in peripheral whole blood involved in a variety of different biological pathways, most notably development, G-protein signaling, and immune response (Fig. 2). In total, the number of significantly enriched (P < 0.001) GeneGo Pathway Maps increased from 43 in whole-blood samples to 107 in globin-reduced samples. Globin-reduced samples were similarly enriched for GO biological processes and molecular functions (Supplementary Figs. 1 and 2), including processes and molecular functions previously described as enriched in globin-reduced human blood. This indicates that the globin signal substantially weakens the ability to detect the expression of many genes interrogated on microarrays, not necessarily because these genes share some biological connection with globin, but because their signal is weak relative to globin in chip-based multiprobe hybridization. The genes affected by this phenomenon collectively participate in a number of biologically meaningful functions and processes. Thus, studies that investigate mouse blood gene expression as a way of understanding disease processes but do not reduce globin are likely to miss many important genes whose expression is masked by globin.

Fig. 2

Number of significantly enriched GeneGo Pathway Maps at a threshold of P ≤ 0.001 in mouse whole-blood RNA (gray) and globin-reduced RNA (black). The associated false discovery rate is less than 0.01 for a P value threshold less than or equal to 0.001

The ability to detect genes previously implicated in disease was also significantly improved following globin reduction. In general, both whole-blood and globin-reduced blood RNA were significantly enriched for genes associated with diseases involving all cell types found in peripheral blood, including thrombocytes/platelets (thrombocytopenia), erythrocytes (anemia), myelocytes (myeloid leukemia), and lymphocytes (lymphoma) (Table 2, Supplementary Tables 4 and 5). Most importantly, the average potential to identify expression levels of genes involved in a given disease or biological network more than doubled following globin reduction as seen by the proportion of disease biomarkers or disease-associated genes identified as present. To test whether the removal of globin transcripts has a statistically significant effect on the proportion of genes identified, significantly enriched GeneGo Diseases (P ≤ 0.05) in either whole-blood RNA or globin-reduced RNA were assessed using the McNemar test. For the majority of significantly enriched disease networks, the reduction of globin transcripts significantly improved the ability to identify biomarkers of disease (data not shown), including a variety of nonhematologic-based diseases such as neuromuscular diseases, neurodegenerative diseases, and chromosome aberrations (Table 3).
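The McNemar test applies to paired binary outcomes. One plausible reading of the analysis above, stated here as an assumption, is that each gene in a disease category is a pair: detected or not in whole-blood RNA, and detected or not in globin-reduced RNA. Only the discordant genes (detected in one preparation but not the other) contribute to the statistic. The Python sketch below uses the chi-square form without continuity correction; the function name and the example counts are hypothetical and are not data from this study.

```python
from math import erfc, sqrt


def mcnemar_test(wb_only: int, gr_only: int) -> tuple:
    """McNemar chi-square test (1 df, no continuity correction).

    wb_only: genes detected only in whole-blood RNA
    gr_only: genes detected only in globin-reduced RNA
    Returns (statistic, p_value).
    """
    n_discordant = wb_only + gr_only
    if n_discordant == 0:
        return 0.0, 1.0  # no discordant pairs, no evidence of a difference
    statistic = (wb_only - gr_only) ** 2 / n_discordant
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical disease category: 1 gene detected only in whole blood,
# 9 genes detected only after globin reduction.
stat, p = mcnemar_test(wb_only=1, gr_only=9)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```

With few discordant genes, an exact binomial version of the test would be the more cautious choice.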

Table 2 Top 25 statistically significant GeneGo disease categories in globin-reduced mouse blood RNA as compared to whole-blood RNA
Table 3 Nonhematologic-based diseases with a significantly improved proportion of gene expression levels detected in globin-reduced mouse blood RNA as compared to whole mouse blood RNA

Comparisons to human whole blood

In order to evaluate the relevance of mouse blood gene expression to human blood gene expression profiles, the results of our mouse expression study were compared to two human studies by gene set enrichment analysis (Lu et al. 2009) (GSE2888; GSE16728) (Table 4). Of the 325 statistically significantly enriched GeneGo Pathway Maps (P < 0.001), 233 were significantly enriched in our mouse study and the two human studies (Fig. 3) and include 97 of the 107 pathways found to be significant after globin reduction of mouse whole-blood RNA, including the development and immune response pathways. Although these results are not completely definitive due to differences in protocols used across each study, including different globin reduction methods, the number of samples, and microarray designs, the overlap in significantly enriched pathway categories suggests that genes expressed in mouse peripheral blood reflect those expressed in human blood.

Table 4 Data set characteristics for human-mouse gene set enrichment analysis
Fig. 3

Venn diagram comparing significant GeneGo Pathway Maps (P ≤ 0.001) enriched in globin-reduced mouse RNA and globin-reduced human (GSE2888 and GSE16728) RNA. The associated false discovery rate is approximately 0.01 for a P value threshold less than or equal to 0.001

Discussion

Our analysis of the effects of globin reduction on mouse whole-blood-derived total RNA confirms the previously observed increase in expression detection sensitivity and overall detection rate in both humans and mice (Field et al. 2007; Whitley et al. 2005, 2007) and further emphasizes the importance of globin reduction in evaluating biologically significant pathways and disease processes in mouse models. Gene set enrichment analysis also indicates globin-reduced mouse blood RNA is a reasonable and practical model for the study of blood-based gene expression correlates of human disease, as biological pathways significantly enriched in globin-reduced mouse and human blood overlap considerably.

However, globin reduction may not be beneficial or necessary in every disease context. Our results suggest that globin reduction has little effect on the ability to detect certain classes of biological pathways (Fig. 2), while a small number of genes significantly decrease in expression or fall below the limits of detection following globin reduction, as noted in previous studies (Field et al. 2007). A BLAST search did not identify significant homology between the globin gene family and the 17 probe sets that were consistently present only in whole-blood RNA samples, suggesting that these probe sets are not specifically removed during globin reduction but rather decrease in expression due to a slight decline in RNA quality (Vartanian et al. 2009) or other nonspecific effects. Nonetheless, these findings indicate that globin-reduced, peripheral blood-based gene expression profiling of relevant mouse models may reveal unique patterns of gene expression relevant to human disease and aid in the discovery of clinically significant biomarkers.