Background
Smoking is the leading cause of chronic obstructive pulmonary disease (COPD), the third leading cause of death globally [1, 2]. Despite the clear associated risk, only a fraction of smokers eventually develop COPD [2, 3]. What causes some smokers, and not others, to develop COPD remains unknown and is an area of active research [2–5]. Recent work examining the lung bacteriome of individuals with moderate to severe COPD revealed decreased bacterial diversity compared to nonsmokers [6–11]. As a result, it has been proposed that changes in lung-resident bacterial communities may lead to COPD [4–8]. However, respiratory tract bacterial communities of individuals with mild COPD, “healthy” smokers, and nonsmokers are not significantly different [8, 11–13], suggesting that factors other than commensal bacteria may trigger COPD development.
To date, few studies have examined lung viral communities, in which the vast majority of viruses have been identified as bacteriophages [14–18]. Phages impact bacterial communities through direct and indirect interactions. Though the ecological roles of phages in the lung are unknown, their activities are relatively well-documented in the oceans, where they regulate bacterial population sizes, diversity, metabolic outputs, and gene flow [19–24]. In humans, phages may stimulate the immune system, leading to immune-mediated microbial competition [25]; tax the immune system, enabling opportunistic infection [26]; or work symbiotically at human mucosal surfaces, providing a source of additional immunity [27]. Thus, changing lung viral communities could alter the bacteriome, leading to dysbiosis and disease progression in pre-affected (e.g., COPD) individuals [6–8]. Here we utilized a historical cohort to explore the impact of smoking on the lung microenvironment, with specific focus on the role of double-stranded DNA (dsDNA) viruses. To do this, we applied a quantitative sample-to-sequence dsDNA viral metagenomic processing pipeline [28] that maintains relative abundances between samples, and used these data as a baseline to compare and ecologically contextualize lung viromes in relation to lung bacteriomes, metabolomes, and immunologic profiles of “healthy” smokers and nonsmokers.
Discussion
In this first study of the effects of smoking on the lung DNA virome, we found that, in contrast to the lung bacteriome, smoking was associated with significant changes in the lung virome and metabolome. Overall, smokers exhibited a contraction of the lung virome, evidenced by lower numbers of viral populations and altered viral ecology. Virome differences between smokers and nonsmokers remained significant even after accounting for age difference between the groups. We hypothesize this altered viral ecology may drive changes in the BAL metabolome between smokers and nonsmokers. Alternatively, changes in the lung metabolic profiles of smokers may lead to downstream effects on the virome, though we consider this less likely as early metabolic changes would presumably also impact bacterial ecology, a link we failed to identify in this study.
Key to our analyses was the ability to quantitatively identify and enumerate viral populations in the lung. While sequence-based 16S rRNA amplification has enabled the rapid quantitative characterization of bacterial communities within the lung [51], the identification and enumeration of respiratory viruses have been much slower due to the lack of a single universal viral marker gene and the difficulty in obtaining sufficient viral biomass from airway samples to sequence without amplification. As a result, all lung virome studies to date have used multiple displacement amplification (MDA) to increase viral DNA yield [14–17]. While this amplification step is useful for amplifying single-stranded DNA viruses, it has both systematic and stochastic biases and results in a non-quantitative representation of community members that varies as much as 10,000-fold from the original [52].
Environmental samples, especially those from aquatic environments, often have low biomass and therefore low input DNA. Consequently, most research on producing quantitative viral metagenomes has been done with marine samples, and this work has shown that samples with as little as 100 femtograms of starting DNA are quantitative if MDA is not used [28, 53–55]. Our lung metagenomes were produced using the same DNA-to-sequence pipeline used to produce quantitative marine viromes.
It is important to note that in other systems, reduced microbial diversity is associated with dysbiosis [56]. In the lungs of smokers, such dysbiosis might lead to COPD progression. Previous studies demonstrated differences in the bacteriome of patients with advanced COPD compared to healthy controls [7, 13]; however, no differences were observed between healthy smokers and nonsmokers [12], suggesting that bacterial dysbiosis may not be responsible for COPD disease progression. In contrast, we found that viral diversity was significantly lower in the lungs of healthy smokers, and this viral dysbiosis was associated almost exclusively with changes in phage ecology. We propose that smoking leads to early effects on the lung virome, and specifically the phageome, which may influence and drive later changes in the bacteriome during progression to COPD. It remains to be determined whether microbial changes lead to disease progression or whether disease progression provides the niche for alterations in the lung microbiome. Well-controlled, longitudinal studies are needed to address this important question.
In the gut, alterations in the number and composition of Proteobacteria are hypothesized to be a signature of dysbiosis and disease [50]. Our corollary finding of associations between two Proteobacteria phages and metabolic changes in smokers parallels these gut findings. Given that Proteobacteria changes were not associated with metabolic differences, we hypothesize that increased numbers of Proteobacteria phages may alter metabolic output within their bacterial hosts during infection.
Previously, we described the presence of bacterial pneumotypes in the lungs of healthy volunteers, thought to be related to the degree of silent aspiration of supraglottic taxa. Using these same specimens, we failed to identify unique viral pneumotypes. Nonetheless, the presence of rare viruses such as Spiroplasma phage and human herpesvirus 8 appears to enable colonization by new, closely related common virus types and, thus, may be important for establishing viral pneumotypes (Additional file 4: Figure S3), as has been proposed for bacteria [57, 58]. Analyses of more lung viromes are necessary, however, to clarify whether viral pneumotypes exist.
Consistent with prior studies [14, 16–18], the vast majority of viruses identified in our lower airway samples were phages. Nonsmoker viromes were enriched with Lactobacillus and Gardnerella phages, while smoker viromes were enriched with Prevotella phages. Prior in vitro work has suggested that a byproduct of cigarette smoke induces Lactobacillus phages [59]. However, there are about 4000 compounds in cigarette smoke [60], some of which may induce phage while others may suppress phage, though research in this area is lacking. In our study, the majority of smokers were former smokers and were therefore not recently exposed to cigarette smoke. Additionally, we observed an increased relative abundance of Lactobacillus phages in the context of the entire DNA virome of nonsmokers. It is possible that bacteria, phages, or host factors may influence phage induction in the lung microenvironment, as previously demonstrated in co-culture studies of lysogenic bacteria and human epithelial cells [61]; such factors are difficult to model with an ex vivo experiment.
Interestingly, we did not observe crAssphage, a virus found ubiquitously in the human gut and vagina and on the skin [62], in our airway samples, nor did we identify single-stranded DNA anelloviruses. In fact, in our cohort of healthy smokers and nonsmokers, we identified very few eukaryotic DNA viruses in total. The absence of crAssphage may be niche-specific, as it also was not identified in other lung virome studies [14–16]. The absence of anelloviruses in our study may be related to the healthy status of our subjects or to differences in sample preparation and sequence analysis compared to other studies. Anelloviruses have primarily been identified in immunocompromised subjects (lung transplant recipients, individuals with HIV, or deceased organ donors) using MDA-amplified viromes [14, 17].
We did, however, identify high abundances of Propionibacterium phage across all 30 lung BAL samples. Notably, Propionibacterium spp. bacteria were previously noted in these samples when 16S rRNA gene sequencing was performed with 454 sequencing of the V1-V2 region [29], but not with Illumina MiSeq sequencing of the V4 region [30], indicating that bacteriome comparisons between studies sequencing different regions of the 16S rRNA gene should be made with caution. While the V4 region is excellent at amplifying bacterial and archaeal 16S rRNA genes [32, 33], it has been shown to be less specific for Propionibacterium spp. [63]. Our virome data are consistent with the 454 sequencing of V1-V2 [29], which linked Propionibacterium spp. to the “background predominant taxa” bacterial pneumotype, as suggested by other studies [49]. Due to the low-biomass nature of the lower airways and factors associated with BAL collection, the presence of background taxa in these types of samples is inevitable. However, Propionibacterium spp. bacteria have been identified in diseased lungs of subjects with bronchiectasis [64] and sarcoidosis [65] as well as in metagenomic studies of lung tissue and extracellular vesicles [9, 66, 67]. In healthy lungs, the data on Propionibacterium spp. bacteria in BAL are conflicting [12, 29, 30, 68]. If Propionibacterium phage, like Propionibacterium spp. bacteria, represent background, it is important to note that these sequences were found in all samples and were not associated with separation of the virome between smokers and nonsmokers.
We note that changes in phageome composition were not reflected in bacteriome changes. There are several potential explanations for this phenomenon. First, it is impossible to know if the viral nucleic acid and bacterial 16S rRNA genes being sequenced represent live or dead microorganisms. Second, viral reference databases, in general, lack robustness, increasing the challenge of properly aligning and assigning taxonomy to short stretches of viral nucleic acid. To improve the likelihood of identifying viral taxa, we combined multiple viral reference databases into a single, custom database. However, the compositional nature of the relative abundance data will be highly impacted by gaps in the reference database used for annotation. Third, phage-bacteria networks are unique to individuals, vary across body sites, and are impacted by environmental factors, as recently shown in a network-based analytical model by Hannigan et al. [69]. Therefore, it will be important to continue to consider not only the composition of the microbiome (bacteriome, virome, mycobiome), but also the dynamic interactions between those constituents and with the surrounding environment in future studies.
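The effect of reference database gaps on compositional data, noted in the second explanation above, can be illustrated with a minimal sketch. The taxon names, read counts, and database contents below are hypothetical and chosen only for illustration; they are not data from this study, and the sketch is not the annotation pipeline used here. It assumes that relative abundance is computed over annotated reads only, so that any taxon missing from the database inflates the apparent share of every taxon that is present.

```python
# Illustrative only: taxa names and read counts are hypothetical, not study data.

def relative_abundance(counts):
    """Convert raw read counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical "true" viral read counts in one sample.
true_counts = {"phage_A": 50, "phage_B": 30, "phage_C": 20}

# Hypothetical reference database that lacks phage_C.
reference_db = {"phage_A", "phage_B"}

# Reads from taxa absent from the database cannot be annotated.
annotated = {t: n for t, n in true_counts.items() if t in reference_db}

true_ra = relative_abundance(true_counts)
observed_ra = relative_abundance(annotated)

for taxon in sorted(annotated):
    print(f"{taxon}: true={true_ra[taxon]:.3f} observed={observed_ra[taxon]:.3f}")
```

In this toy case the apparent relative abundance of phage_A rises from 0.500 to 0.625 solely because phage_C cannot be annotated, even though the underlying read counts for phage_A are unchanged.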
It is still unclear why some smokers progress to COPD while others remain unaffected, though there is evidence that byproducts of the lipoxygenation of arachidonic acid, leukotrienes and lipoxins, are important for COPD pathogenesis [70]. Recent studies have also implicated IL-8 as an important potential marker of COPD pathogenesis [71, 72]. Interestingly, of all metabolites and cytokines studied, arachidonic acid and IL-8 showed the strongest associations with changes in the smoker lung virome. Thus, monitoring specific phage groups or the whole viral community could be important for predicting trends in arachidonic acid and IL-8 and the progression of the smoker lung to COPD. Whether this is a direct interaction or not remains to be determined, but these observations provide a novel pathway of exploration for future studies.
There are several limitations to our study. Statistical power was low in our analyses due to a relatively small sample size. Because of the invasiveness of lower airway sampling and the cost constraints of our multi-omic approach, particularly with regard to high-throughput next-generation sequencing of the virome, we were limited to a cohort of 30 subjects. Nonetheless, our cohort size is in line with current gut virome studies, which do not require an invasive procedure for sample collection. In total, there are 20 gut virome studies with unique datasets [40, 73–91]. Across these studies, the mean number of participants is 35 and the median is 20. While smaller than recent lung bacteriome studies, this is the largest study to date to analyze the combined DNA virome, bacteriome, and metabolome of BAL fluid. A larger cohort would allow for investigation of the potential role of other important covariates, such as gender, ethnicity, and age, on the lower airway virome. Our study was a cross-sectional analysis of the lower airway microenvironment in smokers and nonsmokers and does not allow for the analysis of trends over time or the characterization of microbiome changes in relation to COPD progression. Indeed, the lower FEV1/FVC ratio observed among smokers may be related to early inflammatory airway dysfunction present at a stage where smokers do not meet COPD criteria [72, 92, 93]. Future longitudinal studies are greatly needed to evaluate whether changes in the lower airway virome have an impact on chronic inflammatory airway dysfunction among smokers. We were also limited by the availability of historical specimens, as we did not have access to matched oral rinse or pre-bronchoscopy saline control samples of sufficient quantity for shotgun sequencing, thereby precluding characterization of the supraglottic or saline virome. Finally, due to technical constraints, we assessed the acellular BAL DNA virome. Shotgun metagenomics sequences all nucleic acid in a sample, and despite the use of acellular BAL to reduce human genomic contamination, the virome sequence space made up only a tiny fraction of all sequences. Further, in low-biomass samples, even small increases in host genomic material will quickly swamp a low viral signal. Technical advances in BAL virome purification or enrichment, removal of contaminating host and bacterial nucleic acid, and deeper, more affordable sequencing technologies should be a focus moving forward, thereby allowing more detailed analysis of the lung virome.