Background
Colorectal cancer (CRC) is one of the most common cancers, with approximately 1.4 million cases diagnosed and 700,000 deaths reported annually worldwide [1]. CRC originates from mutations that cause abnormal proliferation in the colorectal epithelium and the subsequent formation of an adenomatous growth (adenoma) [2]. Through the accumulation of further mutations, such adenomas may progress to CRC [3, 4]. Several risk factors are associated with the development of CRC, including diet, smoking and high alcohol consumption [5–9]. Early detection allows efficient treatment of CRC, but only 40% of cases are detected at an early stage [10]. To improve diagnostics, many countries have implemented CRC screening programs in which stool samples are analyzed for the presence of occult blood [11]. Unfortunately, this approach yields a high number of false positives, resulting in negative psychosocial consequences, increased costs, and discomfort and complications related to follow-up diagnostic investigations [12]. More research is therefore needed to identify sensitive biomarkers for early, non-invasive detection of CRC.
A possible role for oncogenic bacteria in CRC was first noted in 1951 [13], and again in 1974, when it was shown that 64% of patients with Streptococcus bovis-related endocarditis also had colonic adenomas or CRC [14]. It was later revealed that the S. bovis subtype Streptococcus gallolyticus subsp. gallolyticus (S. gallolyticus) has a particularly strong association with CRC. Despite these clinical associations [14–18], investigations of the prevalence of S. gallolyticus infection directly in CRC tumors have shown conflicting results [19, 20]. Recent studies have demonstrated enrichment of the bacteria Fusobacterium nucleatum (F. nucleatum) [20–26] and Bacteroides fragilis (B. fragilis) [27–30] in tumor tissue and fecal material of CRC patients, and a subsequent investigation indicated that high-level colonization with F. nucleatum or B. fragilis was an indicator of poor prognosis in CRC patients [31]. To understand the role of bacteria in colorectal carcinogenesis, we investigated bacterial involvement in the healthy tissue-adenoma-carcinoma sequence of CRC development. Previous studies investigating precancerous adenomas have found diverging bacterial compositions. Enrichment of F. nucleatum has been documented both in fecal samples from patients with adenomas [32–34] and directly in adenoma biopsies [32, 35, 36]. Conversely, Pagnini et al. [37] found a marked reduction of mucosa-adherent bacteria in adenomas, while Shen et al. [38] detected F. nucleatum only in biopsies from healthy volunteers and not in adenomas. A recent study by Rezasoltani et al. [34] demonstrated enrichment of F. nucleatum, B. fragilis and S. bovis in tubular, villous and tubulovillous adenomas but not in hyperplastic or serrated polyps, whereas Yu et al. [39] found serrated polyps to be more frequently enriched with F. nucleatum than tubular adenomas. While a gradual increase in F. nucleatum enrichment from healthy colorectal tissue to adenomas and finally to CRC has been demonstrated [32, 33, 36, 40, 41], less is known about B. fragilis [42] or S. gallolyticus.
The majority of studies investigating bacterial involvement in the adenoma-carcinoma sequence have been based on fecal samples [32, 33, 40, 41]. Fecal samples are plentiful and are thus often used as a non-invasive means of investigating the gut microbiota. However, the fecal microbiota can differ from the microbiota of the mucosal lesion [43]. More information is therefore needed concerning enrichment of S. gallolyticus, F. nucleatum and B. fragilis in mucosal samples during the colorectal adenoma-carcinoma sequence. Formalin-fixed, paraffin-embedded (FFPE) tissue blocks may serve as an abundant source of tissue, enabling studies of bacterial involvement directly in colorectal tissue. In this study, we compared bacterial colonization of archival non-cancerous colorectal tissue, adenomas and tumors. Furthermore, we investigated the effect of bacterial status on patient outcome.
Methods
Sample selection
Using the National Pathology Data Bank, we identified all patients diagnosed with colorectal adenocarcinoma, colorectal adenomas, or diverticular disease at the Department of Pathology, North Denmark Regional Hospital, in the period 2002–2010. Following surgical removal, tissue samples were stored as FFPE tissue using the standard procedures of the Department of Pathology. The number of samples was based on sample size calculations for two proportions [44], using a power of 80%, a confidence level of 95%, and published prevalences of S. gallolyticus [19], F. nucleatum [25], and B. fragilis [20] in tumor tissue compared to non-neoplastic surrounding tissue. Patients diagnosed with more than one of the investigated lesions were excluded, as were samples with too low DNA concentrations or non-amplifiable DNA. We collected FFPE tissue from 99 patients diagnosed with colorectal adenocarcinoma (tumors and paired non-neoplastic normal tissue), 96 patients diagnosed with colorectal adenomas, and 104 patients diagnosed with diverticular disease of the colon. An overview of samples can be seen in Additional file 1. Paired normal tissue was only routinely collected from tumors, and thus no paired normal samples were available from diverticula or adenomas.
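A sample size calculation for two proportions can be sketched as follows. This is a minimal illustration assuming one common normal-approximation formula (two-sided test, pooled variance under the null hypothesis, no continuity correction); the exact formula behind reference [44] may differ, and the prevalences in the example are hypothetical placeholders rather than the published values used in this study.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for detecting a difference between two proportions.

    Normal approximation, two-sided test, pooled variance under H0,
    no continuity correction (one common textbook variant).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical prevalences: 50% in tumor tissue vs. 20% in normal tissue
print(n_per_group(0.50, 0.20))
```

With these hypothetical prevalences the function returns 39 samples per group; smaller differences between the two prevalences rapidly increase the required number.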
Sample preparation and DNA extraction
Each sample, including tumors and paired normal tissue, was contained in a separate paraffin block. Consecutive tissue sections were prepared from all tissue blocks in the following order: 1 × 4 μm section for hematoxylin and eosin (HE) staining, 4 × 10 μm sections for DNA purification, and finally 1 × 4 μm section for comparative HE staining, to ensure uniformity and to allow evaluation by a trained pathologist. This microscopy-based evaluation revealed neoplastic cells in 23 paired normal tissue samples, which were therefore excluded; consequently, all 99 tumor tissue samples but only 76 paired normal tissue samples were included. To minimize the risk of cross-contamination between samples, section knives were changed after each tissue block, and the microtome surface was wiped clean with alcohol and RNase Away (Molecular Bioproducts). To monitor potential carry-over of bacterial DNA between samples, an empty paraffin block was included for every 11 patient tissue samples. This paraffin block was freshly prepared but otherwise handled in the same way as blocks containing tissue.
DNA was isolated from FFPE tissue sections using the AllPrep® DNA/RNA FFPE kit (Qiagen) according to the manufacturer’s instructions.
Primer design and qPCR amplification and quantification
Quantitative real-time polymerase chain reaction (qPCR) was used to investigate the presence and quantity of bacterial species previously associated with CRC in the different histological tissue types. Primers targeting S. gallolyticus at the species level, S. gallolyticus subsp. gallolyticus, F. nucleatum, and B. fragilis were designed in-house using Primer3 software and tested for specificity using Primer-BLAST (NCBI) [45]. Because the qPCR aimed to determine how the relative abundance of S. gallolyticus, F. nucleatum and B. fragilis differed between histological tissue types, a reference primer set targeting the human β-actin gene was also designed. Since DNA extracted from FFPE tissue tends to be fragmented [46], we aimed for amplicon sizes shorter than 200 bp. The sequences, targets, and parameters of the individual primers are summarized in Table 1.
Table 1
Primers used for 16S rRNA gene sequencing and qPCR analysis
Target | Gene (accession no.) | Primer sequences | Tm (°C), F / R | Amplicon size (bp) |
Bacteria and Archaea (sequencing) | V4 variable region of the 16S rRNA gene (515F and 806R [51]) | F: 5′-GTGCCAGCMGCCGCGGTAA-3′; R: 5′-GGACTACHVGGGTWTCTAAT-3′ | 65.4 / 49.0 | ~250–390ᵃ |
S. gallolyticus spp. | SodA (AP012053, HE613569, AP012054) | F: 5′-GCTTGGCTTGTGGTGAATGA-3′; R: 5′-GCGAACGTTGCGATACTTGA-3′ | 59.0 / 59.3 | 144 |
S. gallolyticus subsp. gallolyticus | SodA (AP012053) | F: 5′-AAGCTGCGACAACTCGCTTT-3′; R: 5′-AAGCGTGTTCCCAAACGTCA-3′ | 61.1 / 60.8 | 150 |
F. nucleatum | 16S ribosomal RNA (CP012717) | F: 5′-CCCAAGCAAACGCGATAAGT-3′; R: 5′-GCGTTGCGTCGAATTAAACC-3′ | 59.2 / 58.9 | 117 |
B. fragilis | 16S ribosomal RNA (M11656) | F: 5′-AGTAGAGGTGGGCGGAATTC-3′; R: 5′-GTGTCAGTTGCAGTCCAGTG-3′ | 59.2 / 59.1 | 97 |
β-actin | β-actin (NG_007992) | F: 5′-ACTCGTCATACTCCTGCTTGC-3′; R: 5′-CCTCCTCAGATCATTGCTCCTC-3′ | 60.1 / 60.0 | 118 |
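The 515F and 806R sequencing primers in Table 1 contain IUPAC ambiguity codes (M, H, V and W), so each is synthesized as a mixture of sequence variants. The sketch below, which relies only on the standard IUPAC nucleotide code table, enumerates those variants; it is purely illustrative and not part of the original analysis.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


primer_515f = "GTGCCAGCMGCCGCGGTAA"
primer_806r = "GGACTACHVGGGTWTCTAAT"
print(len(expand_degenerate(primer_515f)))  # M -> 2 variants
print(len(expand_degenerate(primer_806r)))  # H x V x W -> 3 x 3 x 2 = 18 variants
```

The degeneracy lets a single primer pair anneal to the small sequence differences found in the V4 region across many bacterial and archaeal taxa.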
Bacterial DNA was purchased from DSMZ (Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures), including DNA from S. gallolyticus subsp. gallolyticus (DSM 16831), S. gallolyticus subsp. macedonicus (DSM 15879), S. gallolyticus subsp. pasteurianus (DSM 15351), F. nucleatum (DSM 15643), and B. fragilis (DSM 2151). A dilution series of the bacterial DNA was used to determine the limit of detection (LOD) of each primer pair. The LOD was approximately 109 DNA copies for S. gallolyticus spp., 10 DNA copies for S. gallolyticus subsp. gallolyticus, 12 DNA copies for F. nucleatum, and 10 DNA copies for B. fragilis. The bacterial DNA was further used as a positive control for the qPCR analyses by spiking it into human DNA extracted from FFPE colorectal tumors, to mimic the sample types used in this study. The ratio of bacterial DNA to total human DNA was 1:40.
qPCR was performed using the Brilliant III Ultra Fast SYBR® Green QPCR Master Mix (Agilent Technologies) according to the manufacturer’s recommendations and analyzed on the Mx3005P qPCR System (Agilent Technologies). All reactions were performed in triplicate using 40 ng of input DNA with the following cycling conditions: 95 °C for 10 min, followed by 40 cycles of 95 °C for 1 min, 55 °C for 30 s and 72 °C for 30 s. In a few cases, melting curve analysis revealed multiple products, and the PCR was then repeated using a more stringent annealing temperature of 59 °C.
For relative quantification of bacterial DNA in the samples, the ΔΔCt method [47] was applied using the primers summarized in Table 1, with β-actin serving as the reference gene.
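The ΔΔCt calculation can be sketched as below, assuming that triplicate Ct values are averaged and that both the bacterial and the β-actin assays amplify with close to 100% efficiency, as the method requires. The choice of calibrator sample and all Ct values in the example are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """ΔCt = mean Ct of the bacterial target minus mean Ct of the β-actin reference."""
    return mean(target_cts) - mean(reference_cts)


def relative_quantity(sample_target, sample_ref, calibrator_target, calibrator_ref):
    """Fold difference 2^-ΔΔCt of the sample relative to a calibrator sample.

    Assumes ~100% amplification efficiency for both assays.
    """
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(calibrator_target, calibrator_ref)
    return 2 ** (-ddct)


# Hypothetical triplicate Ct values for a sample and a calibrator
fold = relative_quantity([28.1, 27.9, 28.0], [20.0, 20.1, 19.9],
                         [30.0, 30.2, 29.8], [20.0, 20.0, 20.0])
print(fold)
```

Here the sample's ΔCt is 8 and the calibrator's is 10, so ΔΔCt = -2 and the bacterial DNA is about four-fold more abundant relative to β-actin in the sample.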
Five-year follow-up
Each patient’s histological history was followed over a 5-year period using the National Pathology Data Bank. Time of death or occurrence of new adenomas or cancer in the colorectum was recorded for each patient. Survival and disease-free survival were analyzed with the Kaplan-Meier method, stratified by bacterial detection status. Social security numbers were not available for two patients, and their clinical data were therefore not recorded.
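The Kaplan-Meier estimator used for the follow-up analysis can be illustrated with a minimal pure-Python sketch. The follow-up times and event indicators below are hypothetical; in practice an established implementation such as the R survival package would be used, with one curve per bacterial status group.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up time per patient; events: 1 = event (death or new lesion), 0 = censored.
    Patients censored at an event time are still counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in months; 1 = event, 0 = censored
times = [12, 24, 24, 36, 48, 60]
events = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored patients contribute to the risk set until they leave follow-up but never cause a drop in the curve, which is what distinguishes the estimator from a simple proportion of survivors.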
16S rRNA gene sequencing
To detect other potential bacterial biomarkers, the composition of bacterial genera was analyzed by 16S ribosomal RNA (rRNA) gene sequencing in a subset of the FFPE samples already investigated in this study. A total of 40 tissue samples, 10 from each histological tissue group, were randomly selected using the Research Randomizer software [48] (Additional file 2). Bacterial 16S rRNA amplicon sequencing targeting the V4 variable region was performed by DNAsense (Denmark) following a modified version of an Illumina protocol [49]. Briefly, an initial PCR and clean-up were performed as described by Albertsen et al. 2015 [50], using primers targeting the V4 hypervariable region (Table 1) [51] and 35 cycles of amplification. Next, indexing primers were attached to all amplicons in a second PCR, followed by clean-up [49]. Finally, all samples were pooled and sequenced on a MiSeq (Illumina, USA) as previously described [52]. A 20% PhiX control library (Illumina) was added to estimate the error rate during sequencing, a negative control (nuclease-free water) was included to assess background, and a positive control (a complex sample obtained from an anaerobic digester system) was used to monitor sequencing efficiency and batch effects.
Read quality was assessed using FastQC (Babraham Bioinformatics, UK). Forward reads were trimmed using Trimmomatic v0.32 [53] to remove low-quality reads and reads shorter than 250 bp, using the settings SLIDINGWINDOW:5:3 and MINLEN:250. The reads were then dereplicated and processed using the UPARSE workflow [54]. The initial 250 bp of all sequencing reads were clustered using the Usearch v. 7.0.1090 -cluster_otus command with default settings. Operational taxonomic units (OTUs) were formed based on 97% identity, and chimeras were removed using the Usearch v. 7.0.1090 -usearch_global command with -id 0.97. Finally, taxonomy was assigned using the RDP classifier [55] as implemented in the parallel_assign_taxonomy_RDP.py script in QIIME [56], with the MiDAS database v. 1.20 [57].
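The combined effect of the SLIDINGWINDOW:5:3 and MINLEN:250 settings can be approximated as below: a read is scanned in 5-base windows, cut where the mean Phred quality of a window first drops below 3, and discarded if fewer than 250 bases remain. This is a simplified sketch rather than Trimmomatic's exact implementation; it assumes the two steps run in that order and that quality scores are already decoded to integers.

```python
def sliding_window_cut(quals, window=5, threshold=3):
    """Number of leading bases kept: cut at the start of the first window whose
    mean Phred quality is below `threshold` (simplified SLIDINGWINDOW logic)."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return start
    return len(quals)


def trim_read(seq, quals, window=5, threshold=3, min_len=250):
    """Apply the simplified sliding-window cut, then drop reads shorter than min_len."""
    keep = sliding_window_cut(quals, window, threshold)
    return seq[:keep] if keep >= min_len else None


# Hypothetical 300-base reads with a low-quality 3' tail
kept = trim_read("A" * 300, [30] * 260 + [2] * 40)
dropped = trim_read("A" * 300, [30] * 200 + [2] * 100)
print(len(kept), dropped)
```

Reads passing this filter would then be truncated to their first 250 bp (for example `read[:250]`) before OTU clustering, so that all clustered sequences have equal length.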
Statistics
Data analysis was performed using R version 3.5.2 [58] through the RStudio IDE (http://www.rstudio.com/) and Microsoft Office Excel 2013. 16S rRNA gene sequencing data were analyzed using the ampvis2 package v.2.3.11 [59] in R. α diversity was determined using OTU richness and the Shannon diversity index as implemented in the amp_alphadiv function of the ampvis2 package. β diversity was visualized using heat maps depicting the 20 most abundant OTUs and explored using principal component analysis (PCA) and redundancy analysis (RDA) of Hellinger-transformed OTU abundances. Bacterial genera with significantly different distributions among tissue types were identified using the DESeq2 package [60], with p-values corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure [61]. For a bacterial genus to be considered for further analysis, it needed to differ significantly between tissue groups, and the difference had to apply to the majority of samples in the tissue group. That is, for a bacterium to be considered associated with tumor tissue, it had to constitute a significantly higher proportion of the bacteria in the majority of tumor samples.
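OTU richness, the Shannon diversity index and the Hellinger transformation are simple functions of an OTU count table. The sketch below assumes the natural logarithm for the Shannon index and a table with samples as rows and OTUs as columns; it illustrates the formulas only and does not reproduce ampvis2's internals.

```python
import math


def otu_richness(counts):
    """Number of OTUs observed (count > 0) in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def hellinger(table):
    """Hellinger transformation: square root of each OTU's relative abundance per sample."""
    transformed = []
    for row in table:
        total = sum(row)
        transformed.append([math.sqrt(c / total) for c in row])
    return transformed


# Hypothetical OTU counts; rows = samples, columns = OTUs
otu_table = [[10, 10, 10, 10], [1, 3, 0, 0]]
print([otu_richness(r) for r in otu_table])
print([round(shannon_index(r), 3) for r in otu_table])
print(hellinger(otu_table))
```

Euclidean distances between Hellinger-transformed rows equal Hellinger distances between the original samples, which is why PCA and RDA are run on the transformed abundances rather than on raw counts.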
Categorical data, such as presence or absence of bacteria, were analyzed using the χ² test. For continuous data, such as OTU richness and Shannon diversity index, normality was tested using the Shapiro-Wilk test and homogeneity of variance using Bartlett’s test. Normally distributed data with equal variances were compared using ANOVA followed by Tukey’s post-hoc test, while non-normally distributed data were tested using the Kruskal-Wallis test followed by Dunn’s post-hoc test. Finally, 5-year follow-up data were analyzed using the Kaplan-Meier method, and a log-rank test was used to compare outcomes between patients positive and negative for bacterial infection.
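The χ² test of independence and the log-rank test both compare observed with expected counts. The sketch below computes both statistics with p-values from the χ² distribution using only the standard library; it omits Yates' continuity correction (which R's `chisq.test` applies to 2 × 2 tables by default), the log-rank version compares exactly two groups, and all example data are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series (empty when df == 1)
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df + 1) // 2):
        p += term
        term *= x / (2 * j + 1)
    return p


def chi_square_test(table):
    """Pearson chi-square test of independence (no continuity correction), r x c table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (event) or 0 (censored)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for tt, _, _ in data if tt >= t)  # at risk, both groups
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk, group A
        d = sum(e for tt, e, _ in data if tt == t)  # events at time t
        observed_a += sum(e for tt, e, g in data if tt == t and g == 0)
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = (observed_a - expected_a) ** 2 / variance
    return stat, chi2_sf(stat, 1)


# Hypothetical counts of bacteria-positive / -negative samples in two tissue groups
print(chi_square_test([[10, 20], [20, 10]]))
# Hypothetical follow-up times (months) and event indicators for two patient groups
print(logrank([6, 12, 18], [1, 1, 1], [24, 30, 36], [1, 1, 1]))
```

Comparing four tissue groups for presence or absence of a bacterium gives a 4 × 2 table with three degrees of freedom, which the general `chi2_sf` handles without any extra dependency.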
A p-value < 0.05 was considered significant for all statistical tests, except for multiple-hypothesis-corrected p-values, for which a threshold of < 0.01 was used.
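The Benjamini-Hochberg adjustment applied to the DESeq2 results can be sketched in a few lines; it should give the same values as R's `p.adjust(p, method = "BH")`. The raw p-values below are hypothetical, and the 0.01 cut-off mirrors the threshold stated above.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four bacterial genera
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(padj)
print([p < 0.01 for p in padj])
```

In this example none of the adjusted values fall below 0.01, even though the smallest raw p-value (0.005) does, which illustrates how the correction guards against false discoveries across many tested genera.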
Discussion
In recent years, there has been a growing number of reports concerning a possible link between different bacterial species and the development of CRC. Several bacteria have been implicated, including S. gallolyticus [15, 17–19], F. nucleatum [20–22, 62] and B. fragilis [27–29, 63]. To investigate changes in the bacterial composition along the colorectal healthy tissue-adenoma-carcinoma sequence, we performed qPCR and 16S rRNA gene sequencing on FFPE tissue from colorectal diverticula, adenomas, tumors and paired normal tissue.
Adenomas harbored a distinct bacterial community compared with non-malignant controls, consistent with previous reports [35, 37, 38]. While the genus Acinetobacter constituted a large percentage of total bacteria in both diverticula and adenomas, both the relative abundance of Bacteroides and the percentage of samples positive for the species F. nucleatum and B. fragilis were reduced in adenomas compared with both diverticula and paired normal tissue. The cause of this altered microbial composition is unknown, but it may result from increased local inflammation during adenoma formation, as previously suggested [37]. This increased inflammation may in turn lead to the development of a microbial community with oncogenic potential [42, 64]. Notably, not all adenomas progress to CRC [65], and it would therefore be of interest to establish whether there are different subtypes of adenomas with distinct bacterial compositions and differing potential for carcinogenic progression. Along the colorectal adenoma-tumor sequence, we observed a marked increase in the relative abundance of the bacterial genus Prevotella as well as the species F. nucleatum and B. fragilis, all of which have previously been associated with colorectal tumors [24, 32, 33, 36, 40–42, 66, 67]. These bacteria are known to promote a pro-inflammatory environment [27, 32, 63, 68, 69] and may thus drive the adenoma-tumor transition by inducing local chronic inflammation. Conversely, we observed that bacteria belonging to the genus Acinetobacter were absent from all samples originating from patients diagnosed with CRC (both tumors and paired normal tissue), while being highly abundant in both diverticula and adenomas. Similar observations have been made in rectal cancer [70], further suggesting that a distinct bacterial niche develops during the adenoma-tumor transition. In contrast to previous studies [41, 42], we did not observe a difference in the percentage of early- and late-stage CRC tumor samples positive for F. nucleatum or B. fragilis, suggesting that these bacteria do not drive tumor progression. Finally, to elucidate the role of F. nucleatum and B. fragilis in the initiation and progression of CRC, we investigated the 5-year risk of new adenomas, CRC or death according to bacterial status. In our study, neither F. nucleatum nor B. fragilis status was associated with the risk of death or of developing new adenomas or CRC in patients with CRC, adenomas or diverticular disease. Overall, our results suggest that the bacterial genus Prevotella and the species F. nucleatum and B. fragilis may play a role in the transition from adenoma to CRC, but not in the initiation of adenomas or in the progression from early- to late-stage colorectal tumors.
Two unexpected observations were made during this study. First, despite the reported association with CRC [14, 19, 34, 71], we did not detect S. gallolyticus in any of the investigated tissue samples. These conflicting results could potentially be explained by ethnic differences in susceptibility to S. gallolyticus colonization of the colorectal mucosa or by geographical differences in S. gallolyticus distribution. This is supported by similar findings by Viljoen et al. [20] in a South African CRC population. Second, while several studies [21, 22], including the current study, use paired normal tissue obtained from CRC patients as a matched “healthy” control, we observed that the bacterial composition of tumor tissue and paired normal tissue overlapped considerably. Although more samples are needed to validate this observation, it calls into question the validity of using paired normal tissue as a healthy control when investigating bacteria in CRC.
This study has a number of limitations. First, all samples were formalin-fixed. Since formalin is known to affect DNA quality [72], this may have limited our ability to detect bacteria. Because all tissue samples were handled similarly, we do not expect formalin fixation to affect the observed differences in bacterial load and prevalence between diagnoses. A second limitation involves the previously reported difficulty of extracting DNA from Gram-positive bacteria such as S. gallolyticus [50]. However, the sequencing data revealed a high proportion of Gram-positive bacteria, including other members of the genus Streptococcus, so this limitation is unlikely to explain the absence of S. gallolyticus in this study. Finally, although the primers used in this study had low LODs, these were established using purified bacterial DNA, which is likely of higher quality than bacterial DNA from FFPE tissue stored for up to 10 years. The true LOD of the primers in the examined tissue samples could therefore be higher, as reported by Viljoen et al. [20]. This could prevent detection of low-abundance bacteria and cause us to underestimate bacterial colonization across all samples. This study focused specifically on the bacterial species F. nucleatum, B. fragilis and S. gallolyticus. However, other studies have identified additional bacteria specifically associated with CRC, including Escherichia coli [63]. Future studies should include this bacterium as well.
Strengths of this study include the large number of samples, the inclusion of precursor lesions and non-malignant tissue in addition to tumor and paired normal tissue, and a follow-up investigation of the clinical relevance of bacterial status.
Acknowledgements
We would like to thank laboratory technicians Ann-Maria Jensen, Bente Wormstrup, Mette Skov Mikkelsen and Katrine Bech Hjort Lauritzen for excellent technical assistance. Furthermore, we would like to acknowledge the biotechnology students Celine Boure, Anna Hustedova, Monika Jonikaite, Zivile Kondrotaite and Patricia Riedlova for running the 16S rRNA gene sequencing.