Background
In the coming decades, aging populations will increase the number of people who live for many years with disabling neurodegenerative diseases such as dementia and Parkinson’s disease [1]. Medical treatments and appropriate diagnostic tools are therefore urgently needed to maintain the health-related quality of life of elderly people and to minimize the ethical and financial burden on societies. Accordingly, many efforts have been made in recent years to improve the early diagnosis of initial pathological changes in neurodegenerative diseases, as this is a precondition for interfering with disease progression before obvious and often irreversible clinical symptoms appear.
Cerebrospinal fluid (CSF) obtained by lumbar puncture is frequently used for biomolecule-based diagnostics of neurological diseases, and combinations of marker molecules have proven useful for the diagnosis of neurodegenerative, inflammatory and infectious diseases of the central nervous system (CNS). Nevertheless, none of the biomarkers currently in use is specific for a single disease condition, and the diagnostic value of proteins, DNA and other marker molecules still depends on additional diagnostic findings and on knowledge of the clinical context [2–6]. This deficiency, together with the pressing need to prepare for the expected increase in neurodegenerative diseases, calls for the discovery and validation of reliable, specific and prognostic marker molecules.
In the last decade, the detection and characterization of RNA species in different body fluids, which reflect the transcriptome of their sources of origin, has fuelled hope for the development of new, specific prognostic and diagnostic RNA markers [7, 8]. However, very recent work summarizing the state of the art of RNA-based diagnostics clearly points out that current achievements in this field cannot yet live up to the initial expectations [9–13]. The main reason for this shortfall is the lack of standardized workflows from sample generation through RNA extraction to RNA measurement. Comparisons between studies are still hampered by differences in sample collection [14] and sample processing [7, 9, 14–17], technical variability between RNA-profiling platforms [9, 18–22] and RNA analysis algorithms [23], variability between technical replicates [24], studies with small sample sizes in which rare RNAs escape detection because of detection limits [7, 16, 18, 25, 26], biased results due to contamination of samples with blood-derived cells [14, 20, 27, 28], and studies with small numbers of cases and thus low statistical power [18, 26].
In addition to these hindrances, more general challenges connected to human samples such as CSF need to be considered. For example, the total volume of CSF in adults is approximately 150 ml, and it has an average, age-dependent turnover rate of approximately four times a day that also depends on physical activity [29]. This, together with the gender- and age-dependent differences seen in RNA analyses of CSF [14, 30], partially explains the observed donor-to-donor variation; similar considerations apply to blood samples [24, 31, 32]. Further confounding factors are genetic heterogeneity, medication, highly variable RNA turnover [27], and RNA concentrations in CSF that are usually below the limit of detection of most methods [19, 22, 26, 27]. Furthermore, certain RNA species can be selectively packaged into extracellular vesicles (EV) [25, 30, 33–35], whereas others seem to be mainly bound to extravesicular proteins [24, 25, 32, 36–39]. The latter raises the question of whether RNA extracted from CSF fractions or total RNA from whole CSF is best suited for disease prognosis and diagnosis, or whether profiles of both fractions are necessary to capture all RNA-associated characteristics of a disease. To address this crucial question, we prepared RNA from EV and total RNA from large volumes of pooled human CSF samples and analyzed both by next-generation sequencing (NGS).
Here we show that body fluids and their respective EV differ significantly in their composition of long and small RNA, and that miR derived from the whole body fluid and from the respective EV have the potential to affect different cellular and biological processes.
Materials and methods
Collection of human CSF
CSF samples were collected according to clinical necessity for routine diagnostics in the Department of Neurology, University Medical Center of Göttingen. Only samples from patients who presented to the clinic with a variety of symptoms but ultimately showed no obvious signs of a known disease were included in this study. Before the start of clinical routine diagnostics, erythrocytes and leucocytes were counted manually in each sample, and within one hour after aspiration the samples were centrifuged for ten minutes at 105 × g. Cell-free supernatants were carefully aspirated for further clinical analysis. Samples with signs of haemolysis before the centrifugation step, samples with more than 275 erythrocytes/µl before centrifugation, and samples with signs of haemolysis after the centrifugation step were excluded from this part of the study. Furthermore, only samples with leucocyte counts below 8 cells/µl were included. Five samples with leucocyte counts of five to eight leucocytes/µl before centrifugation, which had also been made cell-free by centrifugation, were included as well, as their donors had no obvious signs of a known disease. After completion of the clinical analysis, the remnants of the CSF samples were stored frozen at −80 °C until further processing for this study. No CSF samples were collected specifically for our research, and no extra CSF was drawn from any patient for this purpose. No identifying information was acquired for this study, and patients gave prior written consent to the scientific use of their samples. For this part of the study, 324 CSF samples were collected, 161 (49.69%) from male and 163 (50.31%) from female patients. The average age of the patients was 55.5 ± 21.1 years; for more detailed information about the samples, please refer to Additional file 1: Fig. S1A–C.
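The inclusion rules above amount to a simple decision procedure. The following minimal Python sketch encodes them as stated (haemolysis before or after centrifugation, or more than 275 erythrocytes/µl, excludes a sample; leucocytes must be below 8 cells/µl) and reproduces the reported sex percentages. The class and function names are hypothetical, and the sketch does not model the clinical judgement that no known disease was present.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; counts are per microlitre before centrifugation.
MAX_ERYTHROCYTES_PER_UL = 275   # more than 275 erythrocytes/µl -> excluded
LEUCOCYTE_LIMIT_PER_UL = 8      # only counts below 8 cells/µl are included


@dataclass
class CsfSample:
    erythrocytes_per_ul: float
    leucocytes_per_ul: float
    haemolysis_before_spin: bool
    haemolysis_after_spin: bool


def is_eligible(s: CsfSample) -> bool:
    """Return True if a sample meets the stated criteria for normal CSF."""
    if s.haemolysis_before_spin or s.haemolysis_after_spin:
        return False
    if s.erythrocytes_per_ul > MAX_ERYTHROCYTES_PER_UL:
        return False
    return s.leucocytes_per_ul < LEUCOCYTE_LIMIT_PER_UL


def percent(part: int, total: int) -> float:
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / total, 2)


print(is_eligible(CsfSample(12, 3, False, False)))   # True
print(is_eligible(CsfSample(300, 3, False, False)))  # False
print(percent(161, 324), percent(163, 324))          # 49.69 50.31
```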
For comparison, samples were also collected from CSF with high erythrocyte counts before the 105 × g centrifugation step or with obvious signs of haemolysis after lumbar puncture; in none of these samples was the leucocyte count above 23 cells/µl before centrifugation. These samples were processed completely separately from normal CSF and are referred to below as blood-contaminated CSF samples. Thirty-six blood-contaminated CSF samples were collected, 21 (58.33%) from male and 15 (41.67%) from female patients. The average age of the patients was 61.45 ± 23.3 years; for more detailed information about the blood-contaminated samples, please refer to Additional file 1: Fig. S1D–F.
A 100 µl aliquot of every CSF sample was incubated in cell culture medium at 37 °C for five days to test for bacterial or fungal contamination; none of the samples included in the study showed any signs of contamination.
Processing of human CSF samples
Enough CSF samples to make up a total volume of 74 ml were thawed on ice and pooled. The pool was mixed, briefly centrifuged and divided into two equal aliquots of 37 ml. One aliquot was used for ultrafiltration with spin columns and the other for extracellular vesicle (EV) preparation; 20 aliquots of each kind were prepared.
Ultrafiltration of 36 ml aliquots of CSF with spin columns
Ultraspin columns are molecular-size-based filters that can be used, among other applications, to isolate protein-bound RNA and RNA enclosed in extracellular vesicles [38, 40]. Work by Turchinovich and colleagues with human plasma and cell culture medium, and our own work with serum and blood-contaminated CSF samples, showed that the concentrates of ultraspin columns with appropriately sized filters retained all of the measurable RNA, whereas the corresponding filtrates were depleted of measurable amounts of RNA (see Additional file 1: Fig. S2). This work was done with 100 kDa ultraspin columns, but because in our hands 50 kDa columns gave more consistent processing times for CSF samples than 100 kDa columns, we performed all RNA preparations of whole CSF for next-generation sequencing (NGS) with 50 kDa ultraspin columns.
Three millilitres of CSF were pipetted into each of four ultrafiltration spin columns with a molecular weight cut-off of 50 kDa (Vivaspin Turbo 4; Sartorius, Germany) and centrifuged at 1860 × g at 4 °C until the volume was concentrated to approximately 250 µl. Another 3 ml of CSF was then added to each of the four spin columns, and the columns were spun again until the volumes were concentrated to approximately 250 µl. This step was repeated once more, but in the last centrifugation the volume was concentrated to approximately 200 µl. The first centrifugation step takes around 10 min, the second about 18 min and the last approximately 25 min. After centrifugation, the four resulting concentrates of the 36 ml CSF were transferred to DNA-low-binding tubes, and each emptied concentration chamber of the spin columns was rinsed once with 50 µl ice-cold 10 mM TRIS pH 7.4. The rinses were added to the respective concentrates to make up a total volume of 250 µl in each of the four DNA-low-binding tubes; together, these four CSF concentrates were used for the preparation of one RNA sample.
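The volumes in this step can be checked with simple bookkeeping. This minimal sketch treats the "approximately" stated volumes as exact, which is an assumption, and the constant names are hypothetical. It shows that three 3 ml loadings of four columns account for the 36 ml aliquot and that the four 250 µl concentrates together correspond to a nominal 36-fold volume reduction.

```python
# Nominal volumes from the protocol text; "approximately" values are treated as exact here.
N_COLUMNS = 4             # Vivaspin Turbo 4 spin columns, 50 kDa cut-off
LOAD_PER_ROUND_ML = 3.0   # CSF added to each column per loading round
N_ROUNDS = 3              # initial loading plus two refills
FINAL_RETENTATE_UL = 200  # retentate volume after the last spin
RINSE_UL = 50             # 10 mM TRIS pH 7.4 used to rinse each chamber


def ultrafiltration_summary() -> dict:
    csf_ml = N_COLUMNS * LOAD_PER_ROUND_ML * N_ROUNDS
    per_tube_ul = FINAL_RETENTATE_UL + RINSE_UL
    concentrate_ml = N_COLUMNS * per_tube_ul / 1000
    return {
        "csf_ml": csf_ml,                  # total CSF processed
        "per_tube_ul": per_tube_ul,        # retentate + rinse per tube
        "concentrate_ml": concentrate_ml,  # all four tubes together
        "fold": csf_ml / concentrate_ml,   # nominal volume-reduction factor
    }


print(ultrafiltration_summary())
# {'csf_ml': 36.0, 'per_tube_ul': 250, 'concentrate_ml': 1.0, 'fold': 36.0}
```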
Extracellular vesicle preparation of 36 ml aliquots of human CSF with ultracentrifugation
A volume of 11.2 ml ice-cold PBS was added to 36 ml of pooled CSF; the combined volumes were carefully mixed, briefly centrifuged and divided into four portions of 11.8 ml, which were distributed to four ultracentrifugation tubes (Beckman Coulter). The tubes were balanced with ice-cold PBS and centrifuged at 180,000 × g for 4 h at 4 °C. The supernatants were aspirated by pipetting, and 1 ml of Tri-Reagent (Sigma T9424) was added to each pellet. The tubes were vortexed for 30–60 s and briefly centrifuged, and the suspensions were transferred to 1.5 ml DNA-low-binding tubes. The suspensions were left standing for 5 min at room temperature and then used for RNA preparation.
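The dilution and split can be checked with a few lines of arithmetic. This sketch (function name hypothetical) assumes the combined volume is divided evenly, as the text states; it shows that each of the four tubes holds 11.8 ml, of which 9 ml is CSF, before balancing with PBS.

```python
def split_for_ultracentrifugation(csf_ml: float, pbs_ml: float, n_tubes: int) -> tuple:
    """Return (total volume per tube, CSF volume per tube) in ml for an even split."""
    per_tube = (csf_ml + pbs_ml) / n_tubes
    csf_per_tube = csf_ml / n_tubes
    return per_tube, csf_per_tube


per_tube, csf_per_tube = split_for_ultracentrifugation(36.0, 11.2, 4)
print(round(per_tube, 1), csf_per_tube)  # 11.8 9.0
```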
RNA-preparation of ultrafiltrated CSF
To each of the four 250 µl CSF ultrafiltrates, 0.75 ml Tri Reagent (Sigma T3934) was added; the mixtures were vortexed for 60 s and left standing for five minutes at room temperature. Then 100 µl of 1-bromo-3-chloropropane was added to each tube, and the samples were vortexed for 30 s and left standing at room temperature for another five minutes. Samples were then centrifuged at 12,000 × g at 4 °C for 10 to 15 min to separate the aqueous from the organic phase. From each upper aqueous phase, 350 µl was transferred to a 2 ml DNA-low-binding tube. The remaining aqueous phases of the first extractions were re-extracted with 400 µl of 10 mM TRIS pH 7.4 (vortexed for one minute, left standing for five minutes and centrifuged for 10 to 15 min), and 450 µl of each re-extracted aqueous phase was combined with the 350 µl from the first extraction step. To each sample, 4 µl of GlycoBlue (15 mg/ml) and 27.5 µl of 3 M sodium acetate pH 5.2 were added. Samples were mixed carefully, and 800 µl of isopropanol pre-cooled to −20 °C (equivalent to the volume of the combined aqueous phases) was added to each sample; samples were vortexed again and stored overnight at −20 °C for RNA precipitation. The next day, one of the four samples was centrifuged for 45 min at 13,000 × g at 4 °C, the supernatant was decanted, and the content of another tube from the precipitation step was pipetted onto the pellet of the first tube. The tube was again centrifuged for 45 min at 13,000 × g at 4 °C and the supernatant decanted; this was repeated until the contents of all tubes from the precipitation step had been collected in one tube, resulting in a pellet that combines the RNA of 36 ml whole CSF. After the pellet was washed once with 1000 µl 75% ethanol, it was resuspended in 75% ethanol and kept at −80 °C until the final precipitation step. In this final step, the RNA was pelleted by centrifugation for 45 min at 13,000 × g at 4 °C, and each pellet was dissolved in 8 µl 10 mM TRIS pH 7.4 for NGS analysis.
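The additions before precipitation can be summarized with the standard dilution relation C1V1 = C2V2. The sketch below (names hypothetical) uses the stated volumes and ignores volume changes on mixing; it indicates that the sodium acetate ends up at roughly 0.1 M in the combined aqueous phase before the isopropanol is added, and that each tube receives about 60 µg of GlycoBlue carrier.

```python
def final_molarity(stock_m: float, stock_ul: float, total_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for C2 (volume changes on mixing are ignored)."""
    return stock_m * stock_ul / total_ul


aqueous_ul = 350 + 450             # first extraction + re-extraction
glycoblue_ug = 4 * 15              # 4 µl x 15 mg/ml = 60 µg carrier
before_iso_ul = aqueous_ul + 4 + 27.5
naoac_m = final_molarity(3.0, 27.5, before_iso_ul)
isopropanol_ul = aqueous_ul        # isopropanol volume equals the combined aqueous phases

print(aqueous_ul, glycoblue_ug, round(naoac_m, 3), isopropanol_ul)  # 800 60 0.099 800
```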
The four tubes containing 1 ml Tri-Reagent and the extracellular vesicle RNA were first treated as described in Extracellular vesicle preparation of 36 ml aliquots of human CSF with ultracentrifugation. Then 100 µl 1-bromo-3-chloropropane was added to each tube, and the suspensions were vortexed for 15 s. The tubes were left standing for another five minutes and were then processed as described for RNA preparation from ultrafiltrated CSF (see above).
Collection and processing of human serum samples
Sixteen 15 ml blood samples were collected from healthy volunteers of our research group and from volunteers who donated blood to the blood bank of the University Medical Center of Göttingen. Serum was separated in serum separator tubes by centrifugation at 2000 × g for ten minutes at 4 °C. After centrifugation, the serum was aliquoted and stored frozen at −80 °C. A 100 µl aliquot of each serum sample was used to determine the hemoglobin content, and only samples with hemoglobin concentrations below the limit of detection of the routine analysis (< 5 mg/dl) were processed further. For more detailed information about the age and sex distribution of the sample donors, please refer to Additional file 1: Fig. S3.
RNA-preparation of whole serum
The preparation of RNA from serum concentrates of ultraspin columns is hampered by the approximately 200-fold higher protein concentration of serum compared with CSF; this results in long centrifugation times and extremely viscous concentrates that are difficult to pipette. Therefore, total RNA of serum was prepared with ultraspin columns only to prove the principle of the method, whereas total RNA of serum for NGS was extracted exclusively with Tri-Reagent (Tri-Reagent BD (T3809) for blood). For this purpose, 1 ml serum samples from six donors were thawed in ice water. From each 1 ml sample, four 250 µl aliquots were added to four 1.5 ml DNA-low-binding tubes containing 750 µl Tri-Reagent. The tubes were vortexed for 30 to 60 s and left at room temperature for five minutes; then 100 µl 1-bromo-3-chloropropane was added to each tube, and the mixtures were again briefly vortexed and incubated for five minutes at room temperature. RNA was then prepared from each sample exactly as described in RNA-preparation of ultrafiltrated CSF, resulting in four independent RNA preparations that were finally pooled into one RNA sample.
One millilitre of serum was added to 9 ml of ice-cold PBS in ultracentrifugation tubes; the tubes were carefully mixed and briefly centrifuged to collect all liquid, then balanced with ice-cold PBS and centrifuged for 4 h at 180,000 × g at 4 °C. After centrifugation, the supernatants were carefully pipetted off the pellets, and 1 ml Tri-Reagent was added to each tube. In contrast to the extracellular vesicles of CSF, the extracellular vesicles of serum formed visible pellets, which were resuspended by vortexing and by holding the tubes briefly in an ice-cold ultrasonic bath. After resuspension and a five-minute incubation at room temperature, 100 µl 1-bromo-3-chloropropane was added to each tube, and the samples were then treated exactly as described for RNA preparation from ultrafiltrated CSF.
MiR- and mRNA-sequencing and transcriptome analysis
The non-coding RNA sequencing (ncRNA-seq) and its primary analysis were performed by the NGS Integrative Genomics Core Unit (NIG, Göttingen, Germany). For long RNA sequencing, RNA samples were subjected to non-stranded mRNA library preparation using the TruSeq RNA Sample Prep Kit v2 with minor modifications (ligation and PCR amplification cycles). Fragment sizes of the final libraries were analyzed with a Fragment Analyzer (average of 300 bp). Libraries were sequenced (single-end, 30 million reads/sample) on the HiSeq 4000 platform. For miR library preparation, we used the QIAseq miR Library Kit, a gel-free miR library preparation method, according to the manufacturer’s recommendations. Fragment sizes of the final libraries were analyzed with a Fragment Analyzer (average of 70 bp). Libraries were sequenced (single-end, 10 million reads/sample) on the HiSeq 4000 platform.
The total RNA of each sample was used for both the small and the long RNA NGS approach. Sequenced small RNA reads were first trimmed of the Qiagen small RNA 3’ adapter using cutadapt version 2.10 [41]. The trimmed reads were aligned to the Homo sapiens non-coding regions of hg38 from ENSEMBL (https://www.ensembl.org/Homo_sapiens/Info/Index) using bowtie2 version 2.3.4 with default parameters [42]. High-quality mapped reads (MAPQ = 1 or MAPQ > 4) were selected from the resulting alignment files and quantified for the non-coding regions of the Homo sapiens sapiens genome assembly hg38 using Salmon version 1.2.1 [43] with the traditional expectation–maximization (EM) algorithm. Finally, deregulated non-coding RNAs were identified by comparing samples from different conditions (e.g., whole CSF vs. CSF EV and whole serum vs. serum EV) using the R package DESeq2 version 1.31.5 [44]; as an initial filter, only RNAs with ≥ 10 counts in at least one sample of each group were kept.
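The two filters described above, one on alignments and one on features, can be expressed compactly. This is a minimal Python sketch, not the core unit's code: it takes the MAPQ rule literally as written and interprets "≥ 10 counts in at least one sample of each group" as requiring every group to contain at least one sample that reaches 10 counts. Function names and the toy count table are hypothetical; in the actual analysis the filtered counts were passed to DESeq2 in R.

```python
from typing import Dict, List, Sequence


def keep_alignment(mapq: int) -> bool:
    """MAPQ rule as written in the methods: keep MAPQ == 1 or MAPQ > 4."""
    return mapq == 1 or mapq > 4


def prefilter(counts: Dict[str, Sequence[int]], groups: Sequence[str],
              min_count: int = 10) -> List[str]:
    """Keep features with >= min_count reads in at least one sample of every group."""
    kept = []
    for feature, row in counts.items():
        if all(
            any(c >= min_count for c, g in zip(row, groups) if g == group)
            for group in set(groups)
        ):
            kept.append(feature)
    return kept


groups = ["wholeCSF", "wholeCSF", "CSF_EV", "CSF_EV"]
counts = {
    "miR-a": [12, 0, 0, 15],   # reaches 10 in both groups -> kept
    "miR-b": [50, 40, 3, 2],   # never reaches 10 in CSF_EV -> dropped
    "miR-c": [9, 9, 9, 9],     # below threshold everywhere -> dropped
}
print(prefilter(counts, groups))                    # ['miR-a']
print([q for q in range(7) if keep_alignment(q)])  # [1, 5, 6]
```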
Sequenced reads of long RNA were aligned to the Homo sapiens sapiens genome assembly hg38 from ENSEMBL (https://www.ensembl.org/Homo_sapiens/Info/Index) using STAR version 2.5.2 with default parameters [45]. The resulting alignment files were used to count the reads per gene of the ENSEMBL human gene set version 97 using featureCounts version 1.5.0 [46]. As for the non-coding RNAs, transcripts were tested for deregulation between conditions using the R package DESeq2 with the same filter (an RNA must have ≥ 10 counts in at least one sample of each group).
While miR could be tested for deregulation between particular conditions, determining their biological context was more challenging: direct associations between miR and functional terms (gene ontology categories or pathways) were not available, which made a direct enrichment analysis of biological terms impossible. The analysis therefore involved first annotating the miR with their target coding genes and then using those target genes for the enrichment analysis. In brief, all transcripts tested in a particular comparison (e.g., whole CSF vs. CSF-derived EV) were annotated with their ENSEMBL gene IDs from the human gene set version 97 (http://ftp.ensembl.org/pub/release-97/gtf/homo_sapiens/Homo_sapiens.GRCh38.97.gtf.gz). The ENSEMBL gene IDs were mapped to the corresponding miRBase IDs using the R package biomaRt. With the miRBase IDs of the miR of interest as input, the R package multiMiR was used to extract the validated target genes of those miR from the databases miRecords [47], miRTarBase [48] and TarBase [49]. Finally, an over-representation analysis (ORA) was performed using WebGestalt [50], with the target genes of particular sets of deregulated miR as input and the target genes of all miR tested in a particular differential expression analysis as the reference.
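WebGestalt's ORA is based on a hypergeometric test, so the logic of this two-step approach (miR to validated targets, then enrichment against the targets of all tested miR) can be sketched in a few lines. This is an illustrative sketch, not the WebGestalt implementation: gene-set definitions, identifier mapping and the multiple-testing correction that WebGestalt applies are omitted, and all miR and gene names are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, Set


def targets_of(mirs: Iterable[str], target_map: Dict[str, Set[str]]) -> Set[str]:
    """Union of the validated target genes of the given miR."""
    genes: Set[str] = set()
    for mir in mirs:
        genes |= target_map.get(mir, set())
    return genes


def ora_pvalue(query: Set[str], reference: Set[str], gene_set: Set[str]) -> float:
    """One-sided hypergeometric P(X >= k) that gene_set is over-represented in query."""
    query, gene_set = query & reference, gene_set & reference
    N, K, n = len(reference), len(gene_set), len(query)
    k = len(query & gene_set)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical toy data: miR -> validated targets (e.g. as returned by multiMiR).
target_map = {
    "miR-1": {"G1", "G2"},
    "miR-2": {"G3", "G4", "G5"},
    "miR-3": {"G6", "G7", "G8"},
    "miR-4": {"G9", "G10"},
}
reference = targets_of(target_map, target_map)   # targets of all tested miR
query = targets_of(["miR-1"], target_map)         # targets of deregulated miR
print(ora_pvalue(query, reference, {"G1", "G2"}))  # 0.0222... (= 1/45)
```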
Samples from low-quality libraries, in which the number of detected RNA species was more than 4.9 standard deviations below the mean number of RNA species detected in the quality libraries, were excluded from further evaluation. From long RNA sequencing, four libraries had to be excluded (two libraries from the CSF EV fraction, one from the whole CSF fraction and one from the serum EV fraction); from small RNA sequencing, all libraries were included in the analysis. Statistical differences between groups were analyzed with the Mann–Whitney test using Prism 7 for Mac.
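The exclusion rule amounts to a one-sided z-score cut-off on the number of detected RNA species per library. The text does not say exactly how the reference of quality libraries was formed, so this minimal sketch assumes a leave-one-out reference (each library is compared with the mean and sample standard deviation of all other libraries); library names and counts are hypothetical. One reason to keep a candidate out of its own reference: with fewer than 26 libraries, no single value can lie 4.9 SD below a mean and SD that include it (Samuelson's inequality).

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_low_quality(n_species: Dict[str, int], k: float = 4.9) -> List[str]:
    """Flag libraries whose number of detected RNA species lies > k SD below the mean.

    Each library is compared with the mean and sample SD of all *other* libraries
    (leave-one-out reference). Requires at least three libraries.
    """
    flagged = []
    for lib, n in n_species.items():
        others = [v for other, v in n_species.items() if other != lib]
        if n < mean(others) - k * stdev(others):
            flagged.append(lib)
    return flagged


libraries = {"L1": 1000, "L2": 1010, "L3": 990, "L4": 1005, "L5": 995, "L6": 100}
print(flag_low_quality(libraries))  # ['L6']
```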
Discussion
A major obstacle for reliable analysis of RNA profiles from human CSF, unlike serum, is the limited volume of CSF usually provided by clinical diagnostics, together with the very low RNA content of CSF [19, 22, 26, 27]. Furthermore, quality checks of CSF-derived RNA are difficult, and RNA peaks are often barely visible in analyzer electropherograms [26, 58]. Additionally, low amounts of input RNA can significantly decrease the number of robustly detected RNA species [26], and natural variation between donors can confound statistical analysis [14, 30]. These circumstances make definite and consistent measurements of RNA concentrations difficult. More reliable RNA profiling should be possible from larger volumes of pooled CSF samples [58]. To provide a comprehensive and unequivocal analysis of small and long RNA profiles of whole CSF and CSF-derived EV, we therefore used unusually large volumes of pooled CSF from healthy male and female donors aged between 0.4 and 93.4 years. Pooling should level out natural variation and thus allow us to determine the fundamental characteristics of RNA distribution in human CSF, providing a solid scientific basis for future studies that also employ smaller CSF volumes.
As access to human CSF depends on the clinical supply and is usually only provided after the end of routine diagnostics, we first examined whether handling during routine diagnostics might affect the RNA content of CSF. Similar to findings for blood, serum and plasma [15, 24, 25, 32, 38], we show that the extracellular RNA content of CSF is affected neither by long-term storage at 4 °C nor by RNase treatment. On the other hand, we show that excluding contamination with foreign RNA, for example from blood-derived cells, is an important prerequisite for meaningful and convincing RNA profiling of human CSF samples, as even small contaminations can significantly bias the profiles. This is evident from our electropherograms, RNA measurements, gel analysis and Venn diagrams comparing the transcripts of blood-contaminated CSF samples with those of the respective CSF and serum samples, as well as from observations by others [15, 18, 20, 28, 37, 51, 58, 59].
All analyses of the small RNA profiles show disparities in the RNA content of each of the four fractions analyzed. Surprisingly, the largest difference between two groups (59% altogether; Table 1) is not seen between a serum and a CSF fraction but between whole CSF and CSF-derived EV, whereas the smallest difference is seen between whole serum and serum EV (24%) and the second smallest between serum EV and CSF EV (31.2%). The volcano plot of serum EV versus CSF EV in Fig. 3 has a few points with very small p-values, representing strongly differentially regulated transcripts that contribute to the distinct patterning in the clustering and PCA plots, but the overall profile of this volcano plot is as flat as that of serum EV versus whole serum and thus clearly different from the volcano plots of the remaining four comparisons. Furthermore, Fig. 1D points to an obvious similarity of both EV fractions with respect to the percentage of small RNA species detected out of all known small RNA genes of the human genome. In addition, Venn diagrams of all significantly expressed small RNAs and of all significantly expressed miR in each of the four fractions show the largest number of shared transcripts between the two EV fractions. Moreover, the WebGestalt analysis also reveals the largest number and percentage of commonly affected sets of target transcripts for miR expressed in both EV fractions; i.e., in these respects the two EV fractions are even more similar than the two serum fractions (Additional file 1: Fig. S14). These data point to an exchange of small RNA between serum and CSF via EV, an assumption supported by recently accumulated evidence that EV can cross the blood–brain barrier [60, 61]. As the difference in small RNA content is lowest between the serum fractions and highest between the CSF fractions, the traffic of EV is likely to be mainly from serum to CSF and not vice versa. If RNA is exchanged between serum and CSF, measuring transcripts in only one EV fraction could be misleading for diagnostics, and a ratio of the respective serum and CSF fractions, as used for proteins in CSF diagnostics [62], would be more appropriate and possibly informative with respect to the integrity of the blood–brain barrier.
A graphical evaluation of 664 small RNAs significantly up- or down-regulated in all four fractions (Additional file 1: Fig. S15) shows that most of the small RNAs have equivalent concentrations in both the body fluid and the corresponding EV, but some show a reciprocal pattern, i.e., they have higher read counts in CSF than in serum but are less expressed in CSF-derived EV than in serum-derived EV. This pattern can be explained neither by passive diffusion across the blood–brain barrier, nor by a homeostasis-driven, constitutive and proportionate release of small RNAs from cells via EV, nor by constitutive non-vesicular release into the corresponding body fluids. These inverse expression levels of some small RNAs in body fluid and corresponding EV are more likely due to a general or cell-specific sorting mechanism for small RNAs, or possibly to a selective transport of certain EV across the blood–brain barrier.
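The reciprocal pattern can be stated as a sign flip between two log2 fold changes. This minimal sketch assumes that per-RNA log2 fold changes for whole CSF vs whole serum and for CSF EV vs serum EV are available (for example from DESeq2); the |log2FC| ≥ 1 cut-off and all RNA names are hypothetical. The sketch also flags the mirror-image case (lower in whole CSF, higher in CSF EV), which the text does not describe explicitly.

```python
from typing import Dict, List, Tuple


def reciprocal_rnas(lfc: Dict[str, Tuple[float, float]], min_abs_lfc: float = 1.0) -> List[str]:
    """Return small RNAs whose CSF-vs-serum difference flips sign between fluid and EV.

    lfc[name] = (log2 fold change whole CSF vs whole serum,
                 log2 fold change CSF EV vs serum EV).
    """
    hits = []
    for name, (fluid, ev) in lfc.items():
        if abs(fluid) >= min_abs_lfc and abs(ev) >= min_abs_lfc and fluid * ev < 0:
            hits.append(name)
    return hits


lfc = {
    "sRNA-a": (2.0, -1.5),   # higher in whole CSF, lower in CSF EV -> reciprocal
    "sRNA-b": (1.2, 0.8),    # same direction in both compartments
    "sRNA-c": (-1.1, 1.3),   # mirror-image reciprocal case
}
print(reciprocal_rnas(lfc))  # ['sRNA-a', 'sRNA-c']
```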
The WebGestalt analysis in our study shows that miR significantly differentially expressed in EV and in the respective body fluid have the potential to affect different sets of transcripts and thus different pathways and distinct cellular, molecular and biological functions. Therefore, miR, and possibly other small RNAs, in EV and in the respective body fluid might also differentially influence the development and prevention of human diseases. A direct comparison of WebGestalt miR target sets between whole CSF and CSF-derived EV shows that miR-targeted gene sets involved in neurological development and diseases are differentially represented in each fraction. For example, in CSF-derived EV the analysis revealed four differentially expressed sets of miR targets (involved in central nervous system neuron differentiation, neuron projection guidance, postsynaptic specialization, and regulation of commissural axon pathfinding by SLIT and ROBO), whereas in whole CSF there are eight different sets of differentially expressed miR targets (involved in amyloid-beta metabolic process, loss of function of MECP2 in Rett syndrome, neural precursor cell proliferation, neurodegenerative diseases, neuron to neuron synapse, regulation of synapse structure or activity, Sema4D induced cell migration and growth-cone collapse, and synaptic vesicle cycle) (Additional file 3: Table S7). This underlines that searches for diagnostic small RNA markers might easily fail if transcripts expressed in a disregarded fraction are not taken into account. It is conceivable that the expression level of a given small RNA differs between diseased and healthy people in only one compartment (body fluid or EV) but not in the other; comprehensive searches for small RNA-based disease markers should therefore not be restricted to either the body fluid or the body fluid-derived EV, but should encompass both fractions. This holds true for serum as well, although the difference in small RNA content between serum and serum-derived EV is less than half of the difference between whole CSF and CSF-derived EV (Table 1).
The differences in the long RNA profiles seem less profound than those in the small RNA profiles, and most long RNA transcripts found in whole CSF are also found in the three other fractions. Nevertheless, the generally higher number of significantly differentially expressed long RNAs leads to comparable total numbers of up- and down-regulated transcripts in both preparations (Table 2); thus, the value of long transcripts in searches for molecular disease markers should not be underestimated. For example, a direct comparison of WebGestalt mRNA sets between whole CSF and CSF-derived EV shows that coding transcripts involved in neurological development and diseases are also differentially represented in whole CSF and CSF-derived EV. In CSF-derived EV, the WebGestalt analysis revealed three differentially expressed long RNA transcript sets (involved in Alzheimer disease, Huntington disease and neural nucleus development) and six long RNA transcript sets related to mitochondrial metabolism (mitochondrial inner membrane, mitochondrial membrane part, mitochondrial protein complex, mitochondrial protein import, mitochondrial translation and mitochondrial transport), whereas no such transcript sets are found in the whole CSF fraction (Additional file 5: Table S15). Again, the WebGestalt analysis indicates that long RNA transcripts differentially expressed in EV and the respective body fluid have the potential to affect cellular and biological processes unequally and hence might also differentially influence the development and prevention of human diseases.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.