Background
Late-onset Alzheimer’s disease (AD) is an age-dependent neurodegenerative disorder characterized clinically by cognitive decline and pathologically by the accumulation of neuritic β-amyloid plaques (NP) and neurofibrillary tangles (NFT) in the brain. Currently, genetic [1], epigenomic [2] and transcriptomic studies [3, 4], coupled with advances in imaging techniques [5, 6], have begun to sketch the sequence of events in the causal chain linking risk factors to a syndromic diagnosis of AD dementia. One of these events may be the dysregulation of gene expression through altered expression of microRNA (miRNA) and long intergenic non-coding RNA (lincRNA) molecules.
miRNAs are a class of small regulatory RNAs that modulate gene expression through a multiprotein complex that facilitates the interaction between a miRNA and complementary elements in the 3’UTR of its target mRNAs, leading to transcript degradation and repression of protein production [7, 8]. Aberrant expression of miRNAs and/or their target mRNAs has been implicated in abnormal neuronal function [9] and in several neurodegenerative disorders [10, 11]. Recently, certain miRNAs, such as miR-132, have been associated with pathologic AD [12–16]. However, these studies were conducted in modest numbers of subjects with limited phenotypic information, and few results are consistent across them [17].
lincRNAs are RNAs longer than 200 nucleotides that do not code for proteins [18]. As with most long non-coding RNAs, and unlike miRNAs, there is no clear common functional mechanism for lincRNAs [18]; some may have a structural role in protein/RNA complexes. There is still debate over what percentage of lincRNAs may be functional at all [19]. However, focusing on lincRNAs that lie in the same locus as protein-coding genes may provide insight into their functional correlates. Given the reported association of the long non-coding RNA BACE1-AS [20] with AD, and the lack of investigation of this class of non-coding RNAs in large samples, we investigated the potential role of long non-coding RNAs in AD and its component pathologies.
We first evaluated the association of miRNAs previously linked to AD with measures of both neuritic amyloid plaques (NP) and neurofibrillary tangles (NFT), since these are the key neuropathologic features of AD and allow us to explore the mechanism by which an AD-associated non-coding RNA contributes to disease. We then expanded this effort to other miRNAs and lincRNAs to discover new associations. In addition, we leveraged transcriptome-wide RNA sequencing profiles, generated from the same RNA samples used for the miRNA profiles, to identify the functional consequences of altered miRNA expression on protein-coding genes in the human dorsolateral prefrontal cortex (DLPFC) and to identify additional cases where miRNA and mRNA associations with AD converge in the human cortex.
Methods
Total RNA, including miRNA, was extracted from sections of approximately 100 mg of frozen postmortem brain tissue from the dorsolateral prefrontal cortex (BA 9/46) of subjects from two previously described longitudinal cohort studies of aging, the Religious Orders Study (ROS) and the Rush Memory and Aging Project (MAP) [2, 21–23]. Tissue was partially thawed on ice, and between 50 and 100 mg of gray matter was dissected from the section and immediately transferred to 1 mL of Trizol. The tissue was then quickly homogenized using the Qiagen TissueLyser with a 5 mm stainless steel bead for 30 s at 30 Hz. The foam was settled with a quick spin, and the sample was incubated for at least 5 min at room temperature. Debris was pelleted at 12,000 × g for 10 min at 4 °C, and the Trizol was transferred to a new 1.5 mL tube. Sample preparation then followed the instructions of Qiagen’s miRNeasy Mini kit, with volumes adjusted for 1 mL instead of 700 μL of Trizol up to the wash steps. RNA was eluted from the miRNeasy spin columns in 75 μL of elution buffer, and its quality was assessed by Nanodrop and on Agilent Bioanalyzer RNA 6000 Nano chips. RNA yields averaged about 25 μg, and RIN scores ranged from 2 to 9 with an average of 6.5. RNA was normalized to 33 ng/μL and plated into 96-well plates for Nanostring processing using the nCounter Human miRNA Expression Assay kit, version 1, with reporter library file NS_H_miR_1.2.rlf. Data from 733 postmortem brain samples were collected at the Broad Institute’s Genomics Platform (Broad Institute of Harvard and MIT, Cambridge, USA). Subjects from different diagnostic categories were distributed across experimental batches to reduce batch effects. To minimize variability at the ligation step, the annealing and ligation steps were performed on the same thermocycler. Two thermocyclers were used for the purification steps, but all samples were placed in the same thermocycler for the denaturation and hybridization steps. Two nCounter Prep Stations were used, but all samples were scanned on the same Digital Analyzer. The data were collected in 8 batches of 96 samples, and a technical replicate of a single sample was included as a control on every 96-well plate, sometimes twice on the same plate in two different cartridges.
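The exact scheme used to spread diagnostic categories across batches is not described. As a minimal sketch, assuming a simple stratified round-robin allocation (our assumption, not the procedure used), the hypothetical function `assign_plates` below deals the samples of each diagnostic category across plates in turn, so every plate receives a similar mix of categories and plate totals differ by at most one sample. The diagnosis labels are illustrative, and the sketch ignores the wells reserved for the technical replicate.

```python
from collections import Counter, defaultdict
from itertools import cycle


def assign_plates(samples, plate_size=96):
    """Hypothetical stratified round-robin allocation of samples to plates.

    samples: list of (sample_id, diagnosis) pairs.
    Returns a dict mapping sample_id -> plate index (0-based).
    """
    n_plates = -(-len(samples) // plate_size)  # ceiling division
    by_dx = defaultdict(list)
    for sample_id, dx in samples:
        by_dx[dx].append(sample_id)

    # A single cycle shared across all strata keeps plate totals balanced,
    # while dealing each stratum in turn spreads every category evenly.
    plates = cycle(range(n_plates))
    assignment = {}
    for dx in sorted(by_dx):
        for sample_id in by_dx[dx]:
            assignment[sample_id] = next(plates)
    return assignment


# Illustrative input: 733 samples with made-up diagnosis labels.
demo = [(f"s{i}", ("NCI", "MCI", "AD")[i % 3]) for i in range(733)]
plates = assign_plates(demo)
totals = Counter(plates.values())  # number of samples per plate
```

Sharing one cycle across strata is what guarantees both properties at once; restarting the cycle for each category would balance categories but could overfill the first plates.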
Quality control and dataset pre-processing
All data are available at https://www.synapse.org/#!Synapse:syn3219045. The miRNAs in the Nanostring RCC files were re-annotated to match the definitions of miRBase v19. The raw data from the RCC files were pooled, and probe-specific background was corrected according to the Nanostring guidelines, using the corrections provided with the probe sets. After background correction, a three-step filtering of miRNAs and samples was performed. First, miRNAs with non-missing expression in fewer than 95% of samples were removed. Next, samples with non-missing expression for fewer than 95% of the remaining miRNAs were removed. Thus, the call rates for both samples and miRNAs were set at 95%. Finally, miRNAs with an absolute value below 15 in at least 50% of the samples were removed, to eliminate miRNAs with negligible expression in brain. After miRNA and sample filtering, the dataset consisted of 309 miRNAs and 700 samples. A combination of quantile normalization and ComBat [24], with cartridges specified as batches, was used to normalize the miRNA data. The strong association observed between miRNA expression and RNA integrity was validated by qRT-PCR (Additional file 1: Figure S3) and shown not to be specific to the Nanostring platform.
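The three-step filter can be sketched as follows, assuming a samples × miRNAs NumPy array of background-corrected values in which missing measurements are coded as NaN. The array layout, the NaN coding and the function name `filter_mirna` are our assumptions; the thresholds are the ones stated above. Quantile normalization and ComBat would be applied to the filtered matrix afterwards and are not shown.

```python
import numpy as np


def filter_mirna(X, names, call_rate=0.95, min_value=15.0, low_frac=0.5):
    """Three-step filter on a samples x miRNAs matrix (NaN = missing).

    Returns the filtered matrix, the retained miRNA names and a boolean mask
    of retained samples (relative to the input rows).
    """
    # Step 1: keep miRNAs measured (non-missing) in at least call_rate of samples.
    keep_mir = (~np.isnan(X)).mean(axis=0) >= call_rate
    X, names = X[:, keep_mir], [m for m, k in zip(names, keep_mir) if k]

    # Step 2: keep samples measured for at least call_rate of the remaining miRNAs.
    keep_smp = (~np.isnan(X)).mean(axis=1) >= call_rate
    X = X[keep_smp]

    # Step 3: drop miRNAs whose |value| is below min_value in at least low_frac of samples.
    with np.errstate(invalid="ignore"):
        low = np.abs(X) < min_value  # NaN compares as False, i.e. not counted as low
    keep_expr = low.mean(axis=0) < low_frac
    return X[:, keep_expr], [m for m, k in zip(names, keep_expr) if k], keep_smp


# Toy data: 40 samples x 4 miRNAs.
X = np.full((40, 4), 100.0)
X[1:5, 1] = np.nan   # m1 measured in 36/40 samples (90%): fails step 1
X[:, 2] = 5.0        # m2 negligible everywhere: fails step 3
X[:, 3] = 50.0
X[0, 3] = np.nan     # sample 0 then lacks 1 of 3 remaining miRNAs: fails step 2
filtered, kept_names, kept_samples = filter_mirna(X, ["m0", "m1", "m2", "m3"])
```

The order matters: filtering miRNAs first prevents a few poorly measured probes from disqualifying otherwise good samples in step 2.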
The mRNA sequencing data have been described elsewhere [25]. For the lincRNA analysis, the ungapped aligner Bowtie [26] was used to align reads to an hg19 lincRNA reference [27], and RSEM [28] was then applied to estimate expression levels for all lincRNAs. Quantile normalization combined with ComBat [24], to account for batch, was used for normalization. After filtering out lowly expressed lincRNAs with an expected count below 5, the final dataset comprised 454 lincRNAs measured in 540 samples.
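A minimal sketch of the expression filter and the quantile-normalization step, assuming a samples × lincRNAs array of RSEM expected counts. Whether the threshold of 5 applies to the mean expected count or to another summary is not stated, so the sketch assumes the mean across samples and applies the filter before normalization (also an assumption). Ties are broken by position rather than averaged, and the ComBat batch correction (available, for example, in the Bioconductor sva package) is not reimplemented. The function names are hypothetical.

```python
import numpy as np


def filter_low_expected(counts, names, min_count=5.0):
    """Drop lincRNAs whose mean expected count is below min_count (assumed summary)."""
    keep = counts.mean(axis=0) >= min_count
    return counts[:, keep], [n for n, k in zip(names, keep) if k]


def quantile_normalize(X):
    """Give every sample (row) the same value distribution.

    The reference distribution is the mean, across samples, of each row's
    sorted values; each value is replaced by the reference value at its rank.
    """
    ranks = np.argsort(np.argsort(X, axis=1), axis=1)  # rank of each value within its row
    reference = np.sort(X, axis=1).mean(axis=0)
    return reference[ranks]


counts = np.array([[50.0, 20.0, 30.0, 0.0],
                   [40.0, 10.0, 60.0, 1.0],
                   [30.0, 40.0, 80.0, 0.0]])
kept, kept_names = filter_low_expected(counts, ["a", "b", "c", "d"])
normalized = quantile_normalize(kept)
```

After normalization every sample shares the same set of values and differs only in which lincRNA holds which rank, which removes sample-wide shifts in the count distribution.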
Identifying differentially expressed miRNA or lincRNA
Linear regression was used to associate the expression level of each miRNA and lincRNA with variables measuring neuropathology in the AD brain, one at a time: the number of neuritic plaques, the number of neurofibrillary tangles, or a binary variable representing the pathologic diagnosis of AD at autopsy according to the NIA-Reagan criteria [29]. All associations were adjusted for age, sex, study (ROS or MAP), the proportion of neurons in the tissue [2], RNA integrity number (RIN) and post-mortem interval (PMI). The proportion of neurons in the tissue was estimated from DNA methylation data available from the same brain region of each individual, as described in our recent study [2]. A Bonferroni-corrected significance threshold of 0.05 was used to account for multiple comparisons.
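The per-RNA association test can be sketched with ordinary least squares: each RNA's expression is regressed on one pathologic trait plus the adjustment covariates, and the trait coefficient is tested. This is a minimal sketch under stated assumptions: expression is treated as the outcome, p-values use a normal approximation to the t distribution (our simplification, reasonable for several hundred samples), and the function names and synthetic data are hypothetical.

```python
import math
import numpy as np


def trait_association(expr, trait, covariates):
    """OLS test of one pathologic trait against every RNA.

    expr:       (n_samples, n_rna) expression matrix, one column per miRNA/lincRNA.
    trait:      (n_samples,) NP, NFT or AD values.
    covariates: (n_samples, k) adjustment covariates (e.g. age, sex, study,
                neuron proportion, RIN, PMI).
    Returns the trait coefficient, its t statistic and a two-sided p-value per RNA.
    """
    n = expr.shape[0]
    X = np.column_stack([np.ones(n), trait, covariates])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)       # (n_params, n_rna)
    resid = expr - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - X.shape[1])  # residual variance per RNA
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])   # SE of the trait coefficient
    t = beta[1] / se
    # Normal approximation to the t distribution (our simplification for large n).
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return beta[1], t, p


def bonferroni(p, alpha=0.05):
    """Flag tests passing the Bonferroni-corrected threshold alpha / m."""
    threshold = alpha / len(p)
    return p < threshold, threshold


# Synthetic illustration: the first RNA depends on the trait, the second does not.
rng = np.random.default_rng(0)
n = 500
trait = rng.normal(size=n)
rin = rng.uniform(2, 9, size=n)
age = rng.normal(88, 6, size=n)
expr = np.column_stack([
    1.0 * trait + 0.3 * rin + rng.normal(size=n),
    0.3 * rin + rng.normal(size=n),
])
beta, t, p = trait_association(expr, trait, np.column_stack([age, rin]))
hits, threshold = bonferroni(p)
```

Fitting all RNAs in one `lstsq` call works because they share the same design matrix; only the outcome columns differ.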
Constructing micro-RNA and linc-RNA networks
Linear models were used to identify miRNAs or lincRNAs associated with NP, NFT or AD. These models included age, neuronal composition (NNLS), sex, study of origin (ROS or MAP), post-mortem interval (PMI), RNA integrity number (RIN), NP, NFT and AD as covariates. A miRNA or lincRNA was included in the networks if there was evidence that any of the effect sizes for NP, NFT or AD was non-zero (nominal F-test p-value < 0.05). For each included miRNA or lincRNA, forward stepwise variable selection using the Bayesian information criterion (BIC) was used to select which edges between the miRNA or lincRNA and the explanatory variables should be included in the network. Because RIN is associated with all the miRNAs and lincRNAs, its edges were excluded from the networks.
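The two-stage network construction can be sketched as follows, assuming NumPy vectors keyed by variable name, with categorical covariates such as sex and study coded 0/1 (hypothetical data layout and function names). Two simplifications are ours: the F-test p-value is approximated by referring 3F to a χ² distribution with 3 degrees of freedom, a large-sample substitute for the exact F distribution, and BIC uses the Gaussian form n ln(RSS/n) + k ln n, which omits constants that do not affect the selection.

```python
import math
import numpy as np


def rss(y, X):
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def bic(y, X):
    """Gaussian BIC up to an additive constant."""
    n = len(y)
    return n * math.log(rss(y, X) / n) + X.shape[1] * math.log(n)


def chi2_sf_3df(x):
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def forward_bic(y, covars):
    """Forward stepwise selection: add the variable that lowers BIC most, until none does."""
    n = len(y)
    X, selected = np.ones((n, 1)), []
    best = bic(y, X)
    while True:
        trials = {name: bic(y, np.column_stack([X, col]))
                  for name, col in covars.items() if name not in selected}
        if not trials:
            break
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        best = trials[name]
        selected.append(name)
        X = np.column_stack([X, covars[name]])
    return selected


def network_edges(expr, covars, traits=("NP", "NFT", "AD"), alpha=0.05, drop=("RIN",)):
    """expr: name -> expression vector; covars: name -> covariate vector (traits included)."""
    assert len(traits) == 3, "chi2_sf_3df assumes exactly three trait terms"
    n = len(next(iter(expr.values())))
    full = np.column_stack([np.ones(n)] + list(covars.values()))
    reduced = np.column_stack([np.ones(n)] + [v for k, v in covars.items() if k not in traits])
    edges = []
    for rna, y in expr.items():
        rss_full, rss_red = rss(y, full), rss(y, reduced)
        f_stat = ((rss_red - rss_full) / 3) / (rss_full / (n - full.shape[1]))
        if chi2_sf_3df(max(3 * f_stat, 0.0)) >= alpha:
            continue  # no evidence that any trait effect is non-zero
        edges += [(rna, var) for var in forward_bic(y, covars) if var not in drop]
    return edges


# Synthetic illustration with hypothetical RNA names.
rng = np.random.default_rng(1)
n = 400
covars = {name: rng.normal(size=n) for name in ("age", "RIN", "NP", "NFT")}
covars["AD"] = (rng.random(n) < 0.6).astype(float)
expr = {
    "miR_a": 0.8 * covars["NP"] + 0.5 * covars["RIN"] + rng.normal(size=n),
    "miR_b": 0.5 * covars["RIN"] + rng.normal(size=n),
}
edges = network_edges(expr, covars)
```

RIN remains a candidate during selection, so it can still absorb variance; only its edges are removed from the output, mirroring the exclusion described above.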
Pathway analysis
Our approach to pathway analysis of the miRNA data using pMim [30] involved analyses of the miRNA and mRNA data from the 525 samples that had both miRNA and gene expression data. The mRNA analyses are designed to focus on sets of genes that are co-expressed and are predicted targets of one of the tested miRNAs. TargetScan v6.2 [31] was used to predict miRNA targets, and GO biological processes [32] were used for pathway annotation. Figure 3a outlines how pMim is used to construct and test “miR-pathways”, where a miR-pathway consists of genes that are targeted by a miRNA and lie in a common pathway. Specifically, pMim identifies sets of genes that (1) are predicted to be targeted by a miRNA associated with AD, (2) lie in a common biological pathway and (3) are themselves associated with AD diagnosis in terms of mRNA expression. All 309 miRNAs and their corresponding miR-pathways were tested; a joint statistic summarizes and ranks the combined evidence from the miRNA and mRNA analyses, testing whether both a miRNA and one of its miR-pathways are associated with AD. This joint statistic is calculated with two significance-combination methods. Within a miR-pathway, Stouffer’s method was used to combine the significance of the genes’ associations with AD. A one-sided Pearson’s method was then used to combine the miR-pathway gene summary statistic with the significance of the miRNA’s association with AD. Under very strict assumptions, these joint statistics could be considered p-values; however, because of the strong correlation among genes within an annotated pathway, they are highly inflated. We therefore use these joint statistics for ranking purposes only.
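The two combination steps can be sketched as follows. This is not pMim's code: Stouffer's method is implemented as stated, but in place of the one-sided Pearson's method we substitute a simple direction-aware Fisher combination that rewards a miRNA and its targets moving in opposite directions, evaluating both opposite-direction scenarios and doubling the smaller result. That sign convention and the factor of 2 are our assumptions, and the function names are hypothetical. Like the joint statistics described above, the output is meant for ranking only.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_TINY = 1e-300  # floor that keeps log() finite for extreme z-scores


def stouffer(z_scores):
    """Stouffer's method: combine per-gene z-scores into one pathway z-score."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def fisher_two(p1, p2):
    """Fisher's method for two p-values.

    -2 ln(p1 * p2) follows a chi-square distribution with 4 df under the null,
    and that survival function has the closed form q * (1 - ln q), q = p1 * p2.
    """
    q = max(p1 * p2, _TINY)
    return q * (1.0 - math.log(q))


def joint_statistic(mirna_z, gene_z):
    """Rank score for one miRNA and one of its miR-pathways (smaller = stronger).

    mirna_z: signed z-score of the miRNA's association with AD (positive = higher in AD).
    gene_z:  signed z-scores of the pathway's target genes with AD.
    Assumption: evidence counts when miRNA and targets move in opposite directions.
    """
    path_z = stouffer(gene_z)
    p_mir_up, p_mir_down = _STD_NORMAL.cdf(-mirna_z), _STD_NORMAL.cdf(mirna_z)
    p_path_up, p_path_down = _STD_NORMAL.cdf(-path_z), _STD_NORMAL.cdf(path_z)
    # Two opposite-direction scenarios; the smaller is doubled because both were examined.
    combined = min(fisher_two(p_mir_up, p_path_down), fisher_two(p_mir_down, p_path_up))
    return min(1.0, 2.0 * combined)


targets_down = joint_statistic(3.0, [-2.0, -2.0, -2.0])  # miRNA up, targets down
targets_up = joint_statistic(3.0, [2.0, 2.0, 2.0])       # miRNA and targets both up
```

Because genes in an annotated pathway are correlated, the Stouffer step overstates the pathway evidence, which is one concrete reason such scores are inflated relative to true p-values.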
Discussion
The number of individuals profiled in our cohorts and the prospective nature of brain collection in these longitudinal studies of aging make our dataset a valuable resource for exploring, in greater detail, the role of miRNAs previously associated with pathologic AD. Our analysis meets a need for studies in larger datasets [17, 38] and offers a potentially less biased perspective on the disease than comparisons of AD cases and controls that are drawn from a brain bank to fit certain diagnostic criteria and were not collected prospectively. The role of miRNAs in AD begins to be resolved in terms of AD’s component pathologies, linking a specific miRNA with either NP or NFT. These associations with a feature of either amyloid or tau pathology are important for generating hypotheses that can be tested in future studies, particularly in mouse models that may recapitulate only one of these pathologies. Aside from the two validated miRNAs, we found suggestive evidence of association (p < 0.05) for 8 of the 48 previously proposed pathologic AD-associated miRNAs, suggesting that additional miRNAs may have smaller effects on AD. Although all of the other studies age-matched their subjects, none explicitly modeled sex or technical covariates in their analysis. In particular, we observe strong associations with RIN score, a technical measure of RNA sample quality, which can produce spurious associations when it is not accounted for in the analysis. One study [12] reports overall RIN scores for its hippocampal and prefrontal cortex samples but does not describe how this measure correlates with AD, and the incomplete RIN data reported by another study [13] show some potential differences between AD cases and controls. By accounting for the covariates we measured, we not only increase confidence in the miRNAs reported as associated with AD pathology but also have an opportunity to speculate on why some miRNAs may not have been validated, given the technical, clinical and demographic differences among the subjects selected for different studies.
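One way to gauge whether 8 nominal hits among 48 candidates exceeds what chance alone would produce is to compare the count with a binomial null at α = 0.05. This is a back-of-envelope sketch that treats the 48 tests as independent, which they are not (miRNA expression levels are correlated), so the resulting tail probability should be read as optimistic rather than as a formal test; the counts are taken exactly as quoted in the text.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tested, n_nominal, alpha = 48, 8, 0.05
expected_by_chance = n_tested * alpha  # about 2.4 nominal hits expected under the null
tail = binomial_upper_tail(n_nominal, n_tested, alpha)
```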
Our study has certain limitations, including the advanced average age at death (88 years) in these cohorts and the fact that they are representative of the older population but are not truly population-based, which limits the generalizability of our results. However, the high autopsy rate (>90%) among study subjects ensures that our results are representative of the entire study population, which consists of subjects who were non-demented at study entry. Further, the data we analyzed were generated from cortical gray matter. This is a practical compromise to attain large sample sizes, but it presents a challenge for future work, as it is not clear which cell type may be driving a particular association. While accounting for the proportion of neurons in the tissue addresses some of the concerns related to changes in the relative frequency of cell populations, future work in purified cell populations will be needed to resolve these questions more fully. The Nanostring technology used to measure miRNA expression could also introduce technical bias in the replication of previously reported miRNAs, and the use of probes to measure miRNAs with Nanostring, together with a predefined reference for aligning lincRNA data, limits the discovery of unannotated non-coding RNAs. Finally, we cannot comment on causality, since we performed a cross-sectional analysis of brain tissue.
The availability of transcriptome-wide mRNA data from the same RNA samples in a large (n = 540) subset of the ROSMAP subjects profiled for miRNA also provides a rare opportunity to directly explore, in human tissue, the relation of miRNAs with their putative target mRNAs and with a different class of non-coding RNAs, lincRNAs. Some of the lincRNAs are associated with AD pathology, but their expression appears to be largely independent of the miRNAs. On the other hand, our large autopsy-derived mRNA sequencing data have identified several molecular pathways whose component genes have mRNA levels that are associated with AD and are targeted by AD-associated miRNAs. The robustness of these results is illustrated by our lead miRNA, miR-132, which has been validated as associated with AD in prior targeted studies [12, 13, 39] and for which selected putative target genes, including EP300 and SIRT1, have been evaluated in brain samples [12, 14]. Here, we not only refine earlier observations by showing that the effect of miR-132 is mediated by the accumulation of amyloid pathology but also expand on prior targeted studies of downstream effects by organizing the target genes into pathways to highlight cellular functions, such as protein acetylation, that appear to be targeted by alterations in miR-132.
The functional consequences of lincRNAs remain poorly understood, so we could not repeat our pathway analysis with this class of non-coding RNAs. Since lincRNAs may influence the coding RNAs found in the same locus, we evaluated the association of the neighboring protein-coding genes with AD and the neuropathologic outcomes, but none of the coding transcripts was significantly associated. We also note that the previously reported BACE1-AS lincRNA [20] was not in the reference used in this study and therefore could not be evaluated. The previously reported positive association of BACE1 mRNA expression with AD was not replicated in our cohort (P = 0.63).
With our analyses, we have therefore begun to use autopsy data from a large set of well-characterized human subjects, capturing the heterogeneity of older human brains, to resolve which aspect of AD-related pathology is influenced by each miRNA of interest. The two well-validated miRNAs in AD illustrate this: while miR-132 and miR-129-5p are strongly associated with the correlated amyloid and tau pathology measures, both are more strongly associated with amyloid pathology than with tau pathology when the two pathologies are included in the same model. This points to very different experimental paths for dissecting the mechanisms of these miRNAs in AD. In addition, using quantitative pathologic traits that are more precise than a categorical diagnosis of AD, we find that some new miRNAs, such as miR-99b, may have a stronger effect on a specific pathology, such as tau/NFT. lincRNAs also appear to be involved, but the downstream consequences of these non-coding RNAs remain unclear. Nonetheless, the lincRNA associations bring another dimension to the broad narrative that emerges from our report: that the molecular changes associated with AD include an important alteration in the regulation of cortical transcription, which is consistent with prior reports of epigenomic changes in certain model systems, such as DNA methylation profiles in Drosophila melanogaster, and of the involvement of REST in AD [40, 41]. This narrative is also illustrated by miR-200, for which analysis of the miRNA alone is suggestive but not convincing of an association with AD; however, an integrated analysis that also considers alterations of miR-200 target genes prioritizes this miRNA, and the downstream transcriptional changes in anion transporters, for further evaluation. Such integrated analyses of complementary data may help to resolve the broader picture of altered cellular function in AD. With this manuscript, we therefore provide a robust foundation of detailed neuropathologic associations that sets the stage for a new generation of integrative analyses that consider different molecular measures generated from the same subjects and allow direct modeling of the complex phenotypic and molecular heterogeneity of the aging population at risk of AD and other dementias.
Acknowledgements
We would like to thank the participants of the ROS and MAP studies for their participation in these studies. Support for this research was provided by grants from the US National Institutes of Health (R01 AG036042, R01 AG036836, R01 AG17917, RF1 AG15819, R01 AG032990, R01 AG18023, RC2 AG036547, P30 AG10161, U01 AG46152, P50 AG016574, U01 ES017155, K25 AG041906-01) and the Rainwater Foundation/Tau Consortium.