Introduction
The average lifespan of individuals in the developed world has increased dramatically over the last century, as deaths from trauma and infection have declined. The incidence of neurodegenerative diseases associated with ageing, including dementia, has risen concomitantly, bringing significant social and economic costs. However, our understanding of genetic factors controlling nervous system form and function in health and disease is far from complete. Thus, the identification of genes that control neuronal health, and elucidation of core molecular interactions that could ultimately be exploited for the development of novel therapeutic interventions, remains a major challenge.
A large proportion of genes in animals are involved in the development, differentiation, maintenance and functioning of the nervous system. For example, in Drosophila, 11% of annotated and predicted genes showed a specific neurological phenotype upon knockdown [1], with 336 showing a strong phenotype and 2106 showing moderate to weak phenotypes. Humans with chromosomal microdeletion or microduplication syndromes (also known as contiguous gene syndromes) frequently experience intellectual disability, indicating both the complexity of the pathways and the density of genes for neuronal development in the human genome [2, 3]. The nervous system shows a high level of transcriptional diversity. Approximately 80% of all transcripts are expressed in mammalian brain [4-7]. In one study [8], adult human brain regions expressed more than twice as many different transcripts as pancreas. Understanding the functions and interactions of the different genes expressed by cell types within the nervous system is critical if the key genetic networks modulating form and function of the mammalian nervous system are to be clarified, but many of the genes are unknown or poorly annotated and there is little idea of their function, specificity or importance.
Analysis of mouse and human immune and connective tissue gene expression data has been used previously to infer gene functions of novel genes from co-expression networks [9-14]. Numerous datasets documenting the transcriptome of the mammalian nervous system are also available. Importantly, these have revealed the complexity of networks regulating health of the nervous system and implicate roles for the wide variety of supporting (glial) cell types [15-18] including astrocytes, oligodendrocytes, microglia and connective tissue cells that make up the majority of cells in the human brain [19]. We now present an analysis of combined datasets of transcription in mouse tissues, revealing key genes and networks present in the mammalian nervous system, and review the literature concerning nervous system cell type-specific genetic markers. We identify previously unknown genes involved in regulating neuronal form and/or function which provide new targets for defining critical pathways that sustain nervous system health.
Discussion
The rapid development of technologies facilitating large-scale genome and transcriptome analysis has led to the generation of a vast resource of data, with the capacity to offer new insights into the genetic organization and regulation of biological systems in health and disease. This has been particularly notable in the field of neurogenetics research (for examples, see [15-18, 62-68]). In this study, we used BioLayout Express3D [21] to interrogate publicly available databases of genome-wide expression results in the mouse. Unlike many other network analysis software tools, BioLayout Express3D employs an unstructured approach to cluster genes based on gene expression patterns across the sample set [9, 21]. It does not incorporate pre-existing knowledge of biological pathways and thus is able to identify previously undocumented relationships among genes. BioLayout Express3D enables the user to visualize complex relationships in two and three dimensions and cluster genes based solely on gene expression pattern. We were then able to identify clusters containing highly annotated genes and infer new functions/phenotypes of poorly annotated genes by guilt by association. Our overall goal was to identify novel genes as candidates for a role in nervous system health and function.
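The guilt-by-association reasoning described above can be illustrated with a minimal sketch. It is not the BioLayout Express3D implementation, whose clustering algorithm is not described in this section: here genes are linked when the Pearson correlation of their expression profiles reaches a cut-off, and connected components of the resulting graph stand in for the clustering step. The function names, the gene names, the toy expression values and the cut-off of 0.8 are all hypothetical.

```python
import numpy as np

def coexpression_clusters(expr, genes, r_threshold=0.8):
    """Group genes whose expression profiles correlate at or above r_threshold.

    expr: 2-D array, one row per gene and one column per sample.
    Returns clusters (sorted lists of gene names), largest first.
    Assumption: connected components replace a dedicated graph-clustering step.
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float))  # gene-by-gene Pearson r
    adjacent = corr >= r_threshold                     # edge where r passes the cut-off
    seen, clusters = set(), []
    for start in range(len(genes)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                                   # depth-first walk of one component
            i = stack.pop()
            component.append(genes[i])
            for j in np.flatnonzero(adjacent[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return sorted(clusters, key=len, reverse=True)

def infer_by_association(clusters, annotations):
    """Give each unannotated gene the annotations of its cluster partners."""
    inferred = {}
    for cluster in clusters:
        known = sorted({annotations[g] for g in cluster if g in annotations})
        for g in cluster:
            if g not in annotations and known:
                inferred[g] = known
    return inferred

# Toy data (hypothetical): GeneA and GeneB rise together across five samples.
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
expr = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10.5],
    [5, 1, 4, 2, 3],
    [5, 4, 3, 2, 1],
]
clusters = coexpression_clusters(expr, genes)
candidates = infer_by_association(clusters, {"GeneA": "synaptic transmission"})
print(clusters[0], candidates)
```

In this toy case the unannotated GeneB inherits the annotation of GeneA because the two share a cluster, whereas GeneC and GeneD remain unassigned. An inference of this kind is a candidate function for follow-up, not a demonstrated one.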
The level of a specific mRNA in a cell indicates the potential for the cell to make the encoded protein. To validate the relationship between RNA level and phenotypic outcome mediated by the protein product, we initially used two clusters of genes: the largest cluster identified in our analysis, Cluster001, associated with expression in testis, and a mitochondrial cluster (Cluster014). We showed that both of these clusters had corresponding phenotypes consistent with the expression pattern. The testis cluster was associated with reproductive phenotypes, including male sterility, and the mitochondrial cluster was associated with known mitochondrial diseases. We also showed a high level of replication in a different mouse data set and in a smaller set of tissues from the pig, suggesting that the genes in the clusters are generally highly correlated. Addition of a neurodegenerative disease sample (ME7 prion infection) perturbed the nervous system clusters and some immunological clusters, indicating that the disease state affects these groups of genes and providing potential insights into the disease process.
We then focused our analysis on distinct clusters of genes expressed in different cell types/regions of the nervous system. This analysis highlighted the differences between brain regions and cell types. High expression of some genes was common to all nervous system regions (general nervous system clusters in Table 1 and Online Resource 3). These included genes encoding proteins of synaptic and axonal compartments such as signalling molecules and receptor-mediated developmental guidance and patterning. Several regions were found to have specific gene expression signatures. For example, cerebellum was characterized by expression of genes involved in immunological and inflammatory responses [32] while the nucleus accumbens and dorsal striatum shared high expression of genes involved in protein kinase A signalling and mitochondrial permeability. These gene expression signatures were largely consistent with phenotypes generated in knockout mice and in humans with gene mutations. Approximately half of the genes identified in our BioLayout Express3D analysis clusters had a characterized knockout mouse model. The top-associated knockout mouse phenotype for the majority of clusters was linked to the nervous system, in particular, the synapse, reflecting the established role of the synapse in nervous system form and function (reviewed in [35]). We noted that not all clusters showed specifically neuronal phenotypes when the genes were knocked out in the mouse. Three clusters were enriched for behavioural phenotypes (Cluster053, Cluster027 and Cluster042), two clusters showed vascular and respiratory phenotypes (Cluster048 and Cluster045) and two small clusters contained genes related to gastrin release and obesity (Cluster089 and Cluster096). All of these can be linked back to the nervous system. For example, the brain controls appetite and food intake [69]. Additionally, a small number of mouse knockouts lacked any overt phenotype, which could be attributable to redundancy between different members of the same gene family, exemplified by the Cluster005 gene Brsk1. Brsk1 knockout mice were viable and fertile with no overt phenotypic abnormalities (Online Resource 5). However, double Brsk1/Brsk2 knockout mutants showed clear neurological phenotypes: minimal spontaneous movement, weak responses to stimulation and neonatal death [70]. Thus the role of these genes was only revealed when both were non-functional. Our validation of the clusters suggests that generating knockout models for other genes in the clusters is likely to reveal novel genes that contribute significantly to neurological function.
Some markers considered definitive of specific cell types are not present in the corresponding clusters. There are a number of explanations for this. Firstly, many classic antibody-based markers of cell type have different names from the gene names. For example, the gene encoding the microglia marker IBA1 is Aif1 in the current annotation. Secondly, some markers used to identify microglia in the brain are also found in macrophages, such as F4/80 (encoded by Emr1). Since we have several macrophage subsets in our analysis, these genes do not fall into the microglia cluster but into the main macrophage clusters. Thirdly, some of these markers have unique expression patterns, which are not correlated with the expression of any other gene at the threshold correlation coefficient value used. For example, the three Csf1r probe sets do not cluster with any other probe sets because of the unique expression pattern of Csf1r which includes expression in placenta. Finally, if these markers have low expression in the tissues analysed they would have been excluded by our filtering process.
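The last two explanations can be made concrete with a short sketch. It is an illustration under stated assumptions, not the filtering or clustering procedure used in this study: the rule that a probe set is retained only if its maximum expression reaches `min_expr`, the value of that threshold, the correlation cut-off and the probe set names are all hypothetical.

```python
import numpy as np

def marker_status(expr, names, min_expr=50.0, r_threshold=0.8):
    """Report why each probe set does or does not end up in a cluster.

    expr: 2-D array, one row per probe set and one column per sample.
    Assumption: a probe set is kept only if its maximum expression
    across samples is at least min_expr (hypothetical filtering rule).
    """
    expr = np.asarray(expr, dtype=float)
    keep = expr.max(axis=1) >= min_expr
    status = {n: "filtered: low expression" for n, k in zip(names, keep) if not k}
    kept_names = [n for n, k in zip(names, keep) if k]
    if len(kept_names) >= 2:
        corr = np.corrcoef(expr[keep])       # Pearson r among retained probe sets
        np.fill_diagonal(corr, -1.0)         # ignore self-correlation
        has_partner = (corr >= r_threshold).any(axis=1)
    else:
        has_partner = np.zeros(len(kept_names), dtype=bool)
    for name, partner in zip(kept_names, has_partner):
        status[name] = "clustered" if partner else "unclustered: no correlated partner"
    return status

# Toy data (hypothetical): ProbeC is well expressed but has a unique pattern,
# and ProbeD never reaches the expression threshold.
names = ["ProbeA", "ProbeB", "ProbeC", "ProbeD"]
expr = [
    [100, 200, 300, 400],
    [110, 210, 290, 420],
    [400, 100, 350, 120],
    [5, 10, 8, 12],
]
for name, outcome in marker_status(expr, names).items():
    print(name, outcome)
```

In this toy case ProbeC behaves like a marker with a unique expression pattern: it is robustly expressed but has no partner at the chosen cut-off, so it would not appear in any cluster. ProbeD is removed before clustering because its expression is too low.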
We have previously shown that the level of gene annotation is associated with the intensity with which the tissue or function has been studied. This often reflects whether the gene is tissue- or function-specific, in which case it is likely to be well annotated, or more ubiquitous, in which case it is likely to be minimally annotated [9, 13]. Thus, the mitochondrial Cluster014 was well annotated, as was the myelin Cluster091 and the clusters highly expressed in neurons. These cell types and functions have been extensively examined for many years and it is not surprising that most genes in these clusters are well understood. In contrast, several clusters were poorly annotated, including testis Cluster001, dorsal root ganglia Cluster088 and cerebellum Cluster037 and Cluster056. The testis has an extensive transcriptome of approximately 20,000 genes, with many novel splice variants, alternative promoter usage and long non-coding RNA species (see, for example, [71-73]). We found that one third of genes in the main testis cluster had no annotation, consistent with this extensive and novel transcriptome. The lack of annotation for the transcriptomes in the cerebellum and dorsal root ganglia suggests that these regions also have novel transcripts which may be a rich source of candidate genes for neurological conditions.
Importantly, this study enabled us to identify a large subset of genes with minimal or no GO annotation. Using the MGI Jax database, the Allen Brain Atlas, and human and pig expression data, we have shown that some of these genes are exclusively expressed within the nervous system and that others are linked to human disease. The presence of unannotated genes in a cluster indicates an expression pattern similar to that of the genes encoding proteins of known function in the same cluster, and suggests that these genes are strong candidates for roles in the regulation of form and/or function of the mammalian nervous system.