Background
Implementation of next-generation sequencing (NGS) to analyse RNA expression has revolutionized our capacity to approach different cell functions. However, the precise functional interpretation of whole-transcriptome expression data is still challenging [
1]. Currently, there are multiple approaches to analysing the output of RNA-sequencing (RNA-seq), but there is little consensus on the algorithms used to process and interpret the data. Integration of multiple RNA-seq data sets with extensive amounts of data from genome-wide association studies (GWAS) and protein interaction databases may improve the interpretation of data and contribute to the discovery of new molecular targets that are supported by experimental evidence on multiple levels [
2].
Here we attempted to incorporate previous knowledge from genome-wide association studies and pathway analyses into the framework of an RNA-seq-based study of differential gene expression in rheumatoid arthritis (RA). RA is a common autoimmune disease, manifesting as chronic inflammation of the joints, and characterized by a significant genetic contribution [
3‐
5] and gender bias (1.8–3.6 female-to-male ratio) [
6,
7].
Refinement and meta-analysis of large-scale GWAS have found associations between risk of RA and more than a hundred variants in non-human leukocyte antigen (HLA) loci [
8‐
11]. A comprehensive review by Okada et al. summarises a multitude of verified genetic variants playing a role in RA susceptibility, suggesting 377 genes for prospective study, based on their proximity to verified RA-associated single nucleotide polymorphisms (SNPs) [
12]. Multiple previous studies suggest enrichment for cis-acting variants, located in close proximity to coding genes [
13].
Focusing on previously validated genetic variants associated with RA (with genome-wide significant association), we used RNA-seq to address the expression of the genes proximal to these SNPs in whole-blood samples from patients with RA and from healthy individuals. We differentiated between treated and non-treated patients with (early) RA to identify common expression signals in these two groups, and controlled for the effects of treatment when comparing them to healthy controls. We applied pathway analysis to the differentially expressed genes from the RNA-seq data that correspond to previously validated genetic variants associated with RA, with the goal of identifying new functionally meaningful candidate genes and gene networks playing a role in the disease. A new gene set, derived from interconnections in the pathway analysis, was analyzed for differential expression and validated in a larger independent collection of samples with a structure similar to that of our discovery cohort.
Discussion
Our data suggest at least three new candidate genes involved in the development of RA: ERBB2, TP53 and THOP1. This finding is based on the integration of previous knowledge from RA association studies, our own RNA-seq expression data and comprehensive pathway analysis with replication in the COMBINE validation cohort.
Initially we hypothesised that genes close to previously found genetic association hits might influence disease-related phenotypes through regulation of RNA expression. This type of association may point to other genes important in RA development, which are part of the same disease-related pathway but do not exhibit a significant change in allelic frequencies defined in conventional GWAS.
The selection of genes proximal to associated loci was based on the criteria previously utilized by Okada et al. [
12]. Using this approach, we first found that 11 genes proximal to validated RA-associated genetic variants were indeed differentially expressed in whole blood from patients with RA in comparison to healthy controls. Notably, samples could be reasonably grouped into RA and non-RA based on this expression profile alone. This particular gene set, however, did not provide a distinctive clustering between treated and non-treated patients with RA. This is consistent with our intention to avoid genes whose expression varies with response to treatment among patients with RA.
Using the IPA service in the second part of the discovery stage, we obtained information about the functional relations between the DE genes from RA-associated loci. As a result, 6 out of 11 input genes were grouped into a single network, where
IFNG and
TNF served as connecting hubs. Importantly, this network also contained
HLA-DRB1. It is notable that shared epitope alleles of this gene are well established as the strongest genetic risk factor for RA. Additionally, TNF was earlier identified as one of the most successful drug targets, and a significant number of patients with RA are currently receiving anti-TNF treatment [
20]. IFNγ is also a well-established contributor to autoimmune reactions during the course of RA; anti-IFNγ treatment, however, shows significant side effects [
21]. Thus, discovering that HLA-DRB1, TNF and IFNγ are components in our networks is reassuring in terms of the validity of the integrative approach used in this study.
An important feature of all current treatments for RA is the absence of a long-lasting post-treatment effect: joint destruction, pain and inflammation recur after medication is discontinued [
22]. Based on these observations, we assumed that currently available treatments for RA (including the most common methotrexate treatment, anti-TNF treatment and other disease-modifying anti-rheumatic drugs, DMARDs) are only palliative and influence the symptoms of inflammation rather than disease-developing pathways. Although we cannot exclude the possibility of DMARDs modifying expression of genes involved in disease pathways, it is tempting to hypothesise that only the physiological changes in RA that are common between treated and non-treated patients in comparison to healthy controls are important for the fundamental mechanisms of the disease. Following this hypothesis, we were prompted to compare non-treated and treated patients versus controls without pooling RA samples into a single group, but rather focusing on common effects in independent patient groups. We used Fisher's method for combining p-values as an established approach to distinguish common effects in similar populations with an expected degree of heterogeneity [
23,
24]. This approach may be helpful in pointing to gene products that may not be affected by current anti-rheumatic treatments, and has been used previously in expression data analyses [
25].
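As an illustration of the combination step described above, the following is a minimal sketch of Fisher's method in Python. It is not the code used in this study. The function name `fisher_combined_pvalue` and the example p-values are hypothetical, and the sketch assumes that the combined tests are independent, as the method requires. For k p-values, the statistic X = -2 Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; because the degrees of freedom are even, the upper tail has a closed form and no external library is needed.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    is referred to a chi-square distribution with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # Upper tail of a chi-square distribution with 2k degrees of freedom:
    # sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values for one gene from two comparisons:
    # non-treated RA vs controls, and treated RA vs controls.
    statistic, combined_p = fisher_combined_pvalue([0.03, 0.04])
    print(f"X = {statistic:.3f}, combined p = {combined_p:.4f}")
```

A gene with a moderate signal in the same direction in both comparisons obtains a small combined p-value, whereas a gene that is significant in only one comparison is penalised by the weaker result in the other. Since the method ignores the sign of the effect, the direction of change in each comparison would need to be checked separately.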
Addressing the expression of the “connector” genes, suggested by the pathway analysis, revealed that some of them were DE in whole blood. Although not previously connected to RA, several of these genes were recently shown to be implicated in autoimmunity (e.g. CARD6 in psoriasis [
26], PTGDR in asthma [
27], BPI in cystic fibrosis [
27]) and immune-related processes [
28,
29].
However, with the exception of HLA-DRB1, ERBB2, TP53 and THOP1, DE was limited to the comparison of healthy individuals to either treated or non-treated RA groups, but not both. This could potentially be attributed to the heterogeneity introduced by treatment. The study design involving both early and established RA was intended to favour genes contributing to disease development, rather than those connected to the acute manifestation of inflammatory symptoms. Therefore, the validation of ERBB2, TP53 and THOP1 expression in similarly structured independent material may point to the importance of these genes in the pathogenesis of RA.
Multiple studies have previously implicated
TP53 in RA pathogenesis, showing that decreased expression on both mRNA and protein level contributes to severe defects in apoptosis, potentially enhancing the severity of autoimmune processes in patients with RA (reviewed in [
30,
31]). However, genetic association studies never recognised this gene as associated with RA. Our findings of lower
TP53 expression in patients with RA fall in line with the results of previously published studies.
Our data also indicate lower expression of ERBB2 in whole blood and PBMCs of both treated and non-treated patients with RA compared to controls.
ERBB2 (HER2/neu) encodes the receptor tyrosine-protein kinase erbB-2, previously implicated in promoting hyper-proliferative growth in arthritic synovial tissue [
32]. Notably, it is known that ERBB2 protein plays an important role in the regulation of the NFkB pathway and, potentially, TNF signalling [
33,
34], which are both implicated in RA.
THOP1 (Thimet Oligopeptidase 1, also known as TOP) was implicated in RA for the first time in this study. Interestingly,
THOP1 has been found to promote rapid degradation of antigenic peptides, and could affect antigen presentation in vivo [
35]. One could speculate that lower expression of
THOP1 observed in whole blood and PBMCs from patients with RA could result in abnormal antigen presentation, which might contribute to the pathogenesis of RA. In this context, it is tempting to investigate a possible functional relationship between
THOP1 and
HLA-DRB1, the major genetic risk factor for RA.
It is important to mention the limitations of the current study. The discovery cohort is relatively small. Combined with the heterogeneity of gene expression measurements in clinical samples, this could lead to insufficient power to detect some of the genes. Indeed, on testing for DE in the COMBINE validation cohort independently from the discovery cohort, we observed discrepancies that may be indicative of multiple alternative mechanisms leading to differences in regulation of gene expression, different cell composition or more complex timing in this regulation.
Additionally, the RNA in the initial analysis was derived from whole blood, whereas the data from the COMBINE validation cohort are based on RNA from PBMCs. This may explain the discrepant DE results between the two materials for the same gene set proposed by Okada et al. While useful for verifying and generalizing more consistent signals, this approach does not directly replicate results derived from whole blood. Therefore, it is possible that some of the existing signals were missed by the current study. The search for pathways underlying RA-associated genes will benefit from larger studies with more stringent replication conditions.
Acknowledgements
We would like to thank Barbro Larsson for help in performing this study. We would also like to thank the National Genomics Infrastructure (NGI) Sweden for access to UPPMAX and the Science for Life Laboratory for help with RNA sequencing.