RhesusBase: a knowledgebase for the monkey research community

Zhang, Shi-Jian; Liu, Chu-Jun; Shi, Mingming; Kong, Lei; Chen, Jia-Yu; Zhou, Wei-Zhen; Zhu, Xiaotong; Yu, Peng; Wang, Jue; Yang, Xinzhuang; Hou, Ning; Ye, Zhiqiang; Zhang, Rongli; Xiao, Ruiping; Zhang, Xiuqin; Li, Chuan-Yun

doi:10.1093/nar/gks835

Abstract

Although the rhesus macaque is a unique model for the translational study of human diseases, currently its use in biomedical research is still in its infant stage due to error-prone gene structures and limited annotations. Here, we present RhesusBase for the monkey research community (http://www.rhesusbase.org). We performed strand-specific RNA-Seq studies in 10 macaque tissues and generated 1.2 billion 90-bp paired-end reads, covering >97.4% of the putative exon in macaque transcripts annotated by Ensembl. We found that at least 28.7% of the macaque transcripts were previously mis-annotated, mainly due to incorrect exon–intron boundaries, incomplete untranslated regions (UTRs) and missed exons. Compared with the previous gene models, the revised transcripts show clearer sequence motifs near splicing junctions and the end of UTRs, as well as cleaner patterns of exon–intron distribution for expression tags and cross-species conservation scores. Strikingly, 1292 exon–intron boundary revisions between coding exons corrected the previously mis-annotated open reading frames. The revised gene models were experimentally verified in randomly selected cases. We further integrated functional genomics annotations from >60 categories of public and in-house resources and developed an online accessible database. User-friendly interfaces were developed to update, retrieve, visualize and download the RhesusBase meta-data, providing a ‘one-stop’ resource for the monkey research community.

INTRODUCTION

As a non-human primate, the rhesus macaque has unique advantages in molecular and translational studies (1). On one hand, although rodents are widely used in molecular mechanism studies and drug preclinical evaluation, fundamental differences in genome sequence composition, expression regulations, pharmacokinetics and behavior have been demonstrated between human and these small-animal models (1). The extension of molecular mechanism from rodents to humans should be considered with care in regard to diseases and drug development (1). On the other hand, experimental models of human behaviors and diseases are limited, due to environmental factors such as differences in diet or drug use, that contribute substantially to their pathogenesis (2) and leading to controversial findings (3). Subsequent studies of mechanisms are also hampered by difficulties in patient sample collection. In contrast, the rhesus macaque has advantages as a central model animal (4). Especially, as a species closely related to human, the genome sequence composition and expression regulation are more similar (1,5), making it a unique model for studying the physiological and pathological features of disease, identifying the causal genetic relationships between genotypes and phenotypes, underpinning the molecular mechanisms underlying complex diseases, and assessing the effectiveness and side effects of new drugs.

Although the rhesus macaque has unique advantages, its current use in biomedical research is still limited, partly due to error-prone gene structures and limited functional genomics annotations. After the first declaration of the rhesus macaque genome in 2007 (1), functional genomics data started to accumulate, but the available annotations are still scarce. One example is the transcriptional expression data traditionally used in transcript structure definition: according to the latest statistics from the National Center for Biotechnology Information (6), only 60 267 Expressed Sequence Tags (ESTs) have been reported in rhesus macaque, two orders of magnitude fewer than in the human (Build 37.3, 8 315 296 ESTs). For the majority of genes in the rhesus macaque, the transcript structure thus mainly relies on ab initio or comparative genomics-guided predictions, with only ∼1% supported by real mRNA and EST data according to recent RefSeq statistics (7,8). The transcript structures in 28.7% of rhesus macaque genes have been mis-annotated by the current annotation system as demonstrated by the current study, posing a major challenge in the monkey research community.

Even the limited annotations for rhesus macaque are widely scattered in the literature or in specialized databases without systematic integration. One example is for single nucleotide polymorphism (SNP) data: although at least four databases, dbSNP (8), MamuSNP (9), MonkeySNP (10) and CMSNP (11), have been developed to integrate monkey genotyping data, a standardized data structure or quality control mechanism is still lacking to efficiently manage the meta-data generated by different methodologies. Another example is for monkey transcription expression profiles identified by next-generation sequencing technology (12). Although such studies have been carried out in multiple monkey tissues with limited tissue selections and sequencing depth (5,13–17), it is not straightforward for biologists to take full advantages of the RNA-Seq data on accurate expression quantification and de novo splicing structure definition (12). A comprehensive platform is thus urgently needed in the community to effectively integrate and visualize such high-throughput data.

Overall, it is important to study novel gene functions and disease mechanisms in the framework of a well-annotated genomic context, which can provide state-of-the-art insights from the perspective of comparative genomics, gene regulation, expression patterns and evolutionary clues. Currently, ‘FlyBase’ (18), ‘WormBase’ (19) and Mouse Genome Informatics (20) have been established, which greatly enhance the international study of fruit flies, nematodes and mice. Here, we present the first comprehensive ‘RhesusBase’ effort in the rhesus monkey, to refine genome-wide gene structures, to integrate >60 categories of public and in-house functional annotations, and to develop the first user-friendly knowledgebase platform, providing a ‘one-stop’ resource for the monkey research community.

MATERIALS AND METHODS

Ethics statement

Rhesus monkeys tissues were obtained from the Institute of Molecular Medicine in Peking University, which has an animal facility internationally accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC). This study was approved by the Institutional Animal Care and Use Committee of Peking University. All animals were handled in strict accordance with good animal practice as defined by the relevant national and local animal welfare bodies.

Computational processing of strand-specific poly (A)-positive RNA-Seq data

Total RNA was extracted from 10 rhesus monkey tissues using the Trizol method and analysed by an Agilent 2100 bio-analyzer (Agilent Technologies). The strand-specific Poly (A)-positive RNA-Seq study was performed on 10 rhesus macaque tissues, with the Illumina HiSeq2000 platform running 90 cycles with paired-end design according to the manufacturer’s instructions. In-house paired-end mRNA sequence tags were mapped to the rhesus monkey genome (rheMac2) by BWA (v0.5.9) (21) and TopHat (v1.2.0) (22). Multiple alignment reads were discarded. A series of Perl (v5.12.2) and R (v2.13.1) scripts were implemented to process and evaluate the quality of the RNA-Seq data, and calculate the statistics of genes, transcripts, exons and splicing junctions (Table 1).

Table 1.

Statistics of RNA-Seq coverage on fine-scale monkey transcript structure

Categories	Total^a	Covered^b	Percentage
Exons	360 789	351 311	97.4
Junctions	317 969	273 967	86.2
Transcripts	42 820	33 914	79.2

^aNumber of exons, junctions or transcripts on the basis of Ensembl gene models.

^bNumber of exons, junctions or transcripts covered by expression tags.

Open in new tab

Table 1.

Statistics of RNA-Seq coverage on fine-scale monkey transcript structure

Categories	Total^a	Covered^b	Percentage
Exons	360 789	351 311	97.4
Junctions	317 969	273 967	86.2
Transcripts	42 820	33 914	79.2

^aNumber of exons, junctions or transcripts on the basis of Ensembl gene models.

^bNumber of exons, junctions or transcripts covered by expression tags.

Open in new tab

Genome-wide refinement of monkey gene structures

The fine-scale structures in monkey transcripts were revised on the basis of the RNA-Seq data. First, an exon/intron boundary was revised when (i) the new splicing model was supported by at least two expression tags across the splicing junction, while no tag supporting the previous splicing model; (ii) the expression tags supporting both the donor and acceptor sites and the splicing junctions were marked with GT–AG, GC–AG and AT–AC (23); and (iii) the revised splicing junction was located within the start site of the leading exon and the end site of the followed exon, creating revised exons with no shorter than 80% and no longer than 120% of the length for previously defined exons by Ensembl. Second, on the basis of the distribution of mRNA expression tags on the genome, we extended the 5′- and 3′-UTRs of the previous gene model to a new stop site, where (i) the base-level coverage of the expression tags was <15 in at least two samples; and (ii) when combining its upstream sites with identical tag coverage and the following sites with identical tag coverage, the average base-level coverage of the expression tags is <15 in each sample. Revisions with <100-bp extension were not included. Third, we identified potential new exons missed by the current annotation using Cufflink (v 0.9.3) with parameters -o -F 0.4 -j 0.45 -m 220 -p 4 (24). An exon was defined when (i) it was supported by continuous expression tags and defined by Cufflink as an intact exon; (ii) it was located in a previously annotated transcript; (iii) for both ends of the new exon, at least two expression tags linked it to known gene model; and (iv) the overlap between the new exon and all other annotated exons was <30%. Finally, we also identified 8057 brand-new transcripts using similar approach. A new transcript had at least two intact exons connected by splicing junctions, supported by at least two expression tags. Moreover, the whole transcript was located in intergenic regions as defined by the current Ensembl annotation. New transcripts were clustered following the Genome-based UniGene Build Procedure (6). A series of Perl (v5.12.2) scripts were implemented to refine the fine-scale transcript structures (Table 2; Supplementary Figures S1–S3 and Supplementary Table S1).

Table 2.

28.7% Ensembl macaque transcripts were convincingly refined

Categories	Events	Transcripts	Percentage^a
Junctions	4 054	2 947	6.9
5′UTRs	2 267	2 267	5.3
3′UTRs	7 917	7 917	18.5
New exons	2 427	1 602	3.7
Total	16 665	12 303^b	28.7

Categories	Events	Transcripts	Percentage^a
Junctions	4 054	2 947	6.9
5′UTRs	2 267	2 267	5.3
3′UTRs	7 917	7 917	18.5
New exons	2 427	1 602	3.7
Total	16 665	12 303^b	28.7

^aPercentage of revised Ensembl transcripts.

^bNumber of transcripts involved in four types of refinements. Transcripts with two or more revisions were counted once.

Open in new tab

Table 2.

28.7% Ensembl macaque transcripts were convincingly refined

Categories	Events	Transcripts	Percentage^a
Junctions	4 054	2 947	6.9
5′UTRs	2 267	2 267	5.3
3′UTRs	7 917	7 917	18.5
New exons	2 427	1 602	3.7
Total	16 665	12 303^b	28.7

Categories	Events	Transcripts	Percentage^a
Junctions	4 054	2 947	6.9
5′UTRs	2 267	2 267	5.3
3′UTRs	7 917	7 917	18.5
New exons	2 427	1 602	3.7
Total	16 665	12 303^b	28.7

^aPercentage of revised Ensembl transcripts.

^bNumber of transcripts involved in four types of refinements. Transcripts with two or more revisions were counted once.

Open in new tab

Evaluation of transcript structure refinement

Overall, we evaluated the three types (Figures 1–3) of refinements of transcript structures from the perspective of the distributions of the RNA-Seq expression tags (Figures 1A, 3A and B), distributions of the cross-species conservation scores (Figures 1B and 3C), as well as the sequence motif flanking the splicing junctions (Figures 1C and 3D) and the 5′- or 3′-end of the revised transcripts (Figures 2A–C). First, a series of Perl (v5.12.2) scripts were implemented to evaluate and visualize the distributions of the RNA-Seq expression tags. Then, we calculated cross-species conservation scores according to the previously reported pipeline (Supplementary Figure S4) (7). Finally, we calculated and visualized the sequence motifs flanking the donor/acceptor splice sites using WebLogo (v3.2). The ChIP-Seq dataset on histone H3 lysine 4 trimethylation (H3K4me3) was downloaded and processed on the basis of the previously reported pipeline in the original papers, which was further used to evaluate the completeness of 5′-UTRs. All statistical analyses were performed using R packages (v2.13.1).

Figure 1.

Open in new tab Download slide

Evaluation of refined exon/intron boundaries. (A) Normalized mRNA-Seq expression tag coverage for each refined splicing junction in different categories. Exon: exonic regions defined by both gene models; Intron: intronic regions defined by both gene models; RhesusBase Exon: exonic regions defined by revised gene models, while intronic regions by previous gene models; RhesusBase Intron: intronic regions defined by revised gene models, while exonic regions by previous gene models; (B) Intron-exon distributions of cross-species conservation score. Reference: splicing junction supported by both gene models; Ensembl: splicing junction defined by Ensembl; RhesusBase: refined splicing junction in this study. (C) Sequence motifs flanking the splicing junctions calculated on the basis of previous gene models (Ensembl) and revised gene models (RhesusBase). Reference: distribution calculated using 242 603 splicing junctions supported by both gene models with at least two independent expression tags across the splicing junction; Ensembl/RhesusBase: distributions calculated using 1793 acceptor sites and 2261 donor sites on the basis of previous gene models and revised gene models. (D) One example of a revised transcript. Both the previous gene models (Ensembl) and the revised gene models (RhesusBase) are shown. RNA-Seq expression tag coverage and splicing junctions indicated by expression tags across junctions, cross-species conservation score, as well as sequenced cDNA fragments are aligned accordingly. Strand information is indicated by arrows on transcripts and exon boundaries are indicated by vertical dashed lines. The sequence surrounding the splicing junction is indicated, in which GT–AG or GC–AG sites are highlighted in red.

Figure 2.

Open in new tab Download slide

Evaluation of extended 5′- or 3′-UTRs. (A) Frequencies of AAUAAA hexamer near the end of the 3′-UTRs, on the basis of previous gene models (Ensembl) and the revised gene models (RhesusBase). Negative controls were generated using flanking regions near the start site of these transcripts (Negative Controls). (B) Frequencies of AAUAAA hexamer near the end of the 3′-UTRs, for transcript annotations in human and Ensembl annotations in rhesus macaque. (C) Distribution of the transcription start sites identified by ChIP-Seq study, on the basis of the previous and revised gene models. Reference: the end of the 5′-UTR supported by both previous and new models; (D and E) Gene structures of two experimentally verified transcripts revised by RhesusBase. Both the previous gene models (Ensembl) and the revised gene models (RhesusBase) are shown. RNA-Seq expression tag coverage, splicing junctions, cross-species conservation score, as well as sequenced cDNA fragments were aligned accordingly. AATAAA site (D) or transcription start site (E) identified by ChIP-Seq study are highlighted. The RNA-Seq expression tag coverage was set to the maximal score for sites with high tag coverage (>100).

Figure 3.

Open in new tab Download slide

Evaluation of new exons and transcripts absent in Ensembl annotation. (A, B) Normalized mRNA-Seq expression tag coverage in exonic regions, upstream and downstream intronic regions, for revisions adding missed exons (A) or transcripts (B). (C) Intron–exon distributions of cross-species conservation score. Reference: exons in rhesus macaque supported by both gene models; New Exon: missed exons on the basis of Ensembl annotation; New Transcript: exons in new transcripts identified in this study. (D) Sequence motifs flanking the splicing junctions for new exons and transcripts. Distributions were calculated using 2 427 new exons (New Exons) and 24 295 exons in 8057 new transcripts (New transcripts). (E and F) Two examples are shown for the fine-scale structure of new exons missed by Ensembl (E) and new transcripts (F). Both the previous gene models (Ensembl) and the revised gene models (RhesusBase) are shown. RNA-Seq expression tag coverage, splicing junctions, cross-species conservation score, and sequenced cDNA fragments were aligned accordingly. Sequences surrounding the splicing junctions are also illustrated, in which GT-AG sites are highlighted in red.

RNA isolation, cDNA synthesis, PCR and Sanger sequencing

The monkey tissue samples used were obtained from the Institute of Molecular Medicine in Peking University. RNA isolation, cDNA synthesis and sequencing were performed as described previously (25), using glyceraldehyde-3-phosphate dehydrogenase (Applied Biosystems) as an endogenous control. The PCR primers used in this study are listed in Supplementary Table S2.

Integration of functional genomics data in rhesus macaque

First, in-house functional genomics data in the rhesus macaque were processed and integrated according to standardized pipelines (26) (Figure 4). Second, through the PUBMED keywords query ‘(genome OR transcriptome OR proteome) AND (rhesus macaque)’, we accessed public functional genomics studies and re-analysed the raw data according to the pipelines reported in the original studies. We designed standardized criteria for meta-data extraction and storage (Supplementary Table S3). Detailed information such as sample information, types of experimental platforms and treatments, literature information, and genotype-phenotype correlation information were carefully curated and integrated (Figure 4 and Supplementary Table S3). Third, a series of bash and Perl scripts were implemented to download, manage and process the data from >60 currently available databases (Figure 4 and Table 3). For each site in the monkey genome, cross-species conservation score was also calculated and integrated (Supplementary Methods and Supplementary Figure S4). LiftOver (7) was introduced for data transformation and standardization. Overall, functional annotations from >60 categories of public and in-house resources were integrated, with >5 billion annotation entries (Figure 4 and Table 3).

Figure 4.

Open in new tab Download slide

RhesusBase data integration and abstraction. Nine functional categories of annotation were integrated and standardized: Gene Description, Gene/Transcript Structure, Expression Profile, Regulation Mode, Variation and Repeats, Comparative Genomics, Gene Function, Phenotype/Disease Association and Drug Development. Detailed descriptions of annotations in each functional category are illustrated. Annotations integrated from in-house datasets are shown in green boxes, those processed from public databases in blue boxes and those extracted directly from public databases in grey boxes. The total numbers of entries in each functional category are shown.

Table 3.

Statistics for RhesusBase functional genomics annotations

Categories	Resources	All entries (Rhesus)	All gene coverage (Rhesus)	References
Gene description
RhesusBase genes	This study	22 283^a (18 406^b)	22 283^c (18 406^d)	(7,34)
Validated genes	RefSeq	2 588 (2 588)	2 541 (2 541)	(35)
Putative genes	Ensembl, N-SCAN, SGP, Geneid, miRBase, GtRNAdb	127 271 (127 271)	31 416 (31 416)	(7,31,35–40)
Transcript structure
RhesusBase transcripts	This Study, Public Data	50 847 (50 847)	28 634 (28 634)	This study
RNA-Seq coverage	This Study, Public Data	537 867 932 (537 867 932)	16 462 (16 462)	This study, (5,13–17)
Splicing junctions	This Study, Public Data	1 380 988 (1 380 988)	16 992 (16 992)	This study, (5,13–17)
Expressed sequence tags	GenBank, dbEST, UCSC	72 657 (72 657)	8 832 (8 832)	(6,7,41)
Transcript sequences	RefSeq	32 685 (32 685)	17 575 (17 575)	(35)
Expression profile
RNA expression identified by RNA-Seq	This Study, Public Data	1 332 656 (982 226)	22 198 (16 809)	This study, (5,13–17,32)
RNA expression identified by in situ hybridization	Alan Brain Atlas	12 397 (0)	9 218 (0)	(42)
RNA expression identified by cDNA microarray	BioGPS, Alan Brain Atlas	48 161 (0)	20 795 (0)	(42,43)
Regulation Mode
Transcriptional regulation	UCSC, Public Data	235 086 (235 086)	11 601 (0)	(7,15,44,45)
Posttranscriptional regulation	This Study, Argonaute, TarBase, PicTar, TargetScan, miRanda	82 355 (82 355)	1 625 (1 520)	(46–51)
Natural-antisense regulation	NATsDB, TransMap	37 868 (0)	5 463 (5 463)	(52,53)
Posttranslational modification	dbPTM	4 390 (4 390)	223 (0)	(54)
Variation and repeats
Single nucleotide variation	This Study, dbSNP, CMSNP, MamuSNP, MonkeySNP	5 682 738 (5 500 294)	17 430 (15 743)	(9–11,55)
Copy number variation	dbVar, DGV	29 593 (337)	6 068 (104)	(8,56)
Genomic repeats	UCSC	5 291 149 (5 291 149)	15 445 (15 445)	(7,57)
Comparative genomics
Rhesus-centric pairwise alignments	UCSC	32 487 843 (32 487 843)	17 603 (17 603)	(7)
Cross-species conservation score prediction	UCSC	4 998 806 214 (4 998 806 214)	16 435 (16 435)	This study, (7)
Gene function
Related publication	NCBI	544 499 (269)	171 (171)	(34)
Predicted protein domain	InterPro	28 517 (28 517)	8 399 (8 399)	(58)
Biological process, cellular component and molecular function	Gene Ontology	191 251 (0)	11 850 (0)	(59)
Molecular pathway	KEGG, Reactome, BioCarta, PID	12 346 (187)	4 106 (4 106)	(60–62)
Protein–Protein Interaction	IntAct, HPRD, DIP, BioGRID, BioCyc, STRING	819 029 (672 864)	10 606 (10 606)	(27,63–67)
Phenotype and disease association
Human inheritance disease	OMIM	9 935 (0)	6 104 (0)	(68)
Genetic susceptible gene (genome-wide association study)	NHGRI Catalog of Published Genome-Wide Association Studies	4 903 (0)	3 536 (0)	(69)
Genetic susceptible gene (low-scale association study)	GAD	44 201 (0)	3 535 (0)	(70)
Transgenic mouse phenotype	MGI, PBmice	32 080 (0)	5 420 (0)	(20,71)
Drug development
Pharmacogenomics	PharmGKB	21 072 (0)	19 495 (0)	(72)
Drug-induced differentially expressed genes	Connectivity MAP	2 354 610 (0)	9 125 (0)	(73)

Categories	Resources	All entries (Rhesus)	All gene coverage (Rhesus)	References
Gene description
RhesusBase genes	This study	22 283^a (18 406^b)	22 283^c (18 406^d)	(7,34)
Validated genes	RefSeq	2 588 (2 588)	2 541 (2 541)	(35)
Putative genes	Ensembl, N-SCAN, SGP, Geneid, miRBase, GtRNAdb	127 271 (127 271)	31 416 (31 416)	(7,31,35–40)
Transcript structure
RhesusBase transcripts	This Study, Public Data	50 847 (50 847)	28 634 (28 634)	This study
RNA-Seq coverage	This Study, Public Data	537 867 932 (537 867 932)	16 462 (16 462)	This study, (5,13–17)
Splicing junctions	This Study, Public Data	1 380 988 (1 380 988)	16 992 (16 992)	This study, (5,13–17)
Expressed sequence tags	GenBank, dbEST, UCSC	72 657 (72 657)	8 832 (8 832)	(6,7,41)
Transcript sequences	RefSeq	32 685 (32 685)	17 575 (17 575)	(35)
Expression profile
RNA expression identified by RNA-Seq	This Study, Public Data	1 332 656 (982 226)	22 198 (16 809)	This study, (5,13–17,32)
RNA expression identified by in situ hybridization	Alan Brain Atlas	12 397 (0)	9 218 (0)	(42)
RNA expression identified by cDNA microarray	BioGPS, Alan Brain Atlas	48 161 (0)	20 795 (0)	(42,43)
Regulation Mode
Transcriptional regulation	UCSC, Public Data	235 086 (235 086)	11 601 (0)	(7,15,44,45)
Posttranscriptional regulation	This Study, Argonaute, TarBase, PicTar, TargetScan, miRanda	82 355 (82 355)	1 625 (1 520)	(46–51)
Natural-antisense regulation	NATsDB, TransMap	37 868 (0)	5 463 (5 463)	(52,53)
Posttranslational modification	dbPTM	4 390 (4 390)	223 (0)	(54)
Variation and repeats
Single nucleotide variation	This Study, dbSNP, CMSNP, MamuSNP, MonkeySNP	5 682 738 (5 500 294)	17 430 (15 743)	(9–11,55)
Copy number variation	dbVar, DGV	29 593 (337)	6 068 (104)	(8,56)
Genomic repeats	UCSC	5 291 149 (5 291 149)	15 445 (15 445)	(7,57)
Comparative genomics
Rhesus-centric pairwise alignments	UCSC	32 487 843 (32 487 843)	17 603 (17 603)	(7)
Cross-species conservation score prediction	UCSC	4 998 806 214 (4 998 806 214)	16 435 (16 435)	This study, (7)
Gene function
Related publication	NCBI	544 499 (269)	171 (171)	(34)
Predicted protein domain	InterPro	28 517 (28 517)	8 399 (8 399)	(58)
Biological process, cellular component and molecular function	Gene Ontology	191 251 (0)	11 850 (0)	(59)
Molecular pathway	KEGG, Reactome, BioCarta, PID	12 346 (187)	4 106 (4 106)	(60–62)
Protein–Protein Interaction	IntAct, HPRD, DIP, BioGRID, BioCyc, STRING	819 029 (672 864)	10 606 (10 606)	(27,63–67)
Phenotype and disease association
Human inheritance disease	OMIM	9 935 (0)	6 104 (0)	(68)
Genetic susceptible gene (genome-wide association study)	NHGRI Catalog of Published Genome-Wide Association Studies	4 903 (0)	3 536 (0)	(69)
Genetic susceptible gene (low-scale association study)	GAD	44 201 (0)	3 535 (0)	(70)
Transgenic mouse phenotype	MGI, PBmice	32 080 (0)	5 420 (0)	(20,71)
Drug development
Pharmacogenomics	PharmGKB	21 072 (0)	19 495 (0)	(72)
Drug-induced differentially expressed genes	Connectivity MAP	2 354 610 (0)	9 125 (0)	(73)

^aTotal number of RhesusBase entries in rhesus macaque, human and mouse.

^bThe number of RhesusBase entries specifically for rhesus macaque.

^cThe number of monkey genes with RhesusBase annotations from rhesus macaque, human and mouse.

^dThe number of genes with RhesusBase annotations specifically from rhesus macaque.

Open in new tab

Table 3.

Statistics for RhesusBase functional genomics annotations

Categories	Resources	All entries (Rhesus)	All gene coverage (Rhesus)	References
Gene description
RhesusBase genes	This study	22 283^a (18 406^b)	22 283^c (18 406^d)	(7,34)
Validated genes	RefSeq	2 588 (2 588)	2 541 (2 541)	(35)
Putative genes	Ensembl, N-SCAN, SGP, Geneid, miRBase, GtRNAdb	127 271 (127 271)	31 416 (31 416)	(7,31,35–40)
Transcript structure
RhesusBase transcripts	This Study, Public Data	50 847 (50 847)	28 634 (28 634)	This study
RNA-Seq coverage	This Study, Public Data	537 867 932 (537 867 932)	16 462 (16 462)	This study, (5,13–17)
Splicing junctions	This Study, Public Data	1 380 988 (1 380 988)	16 992 (16 992)	This study, (5,13–17)
Expressed sequence tags	GenBank, dbEST, UCSC	72 657 (72 657)	8 832 (8 832)	(6,7,41)
Transcript sequences	RefSeq	32 685 (32 685)	17 575 (17 575)	(35)
Expression profile
RNA expression identified by RNA-Seq	This Study, Public Data	1 332 656 (982 226)	22 198 (16 809)	This study, (5,13–17,32)
RNA expression identified by in situ hybridization	Alan Brain Atlas	12 397 (0)	9 218 (0)	(42)
RNA expression identified by cDNA microarray	BioGPS, Alan Brain Atlas	48 161 (0)	20 795 (0)	(42,43)
Regulation Mode
Transcriptional regulation	UCSC, Public Data	235 086 (235 086)	11 601 (0)	(7,15,44,45)
Posttranscriptional regulation	This Study, Argonaute, TarBase, PicTar, TargetScan, miRanda	82 355 (82 355)	1 625 (1 520)	(46–51)
Natural-antisense regulation	NATsDB, TransMap	37 868 (0)	5 463 (5 463)	(52,53)
Posttranslational modification	dbPTM	4 390 (4 390)	223 (0)	(54)
Variation and repeats
Single nucleotide variation	This Study, dbSNP, CMSNP, MamuSNP, MonkeySNP	5 682 738 (5 500 294)	17 430 (15 743)	(9–11,55)
Copy number variation	dbVar, DGV	29 593 (337)	6 068 (104)	(8,56)
Genomic repeats	UCSC	5 291 149 (5 291 149)	15 445 (15 445)	(7,57)
Comparative genomics
Rhesus-centric pairwise alignments	UCSC	32 487 843 (32 487 843)	17 603 (17 603)	(7)
Cross-species conservation score prediction	UCSC	4 998 806 214 (4 998 806 214)	16 435 (16 435)	This study, (7)
Gene function
Related publication	NCBI	544 499 (269)	171 (171)	(34)
Predicted protein domain	InterPro	28 517 (28 517)	8 399 (8 399)	(58)
Biological process, cellular component and molecular function	Gene Ontology	191 251 (0)	11 850 (0)	(59)
Molecular pathway	KEGG, Reactome, BioCarta, PID	12 346 (187)	4 106 (4 106)	(60–62)
Protein–Protein Interaction	IntAct, HPRD, DIP, BioGRID, BioCyc, STRING	819 029 (672 864)	10 606 (10 606)	(27,63–67)
Phenotype and disease association
Human inheritance disease	OMIM	9 935 (0)	6 104 (0)	(68)
Genetic susceptible gene (genome-wide association study)	NHGRI Catalog of Published Genome-Wide Association Studies	4 903 (0)	3 536 (0)	(69)
Genetic susceptible gene (low-scale association study)	GAD	44 201 (0)	3 535 (0)	(70)
Transgenic mouse phenotype	MGI, PBmice	32 080 (0)	5 420 (0)	(20,71)
Drug development
Pharmacogenomics	PharmGKB	21 072 (0)	19 495 (0)	(72)
Drug-induced differentially expressed genes	Connectivity MAP	2 354 610 (0)	9 125 (0)	(73)

Categories	Resources	All entries (Rhesus)	All gene coverage (Rhesus)	References
Gene description
RhesusBase genes	This study	22 283^a (18 406^b)	22 283^c (18 406^d)	(7,34)
Validated genes	RefSeq	2 588 (2 588)	2 541 (2 541)	(35)
Putative genes	Ensembl, N-SCAN, SGP, Geneid, miRBase, GtRNAdb	127 271 (127 271)	31 416 (31 416)	(7,31,35–40)
Transcript structure
RhesusBase transcripts	This Study, Public Data	50 847 (50 847)	28 634 (28 634)	This study
RNA-Seq coverage	This Study, Public Data	537 867 932 (537 867 932)	16 462 (16 462)	This study, (5,13–17)
Splicing junctions	This Study, Public Data	1 380 988 (1 380 988)	16 992 (16 992)	This study, (5,13–17)
Expressed sequence tags	GenBank, dbEST, UCSC	72 657 (72 657)	8 832 (8 832)	(6,7,41)
Transcript sequences	RefSeq	32 685 (32 685)	17 575 (17 575)	(35)
Expression profile
RNA expression identified by RNA-Seq	This Study, Public Data	1 332 656 (982 226)	22 198 (16 809)	This study, (5,13–17,32)
RNA expression identified by in situ hybridization	Alan Brain Atlas	12 397 (0)	9 218 (0)	(42)
RNA expression identified by cDNA microarray	BioGPS, Alan Brain Atlas	48 161 (0)	20 795 (0)	(42,43)
Regulation Mode
Transcriptional regulation	UCSC, Public Data	235 086 (235 086)	11 601 (0)	(7,15,44,45)
Posttranscriptional regulation	This Study, Argonaute, TarBase, PicTar, TargetScan, miRanda	82 355 (82 355)	1 625 (1 520)	(46–51)
Natural-antisense regulation	NATsDB, TransMap	37 868 (0)	5 463 (5 463)	(52,53)
Posttranslational modification	dbPTM	4 390 (4 390)	223 (0)	(54)
Variation and repeats
Single nucleotide variation	This Study, dbSNP, CMSNP, MamuSNP, MonkeySNP	5 682 738 (5 500 294)	17 430 (15 743)	(9–11,55)
Copy number variation	dbVar, DGV	29 593 (337)	6 068 (104)	(8,56)
Genomic repeats	UCSC	5 291 149 (5 291 149)	15 445 (15 445)	(7,57)
Comparative genomics
Rhesus-centric pairwise alignments	UCSC	32 487 843 (32 487 843)	17 603 (17 603)	(7)
Cross-species conservation score prediction	UCSC	4 998 806 214 (4 998 806 214)	16 435 (16 435)	This study, (7)
Gene function
Related publication	NCBI	544 499 (269)	171 (171)	(34)
Predicted protein domain	InterPro	28 517 (28 517)	8 399 (8 399)	(58)
Biological process, cellular component and molecular function	Gene Ontology	191 251 (0)	11 850 (0)	(59)
Molecular pathway	KEGG, Reactome, BioCarta, PID	12 346 (187)	4 106 (4 106)	(60–62)
Protein–Protein Interaction	IntAct, HPRD, DIP, BioGRID, BioCyc, STRING	819 029 (672 864)	10 606 (10 606)	(27,63–67)
Phenotype and disease association
Human inheritance disease	OMIM	9 935 (0)	6 104 (0)	(68)
Genetic susceptible gene (genome-wide association study)	NHGRI Catalog of Published Genome-Wide Association Studies	4 903 (0)	3 536 (0)	(69)
Genetic susceptible gene (low-scale association study)	GAD	44 201 (0)	3 535 (0)	(70)
Transgenic mouse phenotype	MGI, PBmice	32 080 (0)	5 420 (0)	(20,71)
Drug development
Pharmacogenomics	PharmGKB	21 072 (0)	19 495 (0)	(72)
Drug-induced differentially expressed genes	Connectivity MAP	2 354 610 (0)	9 125 (0)	(73)

^aTotal number of RhesusBase entries in rhesus macaque, human and mouse.

^bThe number of RhesusBase entries specifically for rhesus macaque.

^cThe number of monkey genes with RhesusBase annotations from rhesus macaque, human and mouse.

^dThe number of genes with RhesusBase annotations specifically from rhesus macaque.

Open in new tab

Development of RhesusBase management system and interactive user interfaces

We developed a database, the RhesusBase, with MySQL relational schema to manage the meta-data. We also implemented highly interactive user interfaces to support the data storage, update, display, retrieve and download of the function annotations (Figure 5), using various web development technologies such as HTML, CSS, JavaScript (jQuery), AJAX (EXTJS), Java and JSP. Apache was used as the web server, with Tomcat as the JSP parser. A genome browser was developed on the basis of ABrowse (28). A Biomart-based download system (29) was also developed to facilitate the offline use of RhesusBase annotations. All annotations and database schema in RhesusBase are freely accessible at http://www.rhesusbase.org.

Figure 5.

Open in new tab Download slide

Overview of RhesusBase management system and interactive user interfaces. A comprehensive database management system and five highly interactive user interfaces were developed to support data storage, updating (A), retrieval (B), display (C, D) and downloading (E) in RhesusBase. A database update module was developed to facilitate the efficient updating of RhesusBase as more public or in-house functional data become available (A). Keywords, location and sequence-based query systems were developed to facilitate the retrieve of functional annotations from RhesusBase (B). Through this information retrieval system, users are referred to two different view modes to display the annotations, that of a gene-centric view (C) and a position-centric browser view (D). A Biomart-based download system was also developed for the offline use of RhesusBase annotations (E).

RESULTS

Correction of gene models in 28.7% Ensembl macaque transcripts

As noted earlier, for the majority of genes in the rhesus macaque, the transcript structures were putatively inferred due to scarce monkey mRNA and EST data. Recently developed deep sequencing technology made it possible to quickly generate expression tags in the rhesus macaque, whereas even when using RNA-Seq technology by selecting uniquely mapped expression tags on the genome (12), gene boundaries are difficult to determine considering the widely distributed cis-natural antisense events in primates (30). We thus performed a strand-specific RNA-Seq study in 10 rhesus monkey tissues from one individual to identify polyadenylated mRNAs. More than 1.2 billion 90-bp paired-end expression tags were generated and sequenced with high quality, in which 876 million tags were mapped uniquely to the rhesus monkey genome. Detailed descriptions of the data collection, expression tags mapping and RNA-Seq quality control were presented in (26).

Using the rhesus macaque genome and transcriptome annotations of Ensembl (31) as references, we assembled the fine-scale transcript structures on the basis of the distribution of mRNA expression tags on the genome, as well as the splicing sites indicated by expression tags across splicing junctions (Materials and Methods). Briefly, the expression of 351 311 (97.4%) putatively annotated exons by Ensembl were verified by RNA-Seq expression tags. In addition, 273 967 splicing junctions (86.2%) were covered by at least one RNA-Seq fragment (Table 1), supporting 250 733 (78.9%) Ensembl-annotated exon borders. These statistics indicate that the coverage of the RNA-Seq data we generated was deep enough to accurately evaluate the fine-scale transcript structure. In addition, the putative transcript structures by Ensembl annotation are largely convincing, partly due to the highly conserved transcript structures between rhesus macaque and other well-annotated genomes such as human and mouse.

However, we found that the fine-scale transcript structures in at least 28.7% of the Ensembl macaque transcripts were partially mis-annotated, mainly in three ways: mis-annotated exon/intron boundaries, incomplete 5′- or 3′-UTRs and missed constitutive exons or transcripts (Table 2). First, although most of the splicing junctions were verified, 4054 junctions in 2947 transcripts (6.9%) were mis-annotated, supported by at least two independent expression tags across the splicing junction (Table 2 and Supplementary Dataset S1). A total of 3 401 events occurred between coding exons, 1292 in which a frame-shift was introduced (Supplementary Dataset S1). Second, 5′- or 3′-UTRs in 2267 or 7917 transcripts (5.3 or 18.5%) were extended on the basis of the mRNA fragment distribution across the genome (Table 2 and Supplementary Datasets S2 and Supplementary Data). Third, 2427 new exons were identified in 1602 transcripts (3.7%), supported by convincing mRNA fragment clusters, which were further connected to known gene model by RNA-Seq expression tags across splicing junctions (Table 2 and Supplementary Dataset S4). Finally, we also identified 8057 new transcripts in the rhesus macaque genome. On the basis of the current gene annotation in rhesus macaque, these transcripts were located in intergenic regions, while the RNA-Seq data suggested convincing expression of these transcripts (Supplementary Data and Supplementary Dataset S5).

We refined 16 665 events in 12 303 Ensembl transcripts across the rhesus macaque genome. If looser criteria were used in processing RNA-Seq data, as many as 16 587 Ensembl transcripts (38.7%) were modified (Supplementary Methods and Supplementary Table S1). These revisions would contribute significantly to biochemical, molecular biological and genetics studies in the monkey research community.

The transcript structures in rhesus macaque were convincingly refined

We evaluated the three types of refinements on transcript structures in the rhesus macaque, as well as new transcripts identified. First, we evaluated the 4054 refined exon/intron boundaries from the perspective of the exon–intron distributions of the RNA-Seq expression tags, distributions of the cross-species conservation scores, and the sequence motif flanking the splicing sites. In a typical mRNA-Seq assay, the distribution of expression tags should highly enrich in exonic compared with intronic regions (32). In addition, the cross-species conservation scores in exonic regions should be higher than in intronic regions due to purifying selection (7). As expected, the coverage of expression tags in exon regions was markedly higher than that in intronic regions (Figure 1A, Mann–Whitney test, P value < 2.2e−16) on the basis of the revised gene models, instead of the previous models (Figure 1A). In addition, the distribution of cross-species conservation score between exons and introns were consistent with new gene models (Figure 1B, Mann–Whitney test, P value < 2.2e−16), instead of the previous one (Figure 1B). Especially, compared with the previous gene models, clear sequence motifs were detected flanking the revised splicing junctions (Figure 1C), consistent with the motifs generated by well-accepted splicing sites in rhesus macaque as positive controls (Figure 1C), or those reported in previous studies in human (33). These items of evidence suggest that the refinements on exon/intron boundaries are largely convincing. One example of a revised transcript is shown, validated experimentally by mRNA reverse transcription polymerase chain reaction followed by cDNA sequencing (Figure 1D and Supplementary Table S2).

Among the 4054 events for exon/intron boundary revision, 1292 occurred between coding exons and introduced frame-shift on previously annotated open reading frames. Surprisingly, on the basis of the previous gene models, most of these transcripts had intact open reading frames, encoding proteins with clear homology in human (BLASTP E value < 10e−5). Strikingly, in 1095 (84.8%) of these events, another nearby unusual annotation, such as putative indel, putative exon and mis-annotated exon boundary, was detected on the basis of Ensembl annotation (Supplementary Figures S1 and Supplementary Data). These unusual annotations are unlikely to be true from the perspective of the RNA-Seq expression tags distribution, cross-species conservation score distribution, and the sequence motif flanking the splicing sites (Supplementary Figure S1). These double mistakes on the transcript structure rescued the open reading frames and created largely intact ORFs by current Ensembl annotations (Supplementary Figures S1 and Supplementary Data). For these transcripts annotated with double mistakes by Ensembl, the revised gene models were experimentally verified in four randomly selected cases (Supplementary Figure S2 and Supplementary Table S2). This systematic error in automatic gene structure annotation is cautious, especially in genetics studies using rhesus macaque as model animals.

Then, for the 10 184 events to extend 5′- or 3′-UTRs of transcripts, the exon–intron distribution patterns of both the RNA-Seq expression tags and cross-species conservation score support the modified gene models (Supplementary Figure S3, Mann–Whitney test, P value < 2.2e−16). Especially, an enriched AAUAAA hexamer of the poly(A) signal was detected near the end of the revised 3′-UTRs, compared with negative controls generated using flanking regions near the start site of these transcripts (Figure 2A). Weaker enrichment of AAUAAA was detected based on the previous gene model, indicating the combination of mis-annotated transcript structure and alternative 3′-UTR splicing on these transcripts by the current annotations (Figure 2A). Actually, these mis-annotated transcript structures partly contributed to the genome-wide shift of AAUAAA distribution in rhesus macaque to the downstream region of the transcript, compared with the human genome (Figure 2B). To evaluate the completeness of 5′-UTRs, we further integrated a recent ChIP-Seq dataset to identify histone H3 lysine 4 trimethylation (H3K4me3) sites in rhesus macaque (15), indicators of transcription start sites. For the transcripts with revised 5′-UTRs, the distribution of the H3K4me3 sites around the previously defined transcription start sites differed from the reference, calculated using genes with un-modified gene models, while the distribution using the refined gene models was consistent (Figure 2C). The gene structures in two revised transcripts (one for 3′-UTR and another for 5′-UTR revision) are shown, experimentally verified by Sanger sequencing of cDNAs extracted from the corresponding monkey tissues (Figure 2D and E and Supplementary Table S2).

Similar evaluations were performed on 2427 new exons as well as 8057 brand-new transcripts absent from the current Ensembl annotation (Figure 3). Both the new exons and transcripts were convincing from the perspective of RNA-Seq expression tag coverage (Figure 3A and B), cross-species conservation score (Figure 3C) and sequence motifs near the splicing junctions (Figure 3D), indicating accurate refinements on the gene structures. Two experimentally verified genes, one for a transcript with a missed exon and another for a brand-new transcript, are shown as demonstration cases for this type of revision (Figure 3E and F and Supplementary Table S2).

Overall, the fine-scale transcript structures in at least 28.7% of the monkey Ensembl transcripts were convincingly refined in this study, posing a good supplement to the current Ensembl annotations on gene and transcript structures in the rhesus macaque.

Comprehensive integration of functional genomics data in rhesus macaque

In the framework of well-defined gene structures, we further integrated in-house generated functional genomics data, as well as public available data scattered in the literature and specialized databases, to develop a well-annotated genomic context in the rhesus macaque (Figure 4 and Table 3). Briefly, three types of data resources were considered and integrated: First of all, as a primate center with international AAALAC standards, we generated masses of functional genomics data in the rhesus macaque especially using the deep sequencing technology. These in-house data were processed and integrated with standardized pipelines (Figure 4 and Table 3). Second, through the PUBMED keywords query, we accessed all functional genomics studies in the rhesus macaque, such as high-throughput annotations on gene expression profiles, transcription factors and microRNA binding sites generated by deep sequencing-based RNA-Seq, ChIP-Seq and CLIP-Seq technology. We re-analysed the raw data and designed standardized criteria for meta-data extraction and storage. Detailed meta-data such as sample information, types of experimental platforms and treatments, literature information and genotype-phenotype correlation information were carefully curated and integrated (Figure 4 and Table 3). Third, information in >60 currently available databases was curated and integrated to annotate the rhesus macaque genome from multiple perspectives (Table 3). Overall, for each gene in the rhesus macaque, functional annotations were integrated from nine functional categories: gene descriptions, genetic variations and repeats, gene and transcript structure, regulation mode, expression profile, gene function (including biological processes and pathways), and comparative genomics as well as disease association and drug development (Table 3 and Figure 4).

To maximize the utility of the functional annotation system, for each gene in the rhesus macaque, we also integrated all related annotations in human and mouse, as references to fully understand the monkey genome (Table 3 and Figure 4). In addition, for each site in the monkey genome, we calculated cross-species conservation scores to facilitate rhesus macaque-centric comparative genomics studies (Figures 4 and 5A). Overall, functional annotations from >60 categories of public and in-house resources were integrated, with >5 billion annotation entries (Figure 4).

RhesusBase: a ‘one-stop’ resource for the monkey research community

We developed RhesusBase with a comprehensive database management system and highly interactive user interfaces, to support the data storage, update, display, retrieval and download of the described functional annotations in the rhesus macaque (Figure 5). First, keywords, location and sequence-based query systems were developed to facilitate the retrieve of functional annotations in RhesusBase (Figure 5B). Through this user-friendly information retrieval system, users are referred to two different view modes for the annotations, a gene-centric view and a position-centric browser view, depending on their retrieval options. In the gene-centric view (Figure 5C), each gene in the rhesus macaque was assigned one page, in which detailed annotations were arranged and visualized in different functional categories, such as genes and transcript structure, expression, regulation, variation and repeats, phenotypes and disease, function, drug design and comparative genomics (Figure 5C). For each gene, functional annotations in human and mouse orthologs were also integrated to facilitate functional studies in the rhesus macaque. In position-centric view (Figure 5D), a genome browser was developed on the basis of ABrowse (28). More than 110 functional tracks were added onto the corresponding genomic context, illustrating refined gene and transcript structures, mRNA and EST data, RNA-Seq expression tag coverage and splicing junctions, transcription regulations, comparative genomics, variation and repeats, as well as phenotype and disease associations (Figure 5D). A Biomart-based download system (29) was also developed to facilitate the offline use of RhesusBase annotations (Figure 5E). Considering the significant role of guanosine-binding protein coupled receptor (GPCR) in drug development, we also developed an interface for 857 GPCR genes (GPCR Gateway) to facilitate the translational study of human diseases. The RhesusBase is freely accessible at http://www.rhesusbase.org, providing a ‘one-stop’ resource to facilitate molecular and translational research in the community.

DISCUSSION

Currently, functional genomic data on the rhesus macaque are scarce. The majority of gene and transcript structures were putatively predicted on the basis of other well-annotated genomes, with only ∼1% supported by real mRNA or EST data. These ab initio or comparative genomics-guided predictions are largely convincing, partly due to the highly conserved transcript structures between rhesus macaque and other well-annotated genomes such as human. Actually, on the basis of the putative gene models in Ensembl (31), most transcripts encode intact open reading frames, widely used in genetics and molecular evolution studies.

Based on our strand-specific RNA-Seq data, we demonstrated that the transcript structures in 28.7% of monkey genes were partially mis-annotated. Strikingly, 1292 revisions introduced a frame-shift on previously annotated open reading frames (Figure 1). Why were these serious flaws not detectable by previous computational pipelines on the basis of a prior comparative genomics knowledge and why could those putative transcripts with clear frame-shift mistakes still encode intact proteins? We noted that in many cases of our revisions located on chromosome regions with atypical regulatory patterns, e.g. besides standard GT–AG splicing sites, many splicing junctions use a GC–AG splicing junction, a pattern potentially neglected by a prior predictors (Figure 1C). In addition, many new exons and new transcripts showed a significantly lower cross-species conservation score, another atypical pattern potentially introducing errors in computational predictions (Figure 3C). More significantly, we noted that for 84.8% of the 1292 CDS boundary revision events introducing frame-shift on a previously annotated open reading frame, another nearby mistake was detected. These double mistakes created largely intact ORF by current Ensembl annotation, a strategy to make globally optimized protein structures (Supplementary Figures S1 and Supplementary Data). These predictions are largely acceptable in cases studying global patterns for monkey proteomes, but error-prone in fine-scale studies such as genetics studies, in which a single mistake on an exon–intron boundary could contribute to false-positive findings. Here, for the first time, we performed genome-wide gene structure refinement on the basis of real expression data in the rhesus macaque, which will greatly facilitate fine-scale studies in the monkey research community.

It is important to study gene functions and disease mechanisms in the framework of well-annotated genomic contexts. Although national-level annotation systems such as Ensembl for the rhesus macaque (31), UCSC Genome Browser (7) and NCBI Entrez System (34) have developed web servers to visualize monkey data, the annotations are widely scattered and putative. More recently, some monkey-oriented secondary databases have been developed, but they focus on highly specialized topics, typically for the presentation of in-house SNP data (9–11). It is also difficult for biologists to take full advantage of high-throughput data (such as RNA-Seq data). A comprehensive database of the rhesus macaque is thus urgently needed to support the monkey research community, just as ‘FlyBase’ (18), ‘WormBase’ (19), the Mouse Genome Informatics (20) do for the international fruit fly, nematode and mouse research communities. Here, we present the first comprehensive ‘RhesusBase’ effort for the monkey research community. Overall, functional annotations from >60 categories of public and in-house resources were integrated, with >5 billion annotation entries, which will substantially facilitate functional and translational studies in this field.

In a primate center built according to AAALAC standards, we have successfully developed rhesus macaque models of different complex diseases (74) and started to perform genomic biomedical studies using deep-sequencing technology (26). We will continue to update RhesusBase and release the latest annotation version every year through the web server, as more public or in-house functional data become available. RhesusBase is thus a dynamic approach to provide a ‘one-stop’ resource for the monkey research community.

ACCESSION NUMBERS

JK840892, JK840893, JK840894, JK840895, JK840896, JK840897, JK840898, JK840899, JK840900.

AUTHOR CONTRIBUTIONS

C.Y.L. conceived the idea. C.Y.L., R.X. and X.Z. designed the study. S.J.Z., C.J.L. and M.S. performed most of the experiments. L.K., J.Y.C., W.Z.Z., X.Z., P.Y., J.W., X.Y., N.H., Z.Y. and R.L.Z. performed part of the experiments. S.J.Z., C.J.L. and M.S. analysed the data and performed the statistical analysis. C.Y.L. wrote the manuscript. All authors read and approved the final manuscript.

FUNDING

The National Natural Science Foundation of China [31171269]; the National Basic Research Program of China [2011CB518000]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Funding for open access charge: The National Natural Science Foundation of China [31171269].

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors thank Drs. Heping Cheng and Liping Wei at Peking University, Dr. Yong E. Zhang at the Chinese Academy Of Sciences for insightful suggestions for RhesusBase. They acknowledge Hui Wang, Wen Zheng, Bao Hai and Haitao Yang for assistance in RhesusBase development and Dr. Iain C. Bruce for manuscript revision.

REFERENCES

1

Gibbs

RA

,

Rogers

J

,

Katze

MG

,

Bumgarner

R

,

Weinstock

GM

,

Mardis

ER

,

Remington

KA

,

Strausberg

RL

,

Venter

JC

,

Wilson

RK

, et al.

Evolutionary and biomedical insights from the rhesus macaque genome

,

Science

,

2007

, vol.

316

(pg.

222

-

234

)

2

Mastin

JP

.

Environmental cardiovascular disease

,

Cardiovasc. Toxicol.

,

2005

, vol.

5

(pg.

91

-

94

)

3

Cirulli

ET

,

Goldstein

DB

.

Uncovering the roles of rare variants in common disease through whole-genome sequencing

,

Nat. Rev. Genet.

,

2010

, vol.

11

(pg.

415

-

425

)

4

Tung

J

,

Alberts

SC

,

Wray

GA

.

Evolutionary genetics in wild primates: combining genetic approaches with field studies of natural populations

,

Trends Genet.

,

2010

, vol.

26

(pg.

353

-

362

)

5

Blekhman

R

,

Marioni

JC

,

Zumbo

P

,

Stephens

M

,

Gilad

Y

.

Sex-specific and lineage-specific alternative splicing in primates

,

Genome Res.

,

2010

, vol.

20

(pg.

180

-

189

)

6

Sayers

EW

,

Barrett

T

,

Benson

DA

,

Bolton

E

,

Bryant

SH

,

Canese

K

,

Chetvernin

V

,

Church

DM

,

Dicuccio

M

,

Federhen

S

, et al.

Database resources of the National Center for Biotechnology Information

,

Nucleic Acids Res.

,

2012

, vol.

40

(pg.

D13

-

D25

)

7

Fujita

PA

,

Rhead

B

,

Zweig

AS

,

Hinrichs

AS

,

Karolchik

D

,

Cline

MS

,

Goldman

M

,

Barber

GP

,

Clawson

H

,

Coelho

A

, et al.

The UCSC Genome Browser database: update 2011

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

D876

-

D882

)

8

Sayers

EW

,

Barrett

T

,

Benson

DA

,

Bolton

E

,

Bryant

SH

,

Canese

K

,

Chetvernin

V

,

Church

DM

,

DiCuccio

M

,

Federhen

S

, et al.

Database resources of the National Center for Biotechnology Information

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

D38

-

D51

)

9

Malhi

RS

,

Sickler

B

,

Lin

D

,

Satkoski

J

,

Tito

RY

,

George

D

,

Kanthaswamy

S

,

Smith

DG

.

MamuSNP: a resource for Rhesus Macaque (Macaca mulatta) genomics

,

PloS One.

,

2007

, vol.

2

pg.

e438

10

Khouangsathiene

S

,

Pearson

C

,

Street

S

,

Ferguson

B

,

Dubay

C

.

MonkeySNP: a web portal for non-human primate single nucleotide polymorphisms

,

Bioinformatics

,

2008

, vol.

24

(pg.

2645

-

2646

)

11

Fang

X

,

Zhang

Y

,

Zhang

R

,

Yang

L

,

Li

M

,

Ye

K

,

Guo

X

,

Wang

J

,

Su

B

.

Genome sequence and global sequence variation map with 5.5 million SNPs in Chinese rhesus macaque

,

Genome Biol.

,

2011

, vol.

12

pg.

R63

12

Wang

Z

,

Gerstein

M

,

Snyder

M

.

RNA-Seq: a revolutionary tool for transcriptomics

,

Nat. Rev. Genet.

,

2009

, vol.

10

(pg.

57

-

63

)

13

Brawand

D

,

Soumillon

M

,

Necsulea

A

,

Julien

P

,

Csardi

G

,

Harrigan

P

,

Weier

M

,

Liechti

A

,

Aximu-Petri

A

,

Kircher

M

, et al.

The evolution of gene expression levels in mammalian organs

,

Nature

,

2011

, vol.

478

(pg.

343

-

348

)

14

Liu

S

,

Lin

L

,

Jiang

P

,

Wang

D

,

Xing

Y

.

A comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

578

-

588

)

15

Liu

Y

,

Han

D

,

Han

Y

,

Yan

Z

,

Xie

B

,

Li

J

,

Qiao

N

,

Hu

H

,

Khaitovich

P

,

Gao

Y

, et al.

Ab initio identification of transcription start sites in the Rhesus macaque genome by histone modification and RNA-Seq

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

1408

-

1418

)

16

Xu

AG

,

He

L

,

Li

Z

,

Xu

Y

,

Li

M

,

Fu

X

,

Yan

Z

,

Yuan

Y

,

Menzel

C

,

Li

N

, et al.

Intergenic and repeat transcription in human, chimpanzee and macaque brains measured by RNA-Seq

,

PLoS Comput. Biol.

,

2010

, vol.

6

pg.

e1000843

17

Yan

G

,

Zhang

G

,

Fang

X

,

Zhang

Y

,

Li

C

,

Ling

F

,

Cooper

DN

,

Li

Q

,

Li

Y

,

van Gool

AJ

, et al.

Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques

,

Nat. Biotechnol.

,

2011

, vol.

29

(pg.

1019

-

1023

)

18

McQuilton

P

,

St Pierre

SE

,

Thurmond

J

.

FlyBase 101—the basics of navigating FlyBase

,

Nucleic Acids Res.

,

2012

, vol.

40

(pg.

D706

-

D714

)

19

Yook

K

,

Harris

TW

,

Bieri

T

,

Cabunoc

A

,

Chan

J

,

Chen

WJ

,

Davis

P

,

de la Cruz

N

,

Duong

A

,

Fang

R

, et al.

WormBase 2012: more genomes, more data, new website

,

Nucleic Acids Res.

,

2012

, vol.

40

(pg.

D735

-

D741

)

20

Blake

JA

,

Bult

CJ

,

Kadin

JA

,

Richardson

JE

,

Eppig

JT

.

The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

D842

-

D848

)

21

Li

H

,

Durbin

R

.

Fast and accurate short read alignment with Burrows-Wheeler transform

,

Bioinformatics

,

2009

, vol.

25

(pg.

1754

-

1760

)

22

Trapnell

C

,

Pachter

L

,

Salzberg

SL

.

TopHat: discovering splice junctions with RNA-Seq

,

Bioinformatics

,

2009

, vol.

25

(pg.

1105

-

1111

)

23

Burset

M

,

Seledtsov

IA

,

Solovyev

VV

.

Analysis of canonical and non-canonical splice sites in mammalian genomes

,

Nucleic Acids Res.

,

2000

, vol.

28

(pg.

4364

-

4375

)

24

Roberts

A

,

Pimentel

H

,

Trapnell

C

,

Pachter

L

.

Identification of novel transcripts in annotated genomes using RNA-Seq

,

Bioinformatics

,

2011

, vol.

27

(pg.

2325

-

2329

)

25

Li

CY

,

Zhang

Y

,

Wang

Z

,

Cao

C

,

Zhang

PW

,

Lu

SJ

,

Li

XM

,

Yu

Q

,

Zheng

X

,

Du

Q

, et al.

A human-specific de novo protein-coding gene associated with human brain functions

,

PLoS Comput. Biol.

,

2010

, vol.

6

pg.

e1000734

26

Xie

C

,

Zhang

EY

,

Chen

JY

,

Liu

CJ

,

Zhou

WZ

,

Li

Y

,

Zhang

M

,

Zhang

R

,

Wei

L

,

Li

CY

.

Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs

,

PLoS Genet.

,

2012

, vol.

8

pg.

e1002942

27

Kerrien

S

,

Aranda

B

,

Breuza

L

,

Bridge

A

,

Broackes-Carter

F

,

Chen

C

,

Duesbury

M

,

Dumousseau

M

,

Feuermann

M

,

Hinz

U

, et al.

The IntAct molecular interaction database in 2012

,

Nucleic Acids Res.

,

2012

, vol.

40

(pg.

D841

-

D846

)

28

Kong

L

,

Wang

J

,

Zhao

S

,

Gu

X

,

Luo

J

,

Gao

G

.

ABrowse–a customizable next-generation genome browser framework

,

BMC Bioinformatics

,

2012

, vol.

13

pg.

2

29

Guberman

JM

,

Ai

J

,

Arnaiz

O

,

Baran

J

,

Blake

A

,

Baldock

R

,

Chelala

C

,

Croft

D

,

Cros

A

,

Cutts

RJ

, et al.

BioMart Central Portal: an open database network for the biological community

,

Database

,

2011

, vol.

2011

bar041

Google Scholar

OpenURL Placeholder Text

WorldCat

30

Parkhomchuk

D

,

Borodina

T

,

Amstislavskiy

V

,

Banaru

M

,

Hallen

L

,

Krobitsch

S

,

Lehrach

H

,

Soldatov

A

.

Transcriptome analysis by strand-specific sequencing of complementary DNA

,

Nucleic Acids Res.

,

2009

, vol.

37

pg.

e123

31

Flicek

P

,

Amode

MR

,

Barrell

D

,

Beal

K

,

Brent

S

,

Carvalho-Silva

D

,

Clapham

P

,

Coates

G

,

Fairley

S

,

Fitzgerald

S

, et al.

Ensembl 2012

,

Nucleic Acids Res.

,

2012

, vol.

40

(pg.

D84

-

D90

)

32

Wang

ET

,

Sandberg

R

,

Luo

S

,

Khrebtukova

I

,

Zhang

L

,

Mayr

C

,

Kingsmore

SF

,

Schroth

GP

,

Burge

CB

.

Alternative isoform regulation in human tissue transcriptomes

,

Nature

,

2008

, vol.

456

(pg.

470

-

476

)

33

Lim

LP

,

Burge

CB

.

A computational analysis of sequence features involved in recognition of short introns

,

Proc. Natl. Acad. Sci. USA

,

2001

, vol.

98

(pg.

11193

-

11198

)

Google Scholar

Crossref

WorldCat

34

Maglott

D

,

Ostell

J

,

Pruitt

KD

,

Tatusova

T

.

Entrez Gene: gene-centered information at NCBI

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

D52

-

D57

)

35

Pruitt

KD

,

Tatusova

T

,

Brown

GR

,

Maglott

DR

.

NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

,

Nucleic Acids Res.

,

2012

, vol.

40

(pg.

D130

-

D135

)

36

Gross

SS

,

Brent

MR

.

Using multiple alignments to improve gene prediction

,

J. Comput. Biol.

,

2006

, vol.

13

(pg.

379

-

393

)

37

Parra

G

,

Agarwal

P

,

Abril

JF

,

Wiehe

T

,

Fickett

JW

,

Guigo

R

.

Comparative gene prediction in human and mouse

,

Genome Res.

,

2003

, vol.

13

(pg.

108

-

117

)

38

Blanco

E

,

Parra

G

,

Guigo

R

.

Using geneid to identify genes

,

Curr. Protoc. Bioinformatics

,

2007

Chapter 4, Unit 4 3

Google Scholar

OpenURL Placeholder Text

WorldCat

39

Kozomara

A

,

Griffiths-Jones

S

.

miRBase: integrating microRNA annotation and deep-sequencing data

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

D152

-

D157

)

40

Lowe

TM

,

Eddy

SR

.

tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence

,

Nucleic Acids Res.

,

1997

, vol.

25

(pg.

955

-

964

)

41

Benson

DA

,

Karsch-Mizrachi

I

,

Lipman

DJ

,

Ostell

J

,

Wheeler

DL

.

GenBank: update

,

Nucleic Acids Res.

,

2004

, vol.

32

(pg.

D23

-

D26

)

42

Jones

AR

,

Overly

CC

,

Sunkin

SM

.

The Allen Brain Atlas: 5 years and beyond

,

Nature Rev. Neurosci.

,

2009

, vol.

10

(pg.

821

-

828

)

Google Scholar

Crossref

WorldCat

43

Wu

C

,

Orozco

C

,

Boyer

J

,

Leglise

M

,

Goodale

J

,

Batalov

S

,

Hodge

CL

,

Haase

J

,

Janes

J

,

Huss

JW

III

, et al.

BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources

,

Genome Biol.

,

2009

, vol.

10

pg.

R130

44

Gardiner-Garden

M

,

Frommer

M

.

CpG islands in vertebrate genomes

,

J. Mol. Biol.

,

1987

, vol.

196

(pg.

261

-

282

)

45

Piontkivska

H

,

Yang

MQ

,

Larkin

DM

,

Lewin

HA

,

Reecy

J

,

Elnitski

L

.

Cross-species mapping of bidirectional promoters enables prediction of unannotated 5′ UTRs and identification of species-specific transcripts

,

BMC Genomics

,

2009

, vol.

10

pg.

189

46

Shahi

P

,

Loukianiouk

S

,

Bohne-Lang

A

,

Kenzelmann

M

,

Kuffer

S

,

Maertens

S

,

Eils

R

,

Grone

HJ

,

Gretz

N

,

Brors

B

.

Argonaute—a database for gene regulation by mammalian microRNAs

,

Nucleic Acids Res.

,

2006

, vol.

34

(pg.

D115

-

D118

)

47

Sethupathy

P

,

Corda

B

,

Hatzigeorgiou

AG

.

TarBase: a comprehensive database of experimentally supported animal microRNA targets

,

RNA

,

2006

, vol.

12

(pg.

192

-

197

)

48

Krek

A

,

Grun

D

,

Poy

MN

,

Wolf

R

,

Rosenberg

L

,

Epstein

EJ

,

MacMenamin

P

,

da Piedade

I

,

Gunsalus

KC

,

Stoffel

M

, et al.

Combinatorial microRNA target predictions

,

Nat. Genet.

,

2005

, vol.

37

(pg.

495

-

500

)

49

Lewis

BP

,

Burge

CB

,

Bartel

DP

.

Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets

,

Cell

,

2005

, vol.

120

(pg.

15

-

20

)

50

Enright

AJ

,

John

B

,

Gaul

U

,

Tuschl

T

,

Sander

C

,

Marks

DS

.

MicroRNA targets in Drosophila

,

Genome Biol.

,

2003

, vol.

5

pg.

R1

51

Betel

D

,

Koppal

A

,

Agius

P

,

Sander

C

,

Leslie

C

.

Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites

,

Genome Biol.

,

2010

, vol.

11

pg.

R90

52

Zhang

Y

,

Liu

XS

,

Liu

QR

,

Wei

L

.

Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species

,

Nucleic Acids Res.

,

2006

, vol.

34

(pg.

3465

-

3475

)

53

Li

JT

,

Zhang

Y

,

Kong

L

,

Liu

QR

,

Wei

L

.

Trans-natural antisense transcripts including noncoding RNAs in 10 species: implications for expression regulation

,

Nucleic Acids Res.

,

2008

, vol.

36

(pg.

4833

-

4844

)

54

Lee

TY

,

Huang

HD

,

Hung

JH

,

Huang

HY

,

Yang

YS

,

Wang

TH

.

dbPTM: an information repository of protein post-translational modification

,

Nucleic Acids Res.

,

2006

, vol.

34

(pg.

D622

-

D627

)

55

Sherry

ST

,

Ward

MH

,

Kholodov

M

,

Baker

J

,

Phan

L

,

Smigielski

EM

,

Sirotkin

K

.

dbSNP: the NCBI database of genetic variation

,

Nucleic Acids Res.

,

2001

, vol.

29

(pg.

308

-

311

)

56

Zhang

J

,

Feuk

L

,

Duggan

GE

,

Khaja

R

,

Scherer

SW

.

Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome

,

Cytogenetic Genome Res.

,

2006

, vol.

115

(pg.

205

-

214

)

Google Scholar

Crossref

WorldCat

57

Benson

G

.

Tandem repeats finder: a program to analyze DNA sequences

,

Nucleic Acids Res.

,

1999

, vol.

27

(pg.

573

-

580

)

58

Hunter

S

,

Jones

P

,

Mitchell

A

,

Apweiler

R

,

Attwood

TK

,

Bateman

A

,

Bernard

T

,

Binns

D

,

Bork

P

,

Burge

S

, et al.

InterPro in 2011: new developments in the family and domain prediction database

,

Nucleic Acids Res.

,

2012

, vol.

40

(pg.

D306

-

D312

)

59

Ashburner

M

,

Ball

CA

,

Blake

JA

,

Botstein

D

,

Butler

H

,

Cherry

JM

,

Davis

AP

,

Dolinski

K

,

Dwight

SS

,

Eppig

JT

, et al.

Gene ontology: tool for the unification of biology. The Gene Ontology Consortium

,

Nat. Genet.

,

2000

, vol.

25

(pg.

25

-

29

)

60

Kanehisa

M

,

Goto

S

,

Sato

Y

,

Furumichi

M

,

Tanabe

M

.

KEGG for integration and interpretation of large-scale molecular data sets

,

Nucleic Acids Res.

,

2012

, vol.

40

(pg.

D109

-

D114

)

61

D'Eustachio

P

.

Reactome knowledgebase of human biological pathways and processes

,

Methods Mol. Biol.

,

2011

, vol.

694

(pg.

49

-

61

)

62

Schaefer

CF

,

Anthony

K

,

Krupa

S

,

Buchoff

J

,

Day

M

,

Hannay

T

,

Buetow

KH

.

PID: the Pathway Interaction Database

,

Nucleic Acids Res.

,

2009

, vol.

37

(pg.

D674

-

D679

)

63

Keshava Prasad

TS

,

Goel

R

,

Kandasamy

K

,

Keerthikumar

S

,

Kumar

S

,

Mathivanan

S

,

Telikicherla

D

,

Raju

R

,

Shafreen

B

,

Venugopal

A

, et al.

Human Protein Reference Database—2009 update

,

Nucleic Acids Res.

,

2009

, vol.

37

(pg.

D767

-

D772

)

64

Szklarczyk

D

,

Franceschini

A

,

Kuhn

M

,

Simonovic

M

,

Roth

A

,

Minguez

P

,

Doerks

T

,

Stark

M

,

Muller

J

,

Bork

P

, et al.

The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

D561

-

D568

)

65

Salwinski

L

,

Miller

CS

,

Smith

AJ

,

Pettit

FK

,

Bowie

JU

,

Eisenberg

D

.

The Database of Interacting Proteins: 2004 update

,

Nucleic Acids Res.

,

2004

, vol.

32

(pg.

D449

-

D451

)

66

Stark

C

,

Breitkreutz

BJ

,

Chatr-Aryamontri

A

,

Boucher

L

,

Oughtred

R

,

Livstone

MS

,

Nixon

J

,

Van Auken

K

,

Wang

X

,

Shi

X

, et al.

The BioGRID Interaction Database: 2011 update

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

D698

-

D704

)

67

Karp

PD

,

Ouzounis

CA

,

Moore-Kochlacs

C

,

Goldovsky

L

,

Kaipa

P

,

Ahren

D

,

Tsoka

S

,

Darzentas

N

,

Kunin

V

,

Lopez-Bigas

N

.

Expansion of the BioCyc collection of pathway/genome databases to 160 genomes

,

Nucleic Acids Res.

,

2005

, vol.

33

(pg.

6083

-

6089

)

68

Hamosh

A

,

Scott

AF

,

Amberger

JS

,

Bocchini

CA

,

McKusick

VA

.

Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders

,

Nucleic Acids Res.

,

2005

, vol.

33

(pg.

D514

-

D517

)

69

Hindorff

LA

,

Sethupathy

P

,

Junkins

HA

,

Ramos

EM

,

Mehta

JP

,

Collins

FS

,

Manolio

TA

.

Potential etiologic and functional implications of genome-wide association loci for human diseases and traits

,

Proc. Natl. Acad. Sci. USA

,

2009

, vol.

106

(pg.

9362

-

9367

)

Google Scholar

Crossref

WorldCat

70

Zhang

Y

,

De

S

,

Garner

JR

,

Smith

K

,

Wang

SA

,

Becker

KG

.

Systematic analysis, comparison, and integration of disease based human genetic association data and mouse genetic phenotypic information

,

BMC Med. Genomics

,

2010

, vol.

3

pg.

1

71

Ding

S

,

Wu

X

,

Li

G

,

Han

M

,

Zhuang

Y

,

Xu

T

.

Efficient transposition of the piggyBac (PB) transposon in mammalian cells and mice

,

Cell

,

2005

, vol.

122

(pg.

473

-

483

)

72

McDonagh

EM

,

Whirl-Carrillo

M

,

Garten

Y

,

Altman

RB

,

Klein

TE

.

From pharmacogenomic knowledge acquisition to clinical applications: the PharmGKB as a clinical pharmacogenomic biomarker resource

,

Biomarkers Med.

,

2011

, vol.

5

(pg.

795

-

806

)

Google Scholar

Crossref

WorldCat

73

Lamb

J

,

Crawford

ED

,

Peck

D

,

Modell

JW

,

Blat

IC

,

Wrobel

MJ

,

Lerner

J

,

Brunet

JP

,

Subramanian

A

,

Ross

KN

, et al.

The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease

,

Science

,

2006

, vol.

313

(pg.

1929

-

1935

)

74

Zhang

X

,

Zhang

R

,

Raab

S

,

Zheng

W

,

Wang

J

,

Liu

N

,

Zhu

T

,

Xue

L

,

Song

Z

,

Mao

J

, et al.

Rhesus macaques develop metabolic syndrome with reversible vascular dysfunction responsive to pioglitazone

,

Circulation

,

2011

, vol.

124

(pg.

77

-

86

)

75

Blanchette

M

,

Kent

WJ

,

Riemer

C

,

Elnitski

L

,

Smit

AF

,

Roskin

KM

,

Baertsch

R

,

Rosenbloom

K

,

Clawson

H

,

Green

ED

, et al.

Aligning multiple genomic sequences with the threaded blockset aligner

,

Genome Res.

,

2004

, vol.

14

(pg.

708

-

715

)

76

Siepel

A

,

Bejerano

G

,

Pedersen

JS

,

Hinrichs

AS

,

Hou

M

,

Rosenbloom

K

,

Clawson

H

,

Spieth

J

,

Hillier

LW

,

Richards

S

, et al.

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

,

Genome Res.

,

2005

, vol.

15

(pg.

1034

-

1050

)

77

Siepel

A

,

Haussler

D

.

Phylogenetic estimation of context-dependent substitution rates by maximum likelihood

,

Mol. Biol. Evol.

,

2004

, vol.

21

(pg.

468

-

488

)

Author notes

The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted, distribution, and reproduction in any medium, provided the original work is properly cited.

Download all slides

Month:	Total Views:
November 2016	1
December 2016	8
January 2017	1
February 2017	7
March 2017	11
April 2017	10
May 2017	11
June 2017	7
July 2017	19
August 2017	5
September 2017	11
October 2017	11
November 2017	7
December 2017	37
January 2018	92
February 2018	19
March 2018	32
April 2018	41
May 2018	24
June 2018	33
July 2018	15
August 2018	27
September 2018	37
October 2018	22
November 2018	21
December 2018	28
January 2019	24
February 2019	24
March 2019	30
April 2019	28
May 2019	41
June 2019	26
July 2019	51
August 2019	22
September 2019	36
October 2019	32
November 2019	29
December 2019	31
January 2020	31
February 2020	25
March 2020	27
April 2020	28
May 2020	22
June 2020	30
July 2020	28
August 2020	34
September 2020	30
October 2020	35
November 2020	49
December 2020	25
January 2021	26
February 2021	31
March 2021	32
April 2021	28
May 2021	24
June 2021	37
July 2021	37
August 2021	25
September 2021	46
October 2021	45
November 2021	44
December 2021	44
January 2022	43
February 2022	47
March 2022	36
April 2022	36
May 2022	25
June 2022	25
July 2022	51
August 2022	66
September 2022	269
October 2022	359
November 2022	141
December 2022	234
January 2023	147
February 2023	116
March 2023	96
April 2023	153
May 2023	61
June 2023	20
July 2023	46
August 2023	33
September 2023	23
October 2023	41
November 2023	37
December 2023	63
January 2024	86
February 2024	88
March 2024	88
April 2024	30

Article Contents

RhesusBase: a knowledgebase for the monkey research community

Abstract

INTRODUCTION

MATERIALS AND METHODS

Ethics statement

Computational processing of strand-specific poly (A)-positive RNA-Seq data

Genome-wide refinement of monkey gene structures

Evaluation of transcript structure refinement

RNA isolation, cDNA synthesis, PCR and Sanger sequencing

Integration of functional genomics data in rhesus macaque

Development of RhesusBase management system and interactive user interfaces

RESULTS

Correction of gene models in 28.7% Ensembl macaque transcripts

The transcript structures in rhesus macaque were convincingly refined

Comprehensive integration of functional genomics data in rhesus macaque

RhesusBase: a ‘one-stop’ resource for the monkey research community

DISCUSSION

ACCESSION NUMBERS

AUTHOR CONTRIBUTIONS

FUNDING

ACKNOWLEDGEMENTS

REFERENCES

Author notes

Supplementary data

Comments

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

RhesusBase: a knowledgebase for the monkey research community

Abstract

INTRODUCTION

MATERIALS AND METHODS

Ethics statement

Computational processing of strand-specific poly (A)-positive RNA-Seq data

Genome-wide refinement of monkey gene structures

Evaluation of transcript structure refinement

RNA isolation, cDNA synthesis, PCR and Sanger sequencing

Integration of functional genomics data in rhesus macaque

Development of RhesusBase management system and interactive user interfaces

RESULTS

Correction of gene models in 28.7% Ensembl macaque transcripts

The transcript structures in rhesus macaque were convincingly refined

Comprehensive integration of functional genomics data in rhesus macaque

RhesusBase: a ‘one-stop’ resource for the monkey research community

DISCUSSION

ACCESSION NUMBERS

AUTHOR CONTRIBUTIONS

FUNDING

ACKNOWLEDGEMENTS

REFERENCES

Author notes

Supplementary data

Comments

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only