Introduction

The last decades have witnessed the advent and widespread application of viral metagenomics to study viral diversity and to describe novel viruses from many animal hosts, including non-human primates (NHPs) [1,2,3,4]. Such surveillance of viruses circulating in wildlife provides a baseline dataset that can be further explored to identify unusual viruses in hosts showing symptoms of disease. Such data are also critical to assess changes in viral diversity in response to changes in land use and the agricultural industry, which are important environmental drivers of disease emergence. In the near future, this could help guide appropriate responses to mitigate those changes and potentially reduce zoonotic transmission [1]. Finally, virome comparison between different hosts will also be helpful to detect interspecies transmission [5].

Because humans and NHPs, particularly those of the Hominidae family, are genetically closely related, the risk of interspecies transmission of pathogenic microorganisms is higher. This is particularly critical for NHPs that may have close contact with human populations, as it may represent a threat to both humans and NHPs [5, 6]. For example, the human immunodeficiency virus (HIV) originated from chimpanzees and is responsible for a well-established infection in the human population. Another example of zoonotic transmission from great apes to humans is Ebola, with only sporadic events reported so far [7]. Conversely, human Metapneumovirus and Respiratory Syncytial virus are two examples of human-to-animal transmission, which threaten great ape survival [8]. Special attention should thus be paid to detecting and characterizing novel viruses that may pose a risk of interspecies transmission and subsequent infection. Previous virological surveys on NHPs have identified several human pathogenic viruses such as Ebola [5], Human Metapneumovirus [6], Respiratory Syncytial virus [8], and Picornavirus [9]. Gorillas, one of the four Hominidae NHPs, are known to harbor many viruses [7]. Several studies have been conducted on gorillas from the African region, mostly targeting known pathogens by molecular and serological methods. These studies have identified many viruses in gorillas, such as Adenovirus (Gabon [10], Republic of the Congo [11]), Simian foamy virus (Cameroon, Gabon [12]), Herpesvirus (Republic of the Congo [11]), Human T-lymphotropic virus type 4 [13], Enteroviruses [14], Hepatitis B virus (Cameroon [15]), Human Metapneumovirus (Rwanda) [6], and Polyomavirus (Republic of the Congo) [16]. Viral metagenomic studies on NHPs are scarce and have mostly targeted DNA viruses in either wild-living or captive animals. For instance, several novel circular ssDNA viruses were identified, although their pathological role remains to be determined [2, 17, 18]. Recently, a novel astrovirus strain, close to the human MLB2 strain, was identified in the stool of a diarrheic captive chimpanzee in China [3].

In order to expand our knowledge on the viral repertory associated with free-living NHPs, stool samples were collected from wild gorillas in the Republic of the Congo, Central Africa in 2015. To our knowledge, this is the first report describing RNA viromes from wild gorilla feces. The stool samples were analyzed by shotgun viral metagenomics and sequences related to novel RNA viruses, i.e., novel picobirnaviruses, partitivirus, and Picornavirales (posa-like and dicistrovirus-like viruses) were described.

Materials and methods

Study site and gorilla population

The western lowland gorilla (Gorilla gorilla gorilla) is threatened with extinction due to habitat destruction, infectious diseases (especially respiratory ones), and the illegal bushmeat trade. The Lésio-Louna and South-West Léfini Gorilla Reserves are part of a collaborative project between the Government of the Republic of the Congo and the Aspinall Foundation, which together manage a protected area of 170,000 ha located about 140 km north of Brazzaville (Fig. 1a, b). The aim of this project is to protect threatened ecosystems in Congo-Brazzaville, as well as the species living there, particularly gorillas. About 35 gorillas are sheltered in the Lésio-Louna/Léfini Natural Reserves and, in August 2015, a Franco-Congolese scientific mission was conducted to investigate and establish the repertory of the fecal microflora (bacteria, viruses, parasites) associated with this wild gorilla population. This mission was approved by the Ministry of Health (No 208/MSP/CAB.15 of 20 August 2015) and the Ministry of Forest Economy and Sustainable Development (No 94/MEFDD/CAB/DGACFAP-DTS of 24 August 2015) of the Republic of the Congo. All the animals were apparently healthy.

Fig. 1

a Worldwide detection of picobirnavirus sequences. World map highlighting the 23 countries (red bullets) from which picobirnavirus genetic sequences have been reported based on both published articles and sequences available in GenBank. Each country is named with its three-digit International country code. b Map and description of the sampling site. Country map of the Republic of the Congo, highlighted with the appropriate place of sample collection. The original templates were downloaded from the public domain. (https://commons.wikimedia.org/wiki/File:BlankMap-World-alt.png#filelinks; https://commons.wikimedia.org/wiki/Atlas_of_the_Republic_of_the_Congo#/media/File:Congo_republic_sm04.png) (Color figure online)

Sample collection

For sample collection, gorillas (either isolated individuals or individuals in a group) were attracted by the reserve staff using apples as bait. Fresh fecal samples (about 30 g) were then collected immediately after defecation from inside the stool with a sterile spatula, placed in sterile collection tubes containing Saber medium, and transported to the laboratory at Aix-Marseille University, France [19]. Fecal samples G1–G5 were collected from 5 isolated individuals. For gorilla number 3 (G3), two samples (G3A and G3B) were collected on the same day. Samples G6–G17 (12 samples) were retrieved from different stools collected from a group of gorillas (about 30 individuals) and could not be associated with any particular gorilla. Table 1 provides information about the samples and the individuals. The export of the samples to France was authorized by the Ministry of Health (No 136/MSP/DGELM/DHP/SHE of 28 August 2015). In the laboratory, the samples were stored in aliquots at −80 °C until they were processed.

Table 1 Description of the gorillas (name, sex, age when applicable) and samples (sample Id., nature, coordinates, and sampling date)

Sample preparation

The eighteen fecal samples (1 g) were diluted (10–20%) in HBSS (Hanks' Balanced Salt Solution), vortexed, and centrifuged, and the supernatants were filtered through 0.45-µm filters (Millipore). To remove unprotected nucleic acids, the filtrates were first treated with 1 µl of RNase A (Ambion) for 15 min, followed by 1 µl of Benzonase (Novagen) and 4 µl of Turbo DNase (Ambion) for 45 min, as described elsewhere [20]. The samples were extracted in two batches. First, the total nucleic acids (TNAs) were extracted from all 18 samples with the QIAamp Viral RNA Mini Kit (Qiagen, Valencia, CA) and the extracted nucleic acids were pooled (pool 1). In a second batch, to obtain more sequences from these samples, all the samples were extracted again with the Roche High Pure Viral Nucleic Acids Extraction Kit (Roche) and combined into another five pools: pool 2 (G3A and G3B), pool 3 (G1–G3B), pool 4 (G4–G7), pool 5 (G8–G12), and pool 6 (G13–G17).

RNA viral library preparation for Illumina sequencing

The TNAs were reverse-transcribed using SuperScript III Reverse Transcriptase (Invitrogen) and Random Hexamers (Invitrogen). The samples were barcoded in order to be mixed with 15 other metagenomic projects constructed with the Nextera XT DNA sample prep kit (Illumina). The libraries were prepared according to the Nextera XT protocol (Illumina), purified, normalized, and then pooled (six pools) for sequencing on a MiSeq Platform (Illumina Inc, San Diego, CA, USA) using a paired-end strategy. Automated cluster generation and paired-end sequencing with dual index reads were performed in a single 39-h run of 2 × 250 bp.

Sequence analysis

The Illumina sequences were analyzed using CLC genomics V7.5 (www.clcbio.com). The paired-end Illumina reads were imported into CLC and the adaptors were removed. The low-quality sequences were trimmed and the cleaned sequences were then de novo assembled using default parameters. The resulting contigs were subjected to BLASTx against the nr database (NCBI) with “virus” selected as organism (a feature available in CLC to search the sequences against a virus database). Contigs with similarities to viral sequences of picobirnaviruses, posa-like viruses, dicistroviruses, and a partitivirus were detected. The related contigs were retrieved from all the sequencing pools for reconfirmation; the retrieved sequences were subjected again to BLASTx against the nr database (NCBI), and the deduced protein sequences were re-confirmed by a protein search (BLASTp) against the nr database and classified accordingly.
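The re-confirmation step described above can be illustrated with a minimal sketch. It assumes that the BLASTx hits have been exported as tab-separated text with the default twelve columns of the BLAST tabular format (`-outfmt 6`); the study itself used the BLAST feature built into CLC, so the function name `best_hits`, the e-value cutoff of 1e-5, and the example rows are hypothetical and serve only as illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return the highest-bitscore hit per contig among hits with evalue <= max_evalue.

    The cutoff is an illustrative assumption, not a value reported in the study.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Invented example rows: two hits for contig_1, one weak hit for contig_2.
example = io.StringIO(
    "contig_1\tprotA\t27.0\t300\t200\t5\t1\t900\t1\t300\t1e-20\t95.0\n"
    "contig_1\tprotB\t40.0\t310\t180\t3\t1\t930\t1\t310\t1e-40\t160.0\n"
    "contig_2\tprotC\t25.0\t50\t37\t1\t1\t150\t1\t50\t0.5\t30.0\n"
)
hits = best_hits(example)
for contig, hit in hits.items():
    print(contig, hit["sseqid"], hit["pident"], hit["evalue"])
```

Keeping only the best hit per contig is a simplification; a real re-confirmation would also inspect query coverage and the taxonomy of the subject sequence.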

Phylogenetic analysis

The putative Open Reading Frames (ORFs) were predicted by NCBI ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/). The deduced viral protein sequences were aligned along with representative sequences of respective viruses which were retrieved from GenBank using MEGA V.5.1 [21] or V6.1 [22]. Then, the phylogenetic trees were constructed by maximum likelihood methods with best substitution model selected using ProtTest and 100 bootstrapped iterations.
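As an illustration of ORF prediction, the sketch below scans the six reading frames of a nucleotide sequence for stretches that begin with ATG and end at the first in-frame stop codon. It is a simplified stand-in and not the algorithm of NCBI ORF finder; the function names, the minimum length of 30 codons, and the demonstration sequence are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """List ORFs as (strand, frame, start, end).

    Coordinates are 0-based on the scanned strand; `end` is exclusive and
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


# Invented sequence: 2 nt leader, ATG, 40 alanine codons, stop codon, 1 nt trailer.
demo = "CC" + "ATG" + "GCT" * 40 + "TAA" + "G"
demo_orfs = find_orfs(demo)
print(demo_orfs)
```

ORFs that run off the end of a contig without a stop codon are ignored here, which matters for partial contigs such as those described for some of the viruses in this study.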

Pairwise identity was calculated using the Sequence Demarcation Tool (v1.2) [23] and MEGA v6.1 [22]. The logo of the ExxRxNxxxE amino acid motif was generated with http://weblogo.berkeley.edu using default parameters.
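For readers unfamiliar with the measure, pairwise identity between two aligned sequences can be sketched as follows. The convention used here, counting identical residues over the alignment columns in which neither sequence has a gap, is one common choice and is an assumption; it is not necessarily the exact calculation performed by the tools cited above. The function name and the example alignment are hypothetical.

```python
def pairwise_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are ignored (assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented aligned fragments: 6 ungapped columns, 4 of them identical.
identity = pairwise_identity("MKT-LLE", "MRTALLD")
print(f"{identity:.1f}% identity")
```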

Recombination analysis

Complete RdRp coding sequences were examined for recombination using Simplot [24] and Rat [25], according to the default parameters.

Nucleotide sequences: All the sequences generated in this study were deposited in GenBank under the accession numbers KY502835–KY503023.

Results

Sequencing results

A total of 7.4 Gb of data was obtained from a cluster density of 784 K/mm², with 94.5% of clusters passing the quality control filters (1,509,700 clusters). Within this run, the index representations of the 6 pools were 1.56, 1.54, 7.32, 3.01, 5.89, and 5.39%, respectively. The raw data of 222,210; 220,126; 1,044,848; 428,865; 840,562; and 769,080 paired-end reads (total of 3,525,691 paired-end reads) were trimmed and filtered according to read quality.
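The per-pool figures above can be cross-checked with a few lines of arithmetic. The sketch only re-adds the numbers reported in the text; the variable names are illustrative.

```python
# Paired-end reads and index representation (%) for pools 1-6, as reported above.
reads_per_pool = [222_210, 220_126, 1_044_848, 428_865, 840_562, 769_080]
index_share_percent = [1.56, 1.54, 7.32, 3.01, 5.89, 5.39]

total_reads = sum(reads_per_pool)
total_share = round(sum(index_share_percent), 2)

print(f"total paired-end reads: {total_reads:,}")
print(f"share of the run taken by the six pools: {total_share}%")
```

The sum of the six pools reproduces the stated total of 3,525,691 paired-end reads, and the six indexes together account for about a quarter of the run, consistent with the run being shared with other projects.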

Picobirnaviruses

After annotation by BLASTx, only contigs with similarities to RNA viruses were analyzed. From the six sequencing pools, 339 contigs related to picobirnaviruses were retrieved for further analysis. Finally, 289 contigs were re-confirmed by BLASTx against the nr database (NCBI) as picobirnaviruses. The sequences were grouped into 96 variants based on the genetic relationship of either the capsid (Segment 1) or the RNA-dependent RNA polymerase (RdRp; Segment 2), as determined by pairwise identity analysis; the maximum percentage of identity observed between the strains was 70%. These variants were present in more than one sequencing pool. The details of the variants and their genetic relationships are presented in Table 2 and Supplemental Table 1. The variants are tentatively named “Institut Hospitalo Universitaire-Congo-Gorilla-picobirnavirus Variants” and abbreviated as “IHU-Con-GPbvs-V1 to 96.” Henceforth, these viruses are referred to as variants (GPbvs V1–V96) when appropriate in the text.

Table 2 Description of nearly complete genome segments of gorilla picobirnaviruses

Genetic compositions

Among the 96 variants, 22 (15 based on Seg 1 and 7 based on Seg 2) have nearly complete segments, with 13 of them having a sequencing coverage > 30X. Each segment was covered by numerous reads, ranging from 60 to 25,541 (Table 2 and Supplemental Fig. 1). Of the 22 variants, 18 (GPbvs V1, V3, V4, V6–V8, V10–V16, V18–V22) were found in more than one sequencing pool (Table 2). Supplemental Fig. 1 shows the coverage of each contig after re-mapping of the reads against its respective original contig. The GC content of the sequences encoding both proteins was analyzed along with strains detected from different hosts. The hosts and numbers of sequences analyzed for the capsid are gorilla (16), human (3), rabbit (1), otarine (1), turkey (4), porcine (5), horse (4), dromedary (6), and fox (1), and for the RdRp are gorilla (14), human (12), bovine (1), monkey (1), otarine (1), porcine (3), turkey (1), dromedary (4), and horse (4). The capsid sequences of GPbvs (n = 16) have an average GC content of 40%, and for the other strains, the GC content varies between 40 and 50% (Supplemental Fig. 2a). In the case of the RdRp, the GPbvs have an average GC content of 43.6%, and similar values were observed for PBVs detected from other hosts (Supplemental Fig. 2b).
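The GC content comparison can be sketched as a simple calculation on the coding sequences. The function names and the toy sequences below are hypothetical; ambiguous bases such as N are excluded from the denominator, which is an assumption about how such positions should be handled.

```python
def gc_content(seq):
    """GC percentage of a nucleotide sequence, ignoring ambiguous bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def mean_gc(sequences):
    """Average GC percentage over a collection of sequences."""
    values = [gc_content(s) for s in sequences]
    return sum(values) / len(values)


# Invented toy sequences, not data from this study.
toy = ["ATGCGCNNAT", "ATATATGC"]
print([round(gc_content(s), 1) for s in toy], round(mean_gc(toy), 1))
```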

Segment 1 (long segment)

Fifteen variants had a nearly complete segment. The size of the segments ranged from 2208 to 2916 bp; each encodes two open reading frames (ORFs) and has non-coding regions at both termini. Some variants (V-2, V-11, and V-12) had overlapping reading frames, but most of them harbored an intergenic region between the ORFs. These intergenic regions contained from 5 to a maximum of 59 nucleotides among the variants analyzed (Table 2 and Fig. 2a). The non-coding region had a maximum of 486 nucleotides at the 5′ end and 246 nucleotides at the 3′ end.

Fig. 2

Schematic representation showing the genomic organization of complete segments of GPbvs. a Genomic organization of the complete Segment 1 of 15 variants detected in gorilla stools. All the variants are predicted to have two ORFs (indicated by blue and orange boxes) and non-coding regions at both termini. Three variants (V2, V11, and V12) have overlapping coding regions. The other 12 variants have an intergenic region between the ORFs. The nucleotide sequences in the intergenic region are presented in the box. Each variant is labeled with its original contig name and number of bp within brackets. NCR: non-coding region. Each ORF is annotated with the start and end of the coding region, the number of amino acids, and the stop codon at the end. b Schematic representation of the genomic features observed in Segment 1 of all the 15 variants. The features observed from PBVs detected from different hosts are highlighted in the boxes. c Genomic organization of the complete Segment 2 of 7 variants obtained from gorilla stools. Segment 2 encodes one ORF and has non-coding regions at both termini. The name of each variant and the annotation are the same as described in (a) (Color figure online)

ORF 1 encodes a hypothetical protein of unknown function. The size of the protein varies among the strains. The smallest protein (66 amino acids) was observed in V-15, and V-4 had the largest protein (293 amino acids) (Fig. 2a). The proteins are highly disordered, with the disordered fraction ranging from 15 to 68% among the strains (Fig. 2b). The strains detected from humans and rabbits (type species recognized by the ICTV) had 40 and 54% of disordered protein, respectively. A homology search in the Protein Data Bank (PDB) did not find any significantly related protein. The hypothetical protein has short (10 amino acids) repetitive motifs (ExxRxNxxxE), which are observed in all PBVs detected from different hosts (Fig. 3a, b). The number of motifs varied between the variants; V-1 (S1-2349) had only 2 motifs, whereas up to 11 were observed in V-5 (S4-111) (Fig. 3b). The frequency of occurrence of the amino acids in the motifs is presented as a logo in Fig. 3c. At each “x” position, three different amino acids can occur. The frequency of these three amino acids was calculated independently and included in the logo (highlighted within the box, with the most frequent one placed first). The final motif is E[TNS][KNA]R[HSA]N[LVR][AEQ][TKQ]E.
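The two forms of the motif given above translate directly into regular expressions, which makes it straightforward to count motif copies in a protein sequence. In the sketch below, the function name and the toy protein are hypothetical, and a lookahead is used so that copies sharing a terminal glutamate would still be counted; whether the original analysis allowed such overlaps is not stated.

```python
import re

# Generic motif ExxRxNxxxE and the refined motif with the three most
# frequent residues at each "x" position, both taken from the text.
GENERIC = re.compile(r"(?=(E..R.N...E))")
REFINED = re.compile(r"(?=(E[TNS][KNA]R[HSA]N[LVR][AEQ][TKQ]E))")


def find_motifs(protein, pattern=GENERIC):
    """Return (0-based start, matched motif) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]


# Invented toy protein: the first copy fits the refined motif; the second
# fits only the generic one (G at the first "x" position).
toy_protein = "MAETKRHNLATEGGEGARANVEQEKK"
generic_hits = find_motifs(toy_protein)
refined_hits = find_motifs(toy_protein, REFINED)
print(generic_hits)
print(refined_hits)
```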

Fig. 3

Prediction of conserved, repetitive amino acid motifs in the ORF 1 of all picobirnaviruses. a The details of ORF 1 (hypothetical protein) of 26 variants (15 GPbvs and 11 PBVs from other hosts) were analyzed. This hypothetical protein shows repetitive amino acid motifs (ExxRxNxxxE) in all the variants. The motifs are highlighted with individual boxes. b Detailed version of (a), which presents the individual motifs along with their arrangement pattern. c Graphical representation of the amino acid motif (ExxRxNxxxE) and the distribution of its ‘x’ amino acids. The bit size of each amino acid is directly related to the frequency of its presence (bit 4 is 100%). For the remaining amino acids (x), three amino acids are detected at each position and the most frequent among the three is presented at the top, as highlighted in the box

ORF 2 encodes the capsid protein, whose size ranged from 522 to 598 amino acids. The preliminary analysis revealed that all these variants were novel and shared < 40% amino acid identity with other known PBV sequences available in GenBank (Table 2). The most divergent virus is V-1, which shared only 27% identity with dromedary picobirnavirus (AIY31265). The capsid protein was highly divergent; in particular, the first 88 amino acids at the N-terminal region were hypervariable, quite distinct from each other, and had no homolog in the public database (Fig. 2b). This region was also almost completely disordered.

Segment 2 (small segment)

Seven variants have a nearly complete segment. As observed with other known PBVs, segment 2 displays a single ORF, which encodes the RdRp, and harbors non-coding regions at both ends. The segment size ranges from 1.7 to 2.3 kb and the encoded proteins range from 519 to 555 aa in length (Table 2; Fig. 2c). The variants share 25 to 68% identity with other known PBVs in GenBank. V-20 shares only 25% identity with a strain detected from a horse (AKN50618). Segment 2 has long non-coding regions, with 684 nt at the 5′ end of V-16 and 129 nt at the 3′ end of V-20 (Table 2).

Complete RdRp coding sequences were analyzed together with those from other PBVs available in GenBank. The RdRp proteins had two conserved amino acid motifs (GDD and SGxxxT). The pairwise identity analysis revealed that gorilla picobirnaviruses are distinct from each other and share 20–70% identity (Supplemental Fig. 3). Also, the mean pairwise identity of GPbvs with other picobirnaviruses ranged from 36 to 52%; strains detected from horse are quite distinct from GPbvs as well as from all other PBVs (Supplemental Table 2). A maximum likelihood phylogenetic tree built with all the strains revealed two genetic clusters (earlier referred to as genogroups) [26, 27]. The strain V-20 (S6-233) formed a separate branch (Fig. 4), sharing only 25% identity with other strains in the public database (Table 2). This further suggests the existence of more genogroups or genetic clusters of PBVs. The genetic distance plot of nearly complete RdRp nucleotide sequences suggests that the strains detected from gorillas maintain a similar percentage of identity across all the genomic positions analyzed (Fig. 5).

Fig. 4

Phylogenetic analysis of picobirnaviruses and partitivirus detected from gorilla stools and other hosts. A maximum likelihood phylogenetic tree was constructed based on the RdRp proteins of the viruses with an algorithm with the lowest BIC score. The number of strains was gorilla-20, human-9, horse-4, dromedary-4, porcine-2, 1 each for turkey, otarine, bovine, and monkey. The variant names are labeled first with the GenBank accession number, followed by the host name. In the case of gorilla picobirnaviruses, the contig numbers were used as presented in Table 2 and Supplemental Table 1. The genetic clusters and branches are highlighted with different colors (Color figure online)

Fig. 5

Divergence plot of the RdRp sequences of picobirnaviruses. In this analysis, 26 picobirnaviruses detected from gorilla, human, porcine, monkey, cow, otarine, dromedary, turkey, and horse were included. The seven GPbvs are highlighted with dotted lines. The “X” axis indicates the genomic position of the viral genome and the “Y” axis indicates the percentage of diversity

Relationship of gorilla strains closely related to human viruses

Two variants detected from gorillas, GPbvs V-31 (1918 nt, RdRp) and GPbvs V-39 (1638 nt, RdRp), shared a high level of amino acid sequence identity with strains detected from humans during 2007–2008 in India: V-31 shared 98% identity (BAJ53295) and V-39 shared 92% (BAJ53293) with these strains. Another three RdRp sequences, V-68 (494 nt), V-77 (383 nt), and V-96 (224 nt), shared over 90% amino acid identity with strains detected from Macaca mulatta in Bangladesh (Supplemental Table 1).

As previous studies demonstrated recombination events in PBVs [1], the gorilla sequences were analyzed along with sequences obtained from humans, monkeys, pigs, dromedaries, turkeys, and cattle. The analysis revealed no apparent recombination event involving the gorilla strains.

Other novel viruses

Novel gorilla stool-associated partitivirus

Partitiviruses are non-enveloped, bi-segmented dsRNA viruses (1.4–2.4 kb in size). The smaller segment usually codes for the capsid protein and the larger usually codes for the virion-associated RNA polymerase. The viruses are associated with latent infections of their fungal, protozoan, and plant hosts [28]. There are no known natural vectors. Currently, the family Partitiviridae consists of five genera (Alphapartitivirus, Betapartitivirus, Cryspovirus, Deltapartitivirus, Gammapartitivirus) and 15 unassigned species (http://www.ictvonline.org/virusTaxonomy.asp).

In this study, a contig of 2421 bp was identified (GenBank accession KY503021) in a single sequencing pool (pool 5). The segment has one ORF, which encodes a putative RdRp protein, and non-coding regions at both ends (210 bp at the 5′ end and 279 bp at the 3′ end). A BLASTx search against the non-redundant protein database (NCBI) returned a hit to Beet Cryptic Virus 3 (e-value 4e−05) with only 25% identity (query coverage of 27%). For ease of description, the sequence was tentatively named ‘Gorilla Stool-associated Partitivirus’ and abbreviated as GOSP. The GOSP sequence was analyzed along with representative sequences of other partitiviruses available in GenBank. The alignment of amino acid sequences (401 aa) of 20 strains revealed two motifs (KxR, GxPSG) and a few conserved amino acids common to this viral family. A maximum likelihood phylogenetic tree indicated that this strain was distinct, diverging from other strains by over 80% at the amino acid level, and formed a separate branch (Fig. 4).

Novel gorilla stool-associated picornavirus

According to the ICTV, the order ‘Picornavirales’ consists of five families (Dicistroviridae, Iflaviridae, Marnaviridae, Picornaviridae, Secoviridae) and 2 unassigned genera. In this study, a contig of about 2 kb (2281 bp) was identified (KY503022) in three sequencing pools (pools 4, 5, and 6). A conserved domain search of the translated protein sequence returned a hit to the ‘rhv_like picornavirus capsid protein’ domain. The sequence detected from gorillas was then tentatively named GOSA (Gorilla Stool Associated) virus. The search of translated nucleotide sequences against the protein database showed that this virus was most closely related to ‘posa-like viruses’ [29], Husaviruses [30], Fisavirus [31], Rasavirus, and Basavirus [32] and shared < 37% amino acid identity with them. These viruses have been recently detected from pigs, humans, fish, rats, and bats, respectively, and proposed as a new viral family in the order Picornavirales, yet to be recognized by the ICTV. A phylogenetic tree was constructed based on 314 amino acids along with the most closely related viruses and representative sequences of all the members of the Picornavirales. The ‘posa-like viruses’ formed a separate cluster, and within the cluster, many groups were observed. The GOSA virus formed a separate branch and was distinct from other strains; the most closely related viruses are ‘posavirus 1-like viruses’ (Fig. 6a).

Fig. 6

a Phylogenetic analysis of Picornavirales (posa-like viruses) sequences including the novel viral sequence from this study. The maximum likelihood tree was constructed based on 314 amino acids with the LG + G substitution model and 100 replicates. In the analysis, 31 sequences were included, representing the five recognized families in the Picornavirales and the unrecognized ‘posa-like viruses’ along with the gorilla stool-associated virus. Strains from known viral families are clustered according to the family. The unclassified ‘posa-like viruses’ formed a separate cluster in the tree. The strain detected from gorillas also branched within this cluster. Each family is highlighted with a shadow and name. b Phylogenetic analysis of the RdRp protein of the gorilla stool-associated dicistrovirus and sequences detected from other hosts. A maximum likelihood (WAG with G + I) tree was constructed based on the analysis of 508 amino acids with 100 replicates. In the tree, representative sequences from all three known genera were included, highlighted in shadow, and named. c Phylogenetic analysis of the capsid protein (ORF 2) of the gorilla stool-associated dicistrovirus and sequences detected from other hosts. A maximum likelihood tree (620 amino acids) was constructed for the gorilla stool-associated dicistrovirus along with other known species in the family. The tree was constructed using the rtREV + G + I + F model with 100 replicates. The three recognized genera are highlighted and the strain detected from gorillas is highlighted with bullets and boxes

Novel gorilla stool-associated dicistrovirus

Dicistroviruses are another family in the order Picornavirales, with a genome of about 8.5–10 kb in size. Currently, the family Dicistroviridae consists of 15 species, placed under three genera (Aparavirus, Cripavirus, Triatovirus). The genome encodes two open reading frames; ORF 1 encodes a helicase, a viral protein, a protease, and an RNA-dependent RNA polymerase (pol), while ORF 2 encodes a capsid protein [33].

From gorilla stool samples, a contig of about 4,045 nucleotides, which contains partial sequences of both ORFs, was identified in a single sequencing pool (pool 1) (KY503023). The initial BLASTx search against the NCBI nr database suggested that ORF 1 (541 amino acids) was related to Northwest Territories Cripavirus with 48% identity (63% query coverage), followed by 33% identity (95% coverage) with drosophila C virus. The capsid protein (710 amino acids) was highly divergent and shared merely 29% identity with Aphid lethal paralysis virus. A nucleotide (BLASTn) search against the nr/nt database of NCBI returned only a non-viral hit.

Both proteins (capsid 600 amino acids and RdRp 508 amino acids) were further analyzed along with other strains in the family. The alignment of RdRp sequences (positions 1245 to 1765 aa of Cricket paralysis virus [NP_647481]) identified many conserved amino acids and motifs (AGDxxxxD; PSGxxxTxxxN; GDD) common to the Dicistroviridae. The motif ‘GDD’ is also observed in the RdRp protein of picobirnaviruses and partitiviruses (except Radish partitivirus AAU88207). The pairwise distance analysis indicated that this strain was distinct from the others, with distances ranging from 50 to 72% (mean distance of 62%). In the capsid protein, some conserved amino acids and motifs (PxxS, KTxxHxxR) were observed. The pairwise distance analysis suggested that the gorilla stool-associated dicistrovirus was distinct from others by a minimum of 70% (Aphid lethal paralysis virus) to a maximum of 78% (Taura Syndrome Virus) with a mean distance of 70%. A maximum likelihood phylogenetic tree was constructed for both proteins; the gorilla stool-associated dicistrovirus formed a separate branch between the genera Aparavirus and Cripavirus in both trees. The RdRp sequence grouped with Northwest Territories Cripavirus (Fig. 6b, c). In all, although this gorilla stool-associated dicistrovirus was related to the genus Cripavirus, it was quite distinct and could represent a novel member of this family.

Discussion

Picobirnaviruses (PBVs) belong to the genus Picobirnavirus in the family Picobirnaviridae. These viruses were first identified in human [34] and rat stools in 1988 in Brazil [35]. Since then, these viruses have been detected from various symptomatic and asymptomatic hosts, i.e., human [36], domestic animals (bovine/cows/calf [37], cat [38], chicken, dogs [39], mouse [40], dromedary [41], pigs [42], rats [39], rabbit [43], horse/foals [44], turkey [45]), wild animals (fox/red fox [46], jaguar, lion [47], monkey [48], wild rodent [40], orangutan [49], puma [47], rhesus macaques [1], sea lion [50], small carnivores [51], snakes [39], wolf), bats (mammals) [52], and also in waste water [53]. However, only partial sequences are available for many of these viruses. To our knowledge, this is the first report that describes novel PBVs in wild gorilla stools (Fig. 1a, Supplemental Table 3).

Picobirnavirus particles are non-enveloped and about 33–37 nm in diameter, enclosing a bi-segmented RNA genome [43]. Segment 1 is about 2.4–2.6 kb and segment 2 ranges from 1.5 to 1.9 kb. The analysis of nearly complete sequences (GPbvs) of both segments indicates that the nucleotide composition and GC content vary between strains detected from different hosts (Supplemental Fig. 2a, b). For each protein, strains from nine hosts were analyzed. For the capsid protein, a GC content range of 40–44% was observed for strains detected from gorilla, horse, porcine, rabbit, human, and dromedary. Otarine, turkey, and fox strains have a higher GC content of 47.5, 49.5, and 50.5%, respectively. The RdRp of strains detected from gorilla, horse, dromedary, monkey, human, and bovine has a GC content between 41 and 45%, while a higher GC content was observed for turkey (47%), otarine (48%), and porcine (46%) strains (Supplemental Fig. 2a, b).

Seg 1 also encodes a hypothetical protein. In the gorilla viromes, the predicted hypothetical proteins were variable in size (66–293 aa), with only 5/15 variants having proteins above 150 aa. A similar observation was made with strains detected from other hosts. For example, strains detected from rabbit and humans also have a smaller protein (approximately 100 aa) when compared to strains detected from horses (151 aa) (Fig. 3a, b). The hypothetical protein has short repetitive motifs (ExxRxNxxxE) in all the strains, as demonstrated previously in strains detected from human and rabbit [54]. Previous reports have pointed out that short linear motifs can have several functions, such as protein degradation, cell signaling, immune response, cell cycle, transcriptional regulation, or translation [54]. In this case, however, the functional role of these motifs remains unknown and further study will be necessary to decipher their biological function.

This study showed that the proportion of intrinsically disordered protein regions (IDPRs) in GPbvs was about 15–68% in the hypothetical protein and 14–27% in the capsid protein. IDPRs have been previously observed in viral proteins [55]; they are important in regulating cellular and viral functions [56] and help viruses to hijack various pathways of the host cells. The percentage of IDPRs in viral proteins is largely variable, from 7.3% for the human coronavirus NL63 to 77.3% for the avian carcinoma virus [55]. In this study, disordered regions were particularly detected in the N-terminal region of the capsid proteins. A similar observation was made by another research team: the C-terminal region of nodaviral protein A is disordered and has a relatively high evolutionary rate that can facilitate the rise of novel functions [56]. This first report of disordered protein regions in PBVs opens new avenues to understand the mechanisms driving virus evolution and virus/host interactions.

Although PBVs have been detected in many hosts, their evolutionary relationships and potential origin remain unclear. PBVs were initially classified under Birnaviridae, as both groups have bi-segmented genomes, and were then reclassified into a new viral family [27]. In 2010, Tang et al. demonstrated structural similarities between three partitiviruses and Rabbit picobirnaviruses (RaPBVs): RaPBV shares many structural (capsid) properties with partitiviruses, including capsid diameter and thickness [57]. In this study, 63 amino acid sequences (43 from picobirnaviruses and 20 from partitiviruses) were analyzed, and a few conserved amino acids and motifs (SGxxxT and GDD) were identified in all the strains except radish partitivirus. This finding supports the earlier hypothesis that partitiviruses might have crossed the species barrier from fungi to vertebrates and have adapted, or are still adapting, to their new hosts [27]. However, as these two motifs are also observed in dicistroviruses, they may simply be common to RdRps.

Interspecies transmission (defined as the detection of genetically related viruses in different species in one territory) has been reported for many viruses, including PBVs. Closely related viruses have also been detected in species from different geographical origins and at different points in time. In Hungary, closely related PBVs were detected in both pigs and humans [58]. In addition, strains homologous to human strains have been detected in different hosts such as horses [59], rodents [40], and macaques [1], and this study also found two variants (GPbv-31 and V-39) that were closely related to human strains. An analysis using two distinct methods detected no apparent recombination events in GPbvs, although recombination events have been reported in some PBVs [1]. Finally, as additional metagenomic surveys are conducted on other healthy and diseased hosts, more (complete) PBV-related sequences will be characterized. This will further expand our knowledge of the extent of picobirnavirus diversity and of the role of these viruses in pathogenesis.

The order Picornavirales consists of five recognized families, each with a distinct host range: members of the families Dicistroviridae and Iflaviridae infect invertebrates, those of Marnaviridae infect specific strains of algae, those of Picornaviridae mainly infect vertebrates, and those of Secoviridae only infect plants [60]. Owing to the growing number of virological surveys and discovery projects, many picorna-like viruses have been detected. Currently, these viruses are placed in 43 genera and 5 families by the ICTV (International Committee on Taxonomy of Viruses) (http://www.ictvonline.org/virusTaxonomy.asp). In 2011, Shan et al. reported the detection of picornavirus sequences in porcine stool samples that also share high homology with cDNA sequences derived from Ascaris suum (a nematode). The viruses were tentatively named posa (Porcine Stool Associated) viruses [29]. Since then, several studies have reported the identification of ‘posa-like viruses’ in many hosts: human (HUSA virus [30]), fish (FISA virus [31]), swine [61], and more recently rats and bats [32]. This group of viruses forms separate clusters in phylogenetic analysis and has been proposed as a new viral family in the Picornavirales. The present study identified posa-like virus sequences in gorilla stools (GOSA), which formed a separate branch in the ‘posa-like virus genetic cluster’, with which they shared less than 37% identity, suggesting that this is a novel member of the proposed new viral family.
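Identity values such as the 37% threshold mentioned above are typically computed from a pairwise alignment. The sketch below shows one common convention, under stated assumptions: the two sequences are already aligned to equal length, gaps are written as "-", and columns containing a gap in either sequence are excluded from the denominator. Other conventions exist, and this section does not state which one was used; the function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue               # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment for illustration only (not real posa-like virus sequences).
print(round(percent_identity("MKT-LLA", "MRTALIA"), 1))
```

Because the choice of denominator changes the result, identity percentages should only be compared when they were computed under the same convention.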

Currently, the family Dicistroviridae consists of three genera, and all its members are known to infect arthropods [33, 62]. Some of these viruses are pathogenic and have devastating economic consequences [63]. For instance, acute bee paralysis virus, Kashmir bee virus, and Israeli acute paralysis virus infect honey bees, whereas Taura syndrome virus and mud crab virus infect shrimps [63]. Although members of this family are known to infect only arthropods, many novel dicistroviruses have been detected in recent years in samples from different hosts: goose [64], bat [62], freshwater prawn, panda [65], mud crab, and Griffin. However, as is the case here for the gorilla stool-associated dicistroviruses, these viruses are likely of dietary origin (arthropod consumption) or reflect the presence of intestinal parasites, including nematodes [65].

Discovery studies on wild animals offer several advantages over similar studies performed on captive animals, as the latter may be contaminated with human pathogens and/or pathogens acquired from other animals in zoos. However, working on wildlife is also a difficult task that often limits the number of samples that can be collected and/or the number of individuals that can be sampled. In this study, we performed a global viral metagenomic analysis on stool samples collected from 17 wild gorillas. Although limited by the small number of samples collected and by the absence of virome profiles for individual gorillas (pooling strategy for sequencing), this study identified sequences related to several novel RNA viruses, including sequences related to picobirnaviruses that were abundantly covered in the stool samples. The comprehensive genomic sequence analysis explored their genetic and phylogenetic relationships with other picobirnaviruses. In addition, the identification of genetic signatures (repetitive amino acid motifs and disordered regions) in the viral proteins will facilitate further studies to understand their role in pathogenesis. The identification of novel sequences related to posa-like viruses, dicistroviruses, and partitiviruses will improve our understanding of the diversity of these viral families. Overall, this study increases our knowledge of the viral families associated with the stools of asymptomatic gorillas and provides a baseline dataset for comparative studies aimed at identifying unusual viruses (potential causative agents) in diseased animals.