Background
Representing up to 90% of all dermatological tumours affecting horses, the equine sarcoid is the most common skin tumour of horses and donkeys worldwide [1, 2]. Sarcoids are non-metastasizing but persistent tumours of fibroblastic origin with a wide range of clinical presentations, often occurring simultaneously within the same individual [3]. According to their gross morphology, five sarcoid types are described: occult, nodular, verrucous, fibroblastic and mixed, the latter being a combination of several of these types [4, 5]. The reason for this unique variety in clinical presentation remains to be elucidated. As more advanced fibroblastic tumours have a less favorable prognosis, it is important to gain more insight into the origin and development of equine sarcoids.
Papillomaviruses are slowly evolving double-stranded DNA viruses with an estimated substitution rate between 2 × 10⁻⁸ and 5 × 10⁻⁹ substitutions per site per year [6], and they are known to have species-specific biological characteristics [7]. They are ubiquitous in a wide range of vertebrate host species, often causing benign papillomas that subsequently regress spontaneously [8]. Members of the family Papillomaviridae typically share a similar genome organization, characterized by a double-stranded DNA genome of approximately 8 kbp containing a non-coding long control region (LCR) and eight open reading frames (ORFs). These ORFs encode six early proteins (E1–E2, E4–E7) and two late proteins (L1, L2). The transcription of early genes is responsible for episomal genome maintenance, regulation of cell growth and cell transformation [9–11]. Some RNA molecules containing late transcripts have been demonstrated within sarcoids [12, 13], yet it remains unclear whether this culminates in the production of infectious virions.
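To put the quoted substitution rate in perspective, the following back-of-the-envelope sketch multiplies the per-site rate by the approximate 8 kbp genome length mentioned above. It assumes, purely for illustration, that the rate is uniform across the whole genome; the function and constant names are hypothetical and not taken from any cited work.

```python
# Illustrative arithmetic only: assumes a uniform substitution rate over
# an approximately 8 kbp papillomavirus genome (both figures from the text).
GENOME_LENGTH_BP = 8_000
RATE_LOW = 5e-9    # substitutions per site per year (lower bound in the text)
RATE_HIGH = 2e-8   # substitutions per site per year (upper bound in the text)


def substitutions_per_genome_per_year(rate, genome_length=GENOME_LENGTH_BP):
    """Expected number of substitutions per genome per year."""
    return rate * genome_length


def years_per_substitution(rate, genome_length=GENOME_LENGTH_BP):
    """Expected waiting time, in years, for one substitution per genome."""
    return 1.0 / substitutions_per_genome_per_year(rate, genome_length)


if __name__ == "__main__":
    for label, rate in (("lower bound", RATE_LOW), ("upper bound", RATE_HIGH)):
        per_year = substitutions_per_genome_per_year(rate)
        wait = years_per_substitution(rate)
        print(f"{label}: {per_year:.1e} substitutions/genome/year, "
              f"about one every {wait:,.0f} years")
```

Under these assumptions, a single lineage would accumulate roughly one substitution per genome every 6,250 to 25,000 years, which illustrates why these viruses are described as slowly evolving.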
Uniquely, equine sarcoids are the result of natural cross-species infection by bovine papillomavirus (BPV) type 1, 2 or 13 (BPV-1, BPV-2 or BPV-13), classified in the genus Deltapapillomavirus. Co-infection with BPV-1 and BPV-2 in the same lesion has been observed [14] but is rather exceptional. The association between these genotypes and the etiology of equine sarcoids is well documented, although the mechanism of cross-species infection and the host-dependent clinical outcome of BPV infection, i.e. regressive fibropapillomas in cattle compared to sarcoids in horses, have not been elucidated. Since the earliest report of an equine sarcoid in 1936 [15], many studies have demonstrated the presence of BPV DNA, mRNA and proteins in virtually all equine sarcoids, making BPV the main extrinsic factor responsible for the development of sarcoid lesions [16–20]. Unlike the situation in the natural bovine host, a largely unknown mechanism results in sarcoid growth characterized by persistence and frequent recurrence following treatment [21].
The identification of a set of intra-type sequence variants in selected regions of sarcoid-sourced BPV has fueled suspicions of the existence of equine-adapted viral subspecies that might be favored within the equine host [20, 22, 23]. Moreover, the description of sarcoid dissemination within isolated populations [24, 25] further supports this hypothesis and suggests that these subspecies could be maintained within the equine population.
Previous studies indicate that sequence changes can affect the expression and function of viral proteins. This applies both to human papillomavirus type 16 (HPV-16) E6 proteins [26] and to sequence variants in the LCR and the E2 ORF of BPV-1 isolated from equine sarcoids [27]. The functional significance of such sequence variation suggests that BPV variation could alter biological properties and potentially represent an additional risk factor for more aggressive clinical behavior. In this context, the main objective of this report was to introduce nanopore sequencing technology to sarcoid research in order to provide an extensive full-length genomic characterisation of sarcoid-derived BPV-1/-2. For selected regions targeted in earlier reports, such as E5 and the LCR, we sought to address the presence of mutations previously described as ‘potentially sarcoid-associated’ by other authors. However, as it is not known in which genomic fragments functionally significant mutations occur, we also aimed to identify new mutations in unexpected genetic regions. Therefore, we optimized a third-generation nanopore sequencing approach, which allows the simultaneous whole genome sequencing of multiple sequence variants originating from a single clinical sample. The sample set includes specimens of all clinical types, to elucidate whether sequence variants contribute to the unique clinical presentation of the equine sarcoid. Such a contribution would be analogous to the findings of Kurvinen et al. [28], who described an association between HPV intratypic variants and more aggressive clinical behavior.
Discussion
In Europe, sarcoids are mainly caused by BPV-1, with BPV-2 being detected in only 10% of lesions [17, 22]. The most abundant BPV type identified in the analyzed samples was type 1, confirming the main role of this type in the etiology and pathogenesis of equine sarcoids in Europe [17, 31–33]. Only five out of 53 sequences were identified as BPV-2. Because so few sequences of this type were available, no conclusion could be drawn regarding sequence variation in sarcoid-derived BPV-2. On the other hand, by significantly expanding the number of available type 1 whole genome sequences, we identified substantial genetic variation among the BPV-1 isolates. Whereas earlier genetic studies mainly focused on the ORFs of a confined set of genes, our optimized next-generation sequencing (NGS) protocol facilitates the discovery of nucleotide changes in unexpected genetic regions, such as non-coding promoter regions. Furthermore, the use of a high-fidelity polymerase for the generation of the amplicon library minimizes the risk of observed mutations being attributable to amplification errors, although their occurrence cannot be entirely ruled out. Knowledge of sequence variation in these regions, particularly the LCR, is important because it may affect the transcriptional efficiency of the virus. By introducing this methodology to equine sarcoid research, we generated a substantial number of full-length sarcoid-sourced BPV sequences. This study therefore provides valuable information for future research on the biology of cross-species BPV infection and its association with the equine sarcoid. The whole genome sequence information was obtained from a variety of clinical samples, originating from different tumour types and/or host species and from three different geographical areas, which shows that our customized NGS protocol is applicable to a highly diverse sample collection. Moreover, our approach allows multiple whole genomes to be generated from a single clinical sample, which was not possible before. The limited geographical area covered is a limitation of this study; analysing samples from other continents would therefore be of interest to verify whether the results observed here can be extrapolated to sarcoids worldwide.
In the current study, we detected 33 substitutions in BPV-1 from equine-derived samples that resulted in an amino acid change. The possible impact of these amino acid changes on protein structure and function remains to be elucidated. Some of the single-nucleotide polymorphisms (SNPs) were present in all samples, whereas others were found in only several samples.
BPV variation ranged from minor amino acid substitutions to notable sequence deletions. In one sample, we identified a deletion of 169 nucleotides (nt. 7435–7604) situated in the non-coding LCR. Further variation in this region included smaller deletions and a multitude of substitutions. Interestingly, three single non-coding nucleotide substitutions were located within different E2 binding sites (BS). These findings are in accordance with the results of Nasir et al. [27], who reported the identical SNPs G7595T and A7598C in BS6; however, the SNP they detected in BS8 (G7642A) is offset by 2 nucleotides from the one in our results (G7644A). It has been documented that several nucleotide changes within the LCR are enough to increase transcriptional activity [34]. Therefore, the identified LCR variants may have functional significance, contributing to the development of sarcoid tumours. The proportion of non-synonymous mutations varied between the BPV genes and was particularly high in E2 (83%) (Additional file 3). Intralesional expression of early BPV-1/-2 regulatory oncoproteins, including the E2 protein, supports the role of BPV in the multifactorial etiology of equine sarcoids. During the early phase of infection, E2 proteins interact with the LCR of the viral genome and regulate transcription of the other early genes and the late genes [19, 35, 36]. Given the high fraction of non-synonymous E2 mutations in combination with variation in different E2 BS, further studies exploring the role of E2 variants in sarcoid pathogenesis and cross-species BPV transmission seem worthwhile. In this context, equine-associated BPV-1 LCR variants have previously shown functional significance in vitro through higher transcriptional activity in equine cells, suggesting that these BPV variants have an enhanced function in the equine host [27].
Genomic studies of equine-associated BPV have been reported worldwide [19, 22, 23, 37]. However, these have been restricted almost exclusively to partial sequencing of selected genetic regions, with the LCR and E5 being the most extensively studied. Conversely, complete sarcoid-derived BPV genomes have rarely been sequenced, in part due to technical limitations. Sequence variation was previously detected in the E2 ORF, the LCR [27, 38] and the E5 ORF [19, 22, 23]. Interestingly, Federica et al. [23] identified both mutant and reference E5 in the same subclinically infected horses, while sarcoid-bearing horses were only infected by virus containing mutant E5 DNA. Accordingly, the vast majority (87%) of the sarcoid-sourced BPV-1 in this study contained mutations in the E5 ORF. Nevertheless, reference E5 was also present in six sarcoid samples. One of these samples (Additional file 1: batch 1, sample 3) contained multiple sequence variants: 1R3b with reference E5 and 1R3a with mutant E5 (Additional file 3). Regarding the LCR, one particular variant (SV20) could only be found in equine samples and in none of the thirty bovine samples analyzed by Nasir et al. [27] and Trewby et al. [38]. The equine and bovine samples in our sample set showed some overlap in sequence variation. However, our bovine sample set is too limited to exclude possible host-specific variation.
The E5 ORF is an important region because it encodes the major viral oncoprotein, which has transforming capacity in equine cells [24]. In the samples analyzed in the present study, the glutamine at residue 17 was conserved in all of the predicted variant E5 protein sequences. Since the integrity of this residue is crucial for inducing cell transformation [39, 40], the function of the described E5 variants seems to be preserved. Fourteen out of 39 BPV-1 E5 variants contained the same nucleotide substitution, G3920T (gene position 43). Interestingly, this substitution was found in samples from sarcoid-bearing horses in both Belgium and neighboring countries (Luxembourg and France), indicating that this variant is currently prevalent in Western Europe. The G/T substitution at position 43 replaces an alanine with a serine residue. The two other substitutions, T3886C (gene position 9) and A3937G (gene position 60), did not alter the deduced amino acid sequence. All of these E5 mutations were previously described by other authors as characteristic of potentially equine-adapted BPV strains [20, 22], with the A/G substitution at position 60 being consistently identified in sarcoid-bearing horses by Federica et al. [23]. The impact of these mutations remains unclear.
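The relationship between the genome coordinates and the E5 gene positions quoted above can be checked with a short sketch. It assumes that the E5 ORF starts at nucleotide 3878, a value inferred here only from the pairing of G3920T with gene position 43 and not verified against a reference annotation; the helper names are hypothetical. Because the text does not give the full codon, the sketch tests every possible third base.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: ORF start inferred from "G3920T (gene position 43)".
E5_ORF_START = 3920 - 43 + 1  # 3878


def gene_position(genome_pos, orf_start=E5_ORF_START):
    """1-based position within the ORF for a 1-based genome coordinate."""
    return genome_pos - orf_start + 1


def codon_number(gene_pos):
    """1-based index of the codon containing this gene position."""
    return (gene_pos - 1) // 3 + 1


def codon_offset(gene_pos):
    """Position within the codon (1, 2 or 3)."""
    return (gene_pos - 1) % 3 + 1


if __name__ == "__main__":
    for genome_pos in (3886, 3920, 3937):
        pos = gene_position(genome_pos)
        print(genome_pos, pos, codon_number(pos), codon_offset(pos))

    # G -> T at the first base of any GCN codon gives TCN: Ala -> Ser.
    print(all(CODON_TABLE["GC" + n] == "A" and CODON_TABLE["TC" + n] == "S"
              for n in BASES))
```

Under this assumed ORF start, positions 9 and 60 fall on the third base of codons 3 and 20, where many substitutions are synonymous, whereas position 43 is the first base of codon 15, where a G to T change converts any alanine codon (GCN) into a serine codon (TCN). This is consistent with the amino acid consequences reported above.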
Considering the overall distribution of deletions in our sample set, it is remarkable that the majority of them cluster within the region coding for late viral genes. Together with the size of the described deletions (up to 603 nucleotides), this suggests an altered function of L1/L2 in disease pathogenesis. In the past, it was believed that late viral genes were not transcribed in sarcoid lesions. Nevertheless, Wilson et al. [19] detected L2 transcripts in cDNA samples from 6 sarcoids, and L1 protein has been shown to exist in association with viral DNA in some sarcoid tissues [20]. Although the late genes are among the most conserved regions in the genus Deltapapillomavirus, minor changes in their nucleotide sequence may shift protein function or alter protein conformation. As a result, external epitopes that function as immunological binding sites may no longer be recognized by neutralizing antibodies. This could explain the contrast between the spontaneous regression of BPV-induced papillomas in the natural bovine host and the persistent and recurrent character of equine sarcoid lesions. After all, neutralizing antibodies are considered the main protective factor against experimental and natural infection [41]. However, the available evidence in favor of a full productive life cycle in the equine host is very limited. In this context, the identification of such substantial L1/L2 alterations (up to 603 nucleotides) in our sample set leads us to believe that the late viral genes of these sarcoid-sourced BPV variants are unlikely to be functional. The existence of sarcoid-derived BPV variants with loss of late viral gene function may be the result of the inability of BPV to support the vegetative portion of the viral life cycle in the horse. Interestingly, the deletion of residues 93, 94, 95 and 414 within the L2 ORF described by Wilson et al. [19] is almost identical to the L2 deletion (nt. 4463–4472; nt. 5424–5427) present in the BPV-1 sequences in our data set. The difference is accounted for by several additional missing bases, which caused the deletion of two extra amino acids, i.e. residue 96 (glycine) and residue 413 (tyrosine). The removal of residue 96 in our samples, in addition to the loss of residues 93–95, generates a novel heptamer motif (GSRATRT). These results contradict the proposal of Wilson et al. [19] that GSRAGTR is a widespread motif within equine sarcoid-associated BPV.
Interestingly, several of the major deletions in the BPV-1 variants were only present in part of the sequencing data, and the non-deleted variant could be detected in parallel. The detection of multiple BPV-1 variants originating from a single tumour can be explained by two possible scenarios: simultaneous infection with different subtypes, or a high mutation rate by which the virus adjusts to its environment. Additional file 3 shows that no other mutations were observed at comparable frequencies in these samples, suggesting that the deletions are likely somatically acquired and not indicative of a mixed infection with multiple strains. However, it remains to be determined how this acquisition occurred and whether these mutants are still functional genomes or merely defective copies that are nonetheless being replicated. In the same way, our results show the presence of multiple intratypic BPV variants in sarcoids of different clinical types residing within the same horse. In contrast to HPV intratypic variants, no correlation could be observed between BPV subtype and disease severity, as reflected in the clinical presentation of the equine sarcoid (Fig. 2, Additional file 1).
Previously published research on the epidemiology of sarcoids in donkeys provided supporting evidence for the concept of sarcoid transmission between equids [24, 42]. In this context, the hypothesis that the virus can spread within the horse population in the absence of an obvious bovine source is further strengthened by the identification of intratypic sequence variation in the E2-binding region of sarcoid-associated BPV [19, 22, 23, 27, 38]. Some of these earlier described variants, located within the E2-binding regions of the LCR, were also found in our sample set. Likewise, all of the sequence variation we detected in the E5 ORF is consistent with the results for sarcoid-sourced samples sequenced in earlier reports. Whether the detected sequence variants are equine-adapted BPV strains able to circulate within the horse population remains uncertain. Nevertheless, the extensive somatic deletions we described in the late region of BPV-1 originating from sarcoid lesions could indicate that the second stage of transcription is not conducted in equine cells.