Background
Worldwide, cervical cancer is the third most common cancer in women, with 86% of cases and 88% of deaths occurring in developing countries [1]. In South Africa the incidence of HPV infection is high [2], resulting in cervical cancer being the leading cause of cancer-related death in women [1].
It has been conclusively established that infection with specific high-risk human papillomaviruses (HPV) is causally linked to the development of cervical cancer [3]. Other types of HPV are also aetiologically associated with anogenital warts [4]. Papillomaviruses (family Papillomaviridae) are small DNA viruses with double-stranded circular genomes that infect the cutaneous and mucosal epithelia. More than 200 different HPV types have been identified, with viruses within a type sharing greater than 90% sequence identity in the L1 major capsid gene [5]. To date, full genome sequences are available for 118 HPV types. At least 40 HPV types infect the anogenital mucosa, with 12 of these classified by the International Agency for Research on Cancer (IARC) as carcinogenic to humans (Group 1), one as probably carcinogenic (Group 2A) and 12 as possibly carcinogenic (Group 2B) [6]. HPV-16 and HPV-18 are generally recognized as the most important oncogenic viruses, present in about 71% of cervical cancer cases worldwide [7]. However, several studies (for example [2, 8, 9]) have found significant variation in the regional contribution of HPV types to cervical cancer. Additionally, the largest worldwide HPV genotype distribution study carried out to date showed that the highest proportion of multiple HPV infections and infection with undetermined HPV types and species occurs in Africa [7]. Data on HPVs that are regionally prevalent are crucial in determining the risks associated with particular HPV types, and in informing vaccine strategies in the region. There is limited information of this kind available for South Africa.
South Africa is also faced with one of the worst human immunodeficiency virus (HIV) epidemics in the world, with an estimated 5.63 million infected people and a higher HIV prevalence in women than in men [10]. Consequently, the frequency of concurrent HPV and HIV infection is high [11-13]. HIV-positive women are at an increased risk for the development of cervical disease [11]. These women have higher HPV viral loads and higher viral persistence, and are infected with rare and undetermined types [13, 14]. Further epidemiological data for these types are required to estimate their potential oncogenic risk. HIV-positive women also have a high incidence of multiple infections [14], with an observed frequency of between 50 and 80% [13, 15, 16]. In HIV-negative populations the observed frequency is lower, although higher than previously thought, at between 24.8 and 62.6% [17]. The elevated incidence of multiple HPV infections in HIV-positive women raises concerns over possible recombination between different genotypes and the emergence of novel pathogenic types. For example, Jiang et al. [18] recently demonstrated intratypic recombination between HPV-16 variants in a natural coinfection involving eight HPV-16 types.
Molecular diagnostic tools for HPV DNA detection in clinical samples must be able to accurately detect and genotype the specific HPV types circulating in a particular population. The majority of HPV DNA detection kits are PCR-based, targeted to known HPV types and generally to those prevalent in the developed world. These tests are therefore not suitable for the detection of rare or novel HPV types. PCR-based assays using consensus primers are, additionally, often not able to reliably detect all the HPV types involved in multiple infections [19-21], as commonly seen in HIV-infected individuals.
The recent emergence of next-generation sequencing (NGS) technologies has opened up the opportunity to directly examine viral diversity in clinical specimens, without prior sequence information (reviewed in [22]). In this study we investigated the use of Illumina sequencing (sequencing by synthesis technology, http://www.illumina.com/systems/genome_analyzer_iix/technology.ilmn) to detect and genotype the HPV types present in a complex multiple infection in a cervical specimen from an HIV-infected South African woman. The HPV types detected using Illumina sequencing were compared to those detected by the Roche Linear Array HPV genotyping (LA) test on the same specimen. The prevalence of the HPV types that were present in the specimen but not included in the commercial kit was also determined by type-specific PCR in the cohort of 109 HIV-infected South African women described by Moodley and co-workers [13].
Discussion
Several studies have indicated that many current HPV typing methods are not able to reliably identify all types present in complex multiple infections [19, 21]. In the WHO HPV LabNet Global genotyping proficiency study, most laboratories (90%) were able to identify HPV-16 and -18 as individual types, but fewer than 80% were able to identify HPV-56, -59 and -68. Of more concern was that 28/84 data sets reported false-positive results. A notable decrease was observed in the performance of most assays in identifying types when present in multiple infections, and only 50 to 73% of the data sets generated by these assays correctly detected the types present [19]. This suggests the need for further assessment of the tests used and regular participation in proficiency testing to ensure the quality of data. It also suggests that epidemiological data may not be completely accurate and may be subject to detection biases.
Recently a study was published using 454 NGS technology and HPV-specific primers targeting the conserved L1 gene. A good correlation was reported between the INNO-LiPA HPV Genotyping Extra assay (Innogenetics, Gent, Belgium) and NGS data, but the NGS had a lower sensitivity [25]. Our study differs from that study in that we did not use specific primers, included a pre-amplification enrichment step using RCA, and used the Illumina GAII system to generate sequence. Of note, this methodology showed greater sensitivity than LA genotyping. This study demonstrates the use of NGS in genome sequencing and genotyping of the HPV types present in a complex multiple infection in an unbiased manner. The study design was a metagenomic-based approach, extracting total DNA from a cervical specimen (HH015), without prior virus purification. Circular DNA present in the sample was enriched using phage phi29 DNA polymerase in a randomly-primed RCA method [26]. This allowed us to amplify whole HPV genomes in the sample in an unbiased, sequence-independent manner, unlike other amplification methods such as PCR. This robust technique has successfully been used for the amplification of a number of circular DNA viruses (reviewed in [27]), including HPV [28], and provides ample quantities of sufficiently pure DNA for sequencing. Illumina sequencing was chosen based on the expected high depth of coverage achieved with this technology.
Approximately 20% of the short sequence reads generated by Illumina sequencing of the RCA-enriched DNA from specimen HH015 were identified as HPV sequences (Figure 2). Considering the small size of the HPV genome (8 kb), even in relatively high copy numbers, in relation to the human genome (ca. 3000 Mb), this proportion of reads indicates the highly successful amplification or enrichment of the HPV DNA by the RCA technique. Complete or near-complete genomes were assembled for five HPV types (30, 39, 40, 16, 56). Both de novo assembly and reference mapping identified these types as being the most abundant in the sample. We were not able to de novo assemble full genomes for the less abundant types in the sample, and instead used reference mapping to identify all the types present.
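To put this enrichment in perspective, a rough calculation can be sketched as follows. The genome sizes and the 20% read fraction are taken from the text above; the number of HPV genome copies per human genome equivalent (`copies_per_genome`) is a purely hypothetical input, since no viral load is reported here, and the calculation ignores any other DNA in the specimen.

```python
# Back-of-the-envelope estimate of HPV enrichment by RCA.
HPV_GENOME_BP = 8_000              # ~8 kb (from the text)
HUMAN_GENOME_BP = 3_000_000_000    # ca. 3000 Mb (from the text)
OBSERVED_HPV_READ_FRACTION = 0.20  # ~20% of reads (from the text)


def expected_hpv_fraction(copies_per_genome: float) -> float:
    """Fraction of sequenced bases expected to be HPV with no enrichment."""
    hpv_bp = copies_per_genome * HPV_GENOME_BP
    return hpv_bp / (hpv_bp + HUMAN_GENOME_BP)


def fold_enrichment(copies_per_genome: float) -> float:
    """Observed HPV read fraction divided by the unenriched expectation."""
    return OBSERVED_HPV_READ_FRACTION / expected_hpv_fraction(copies_per_genome)


if __name__ == "__main__":
    # Hypothetical viral loads, for illustration only.
    for copies in (1, 100, 1000):
        print(f"{copies:>5} copies: unenriched fraction "
              f"{expected_hpv_fraction(copies):.2e}, "
              f"fold enrichment ~{fold_enrichment(copies):,.0f}")
```

Even at an assumed 1000 HPV copies per human genome equivalent, HPV would account for well under 1% of the DNA before enrichment, so a 20% read fraction is consistent with substantial enrichment.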
As the HH015 sample contains a mixture of HPV types, reference mapping may be problematic. Short reads sequenced from one type may map to regions of high identity in the genomes of other types present. This could lead to an under- or over-estimation of a type's abundance or, worse, a false positive. The effect would depend particularly on the presence of other highly related types. This is well illustrated by comparing the read counts and coverage obtained when we mapped reads to HPV genomes individually and simultaneously (Table 2). Visual inspection of the read coverage for different regions within the genomes showed unequal read counts. This has, however, been observed in many genome sequencing projects, where read coverage is known to be influenced by a number of factors. To overcome these problems, we performed stringent mappings to a highly variable sub-genomic region, the LCR. A consistent read count was obtained for each type whether the mapping was performed individually with a particular HPV type or simultaneously against all types (Table 3). This allowed for a greater degree of confidence in identifying HPV types that were less abundant. Further support for this was our finding that the percentage of the genomes or LCRs sequenced did not differ significantly when reads were mapped individually or simultaneously. The relative coverage obtained for the HPV types should reflect the relative viral loads of each type in the specimen, assuming that RCA of the different types was equally efficient and free of amplification bias. Based on the mapping of sequence reads to the LCR, the type with the highest copy number was HPV-39, followed by HPV types 16, 40, 56, 74, 30, 71, 70, 35, 45, 59, 90, 55, 86 and 81, with HPV-53 having the lowest copy number.
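The cross-mapping problem described above can be illustrated with a toy sketch. This is not the mapping software or parameters used in the study: "mapping" here is exact substring matching, and the two reference sequences (`typeA`, `typeB`) and the reads are invented so that the references share a conserved stretch but differ in a variable stretch standing in for the LCR.

```python
# Toy illustration: reads from a conserved region inflate per-type counts
# when each reference is mapped on its own; a variable region avoids this.

def maps_to(read: str, reference: str) -> bool:
    """Toy 'mapping': exact substring match (real mappers allow mismatches)."""
    return read in reference


def map_individually(reads, references):
    """Map all reads to each reference separately; shared reads count for every type."""
    return {name: sum(maps_to(r, seq) for r in reads)
            for name, seq in references.items()}


def map_simultaneously(reads, references):
    """Map against all references at once; count only reads with a single hit."""
    counts = {name: 0 for name in references}
    for r in reads:
        hits = [name for name, seq in references.items() if maps_to(r, seq)]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Invented sequences: identical conserved part, different variable part.
CONSERVED = "ACGTACGTAC"
VARIABLE = {"typeA": "GGGTTTCCCA", "typeB": "TTAACCGGAT"}
full_genomes = {name: CONSERVED + var for name, var in VARIABLE.items()}

reads = ["ACGTACG", "CGTACGT", "GGGTTTC", "TTAACCG"]

if __name__ == "__main__":
    print("full, individual:      ", map_individually(reads, full_genomes))
    print("full, simultaneous:    ", map_simultaneously(reads, full_genomes))
    print("variable, individual:  ", map_individually(reads, VARIABLE))
    print("variable, simultaneous:", map_simultaneously(reads, VARIABLE))
```

In this toy case the counts for the full references differ between the two mapping modes, whereas the counts for the variable region are the same in both, which mirrors the rationale for restricting the mapping to the LCR.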
Roche LA testing of DNA extracted from HH015 identified 12 HPV types (16, 39, 40, 45, 52, 53, 55, 59, 70, 71, 81 and 84). Illumina sequencing could reliably detect 16 HPV types (39, 16, 40, 56, 74, 30, 71, 70, 35, 45, 59, 90, 55, 86, 81 and 53), based on de novo assembly and reference mapping to HPV genomes and LCR sequences. Both Illumina sequencing and Roche LA therefore detected HPV types 16, 39, 40, 45, 53, 55, 59, 70, 71 and 81. LA detected HPV-84 and -52, which were not detected by Illumina sequencing. Illumina sequencing identified an additional six types not detected by LA: HPV types 30, 35, 56, 74, 86 and 90. The HR types 35 and 56 are included in the LA, while HPV types 30, 74, 86 and 90 are not.
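The overlap between the two methods can be checked directly from the two type lists reported above. The following is a minimal sketch using set operations; the variable names are illustrative only.

```python
# HPV types reported for specimen HH015 by each method (from the text).
la_types = {16, 39, 40, 45, 52, 53, 55, 59, 70, 71, 81, 84}
illumina_types = {39, 16, 40, 56, 74, 30, 71, 70, 35, 45, 59, 90, 55, 86, 81, 53}

shared = la_types & illumina_types         # detected by both methods
la_only = la_types - illumina_types        # detected by LA only
illumina_only = illumina_types - la_types  # detected by Illumina only

print("Both methods: ", sorted(shared))
print("LA only:      ", sorted(la_only))
print("Illumina only:", sorted(illumina_only))
```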
Illumina sequencing covered 88.6% of the complete HPV-35 genome and approximately 85% of the HPV-35 LCR with 99.6% identity to the reference sequence. No reads mapping to HPV-52 were identified. In the LA the probe for HPV-52 can cross-react with HPV-35, -33 and -58. A separate probe for HPV-35 is included in the LA, but was negative for HH015. This may be due to a low viral load in the specimen; however, it may also be that HPV-35 was mistyped as HPV-52 in the LA result. In the WHO HPV genotyping global proficiency study, LA testing was found to frequently give false-positive results for HPV-52. As no reads mapped to the HPV-84 LCR, the presence of this type, detected by LA, could not be confirmed by Illumina sequencing. A type-specific PCR using HPV-84-specific primers was unable to detect HPV-84 in specimen HH015 (results not shown). HPV-84 may have been mistyped in the Roche LA result.
The HR HPV-56 was identified by Illumina sequencing as one of the dominant HPV types in HH015. The complete genome was assembled at a coverage of 93.9 (Table 1) and mapped at a coverage of 97.4. This type was not detected by LA, probably because the limit of detection for HPV-56 in the LA is very high. Eklund et al. [19] report that this type, and HPV-52, are frequently undetected in many HPV genotyping assays, including LA, and that their prevalence is probably underestimated in epidemiological studies. This is especially so when compared with the prevalence of HPV-16 and -18, for which most assays have a significantly lower detection limit.
Illumina sequencing identified several HPV types in HH015 not included in the LA (HPV types 30, 74, 86 and 90). HPV-30 has been classified as possibly carcinogenic [6]. While the remaining types are not classified as HR oncogenic types, we wanted to determine how common these types were in our study population. HPV-30 (14.6%) and HPV-74 (12.8%) were found to be the third and fourth most common low-risk types in our cohort, after HPV-62 (23.9%) and HPV-70 (15.6%) [13]. The prevalence of HPV-86 and -90 was 4.6% and 8.3%, respectively. Although our study population was small (n = 109), the high prevalence of HPV-30 and -74 may warrant their inclusion in future HPV genotyping studies.
Inclusion of HPV types 30, 74, 86 and 90 in our previously reported HPV prevalence data for this cohort [13] showed that only 9.2% of the women had no HPV infection (10/109), 18.3% had a single HPV infection (20/109) and 72.5% had multiple infections (79/109). The high prevalence of multiple HPV infection in this cohort (72.5%) and the small sample size limited our ability to assess the impact of the individual HPV types 30, 74, 86 and 90 on the cervical cytology results.
Competing interests
The authors have no competing interests to declare.
Authors’ contributions
EPR and ALW initiated the study and participated in supervising all aspects of the study as well as in editing the manuscript. TLM analyzed the data and drafted the manuscript. ATS and TLM performed the experiments. BC, HJM and MJF participated in the Illumina sequence data analysis. JM arranged for collection of the clinical specimens. IIH co-supervised the work done by ATS. All authors read and approved the manuscript.