Background
Vibrio cholerae is the causative agent of cholera, a life-threatening diarrheal disease. Based on the somatic O antigens, more than two hundred serogroups of
V. cholerae have been identified [
1], among which O1 and O139 are recognized as the two major agents for cholera epidemics.
V. cholerae serogroup O1 has two biotypes and is the causative agent of the two most recent cholera pandemics: the classical biotype was dominant in the 6th pandemic and the El Tor biotype in the 7th [
2]. In 1992, a new non-O1 strain of
V. cholerae, designated serogroup O139, was identified during a cholera epidemic in India and Bangladesh [
3,
4]. Since then,
V. cholerae O139 has been frequently isolated in other Asian countries where cholera epidemics have occurred. In China,
V. cholerae O139 strains are the dominant cause of cholera and have been continually isolated since the serogroup first appeared in 1993 [
5].
Previous studies have shown that the major virulence factor of
V. cholerae O1/O139 is encoded by a lysogenic bacteriophage (CTX prophage) integrated in the
V. cholerae genome. Several other genetic elements, such as the toxin-linked cryptic (TLC) element, the RS1 element, and the pre-CTX prophage (VSK), are also known to lie adjacent to the CTX prophage [
6]. The CTX prophage in toxigenic
V. cholerae usually consists of two gene clusters, the core and the RS2 regions, which are functionally different [
7]. The core region includes the
ctxAB genes encoding cholera toxin (CT) and five other genes encoding components necessary for phage morphogenesis. The RS2 region encodes proteins involved in phage replication (RstA), integration (RstB), and regulation of site-specific recombination (RstR). Another noteworthy element in
V. cholerae is the SXT/R391 family integrative conjugative element (ICE) which was first identified in a
V. cholerae O139 clinical isolate in 1993 [
8]. The SXT/R391 ICE in
V. cholerae usually contributes to the resistance phenotype of
V. cholerae, encoding resistance to several antibiotics, such as sulfamethoxazole and trimethoprim, that had previously been used for cholera treatment.
Although great efforts have been made to understand and control this pathogen, outbreaks of cholera caused by
V. cholerae have still occurred occasionally in recent years [
9‐
11]. To date, 9 complete and nearly 200 draft genomes of
V. cholerae are accessible in the NCBI genome projects. However, to elucidate the evolution and adaptation mechanisms of this pathogen, detailed analysis of the genomic diversity of new clinical isolates from different areas and time periods is needed. Here, we report the genome sequence of
V. cholerae O139 strain E306, which we recently isolated from a cholera patient in Beijing, China. This genome should help to clarify the endemicity of cholera in North China.
Methods
Strain isolation
V. cholerae O139 strain E306 was isolated from the stool sample of a cholera case in Beijing, China, on May 30, 2013. After enrichment in alkaline peptone broth, the strain was identified as serogroup O139 by combining the results of 16S rRNA gene sequencing, a serum agglutination test, and biochemical reactions (Vitek 2 Compact, bioMérieux Corp.). This research was approved by the Research Ethics Committee of the Institute of Microbiology, Chinese Academy of Sciences, and informed consent was obtained from the patient. The strain reported here is available from the 306th Hospital of PLA, Beijing, China.
Genome sequencing
The whole genome was sequenced using a shotgun sequencing strategy on the Illumina Genome Analyzer platform. The DNA library was constructed using the TruSeq sample preparation kit according to the manufacturer's instructions. Briefly, genomic DNA was sheared by sonication and then end-repaired. After ligation of paired-end adapters by the TA cloning method, the resulting DNA fragments were size-selected on a 2% agarose gel. The final DNA library was produced by PCR amplification of the selected ligation products of ~500 bp in length. The DNA library (5 pM) was then loaded onto the sequencing chip, and clusters were generated using the Illumina cluster generation kit. After sequencing, image analysis and base calling were carried out using the Illumina GA Pipeline software. In total, 6,112,322 paired-end reads were generated.
Genome assembly and annotation
The raw paired-end reads were quality-filtered using the DynamicTrim and LengthSort Perl scripts provided in the SolexaQA suite [
12]. After filtering, short reads were assembled by using SOAPdenovo (
http://soap.genomics.org.cn) and the gaps were closed by using SOAP GapCloser (
http://soap.genomics.org.cn). Glimmer 3.02 [
13] was used for prediction of open reading frames, while tRNAscan-SE [
14] and RNAmmer [
15] were used for tRNA and rRNA identification, respectively. The genome was further annotated with the help of the RAST program (Rapid Annotation using Subsystem Technology) [
16]. The annotation results were then checked through comparisons with the databases of NCBI-NR (
http://www.ncbi.nlm.nih.gov/), COG [
17], and KEGG [
18]. To identify antibiotic resistance genes, the protein-coding sequences were further searched by BLAST against the Antibiotic Resistance Database (ARDB) [
19], using the similarity thresholds recommended by ARDB.
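The quality-filtering step described at the start of this section can be illustrated with a minimal sketch. It assumes that trimming keeps the longest contiguous stretch of each read whose Phred scores meet a cutoff and that pairs are then sorted by the length of the trimmed reads. The function names, the cutoff of 20, the minimum length of 25, and the Phred+33 offset are illustrative assumptions; the source does not state the parameters used, and this is not the SolexaQA code itself.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string; the Phred+33 offset is an assumption."""
    return [ord(ch) - offset for ch in quality]


def dynamic_trim(seq: str, quality: str, min_q: int = 20, offset: int = 33) -> Read:
    """Keep the longest contiguous segment whose scores are all >= min_q."""
    scores = phred_scores(quality, offset)
    best_start, best_len, start = 0, 0, 0
    # A sentinel score of -1 closes the final run of good bases.
    for i, q in enumerate(scores + [-1]):
        if q < min_q:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i + 1
    end = best_start + best_len
    return seq[best_start:end], quality[best_start:end]


def length_sort(pairs: List[Tuple[Read, Read]], min_len: int = 25):
    """Split trimmed pairs into intact pairs and singletons; drop the rest."""
    paired, singles = [], []
    for r1, r2 in pairs:
        ok1, ok2 = len(r1[0]) >= min_len, len(r2[0]) >= min_len
        if ok1 and ok2:
            paired.append((r1, r2))
        elif ok1:
            singles.append(r1)
        elif ok2:
            singles.append(r2)
    return paired, singles


# Toy read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed = dynamic_trim("ACGTACGT", "II##IIII")
```

In this toy case the two low-quality bases split the read into runs of two and four bases, so the four-base run is kept. Separating intact pairs from singletons matters because a paired-end assembler needs to know which reads still have a mate.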
Comparative genomics
For comparative analysis, reference genome sequences of the closest genetic relatives of
V. cholerae O139 strain E306 and representative strains belonging to important serogroups, including
V. cholerae O1 biovar El Tor str. N16961 (GenBank accession numbers AE003852 and AE003853), B33 (ACHZ00000000),
V. cholerae RC9 (ACHX00000000),
V. cholerae MO10 (AAKF03000000),
V. cholerae MJ-1236 (CP001485 and CP001486),
V. cholerae O1 classical O395 (CP000626 and CP000627),
V. cholerae CIRS101 (ACVW00000000),
V. cholerae IEC224 (CP003330 and CP003331), and
V. cholerae O1 str. 2010EL-1786 (CP003069 and CP003070), were downloaded from the NCBI website. Whole-genome alignments and SNP identification were performed using progressiveMauve [
20]. Concatenated SNPs with a total length of 23,648 bp were used to calculate the genetic distances, and a phylogenetic tree was constructed using the neighbor-joining method in MEGA5 [
21] based on these SNPs. The stability of the phylogenetic relationships was assessed by bootstrapping (1000 replicates). The BWA alignment tool [
22] and SAMtools [
23] were also used for SNP calling to confirm the results. Genome similarities based on phylogenomic distances were analyzed using the Gegenees software [
24].
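The distance and bootstrap steps described above can be sketched as follows. This is a minimal illustration, not the MEGA5 procedure: it assumes a simple p-distance (proportion of differing sites) because the source does not state the substitution model used, and the strain names and sequences are hypothetical toy data. Each bootstrap replicate resamples alignment columns with replacement; a neighbor-joining tree would then be built from every replicate's distance matrix.

```python
import random
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    valid = set("ACGT")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 0.0


def distance_matrix(aln: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Pairwise p-distances for every pair of sequences in the alignment."""
    names = list(aln)
    return {i: {j: p_distance(aln[i], aln[j]) for j in names} for i in names}


def bootstrap_alignment(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


# Hypothetical concatenated SNP alignment (toy data, not from the study).
toy_snps = {
    "strain_A": "ACGTACGT",
    "strain_B": "ACGTACGA",
    "strain_C": "TCGAACGA",
}

matrix = distance_matrix(toy_snps)

rng = random.Random(42)  # fixed seed so the resampling is reproducible
replicates = [distance_matrix(bootstrap_alignment(toy_snps, rng)) for _ in range(1000)]
```

Because only variable sites are concatenated, distances computed this way are relative to the SNP set rather than to the whole genome, which is sufficient for ranking the relatedness of strains.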
Quality assurance
The genomic DNA used for sequencing was isolated from a pure culture of V. cholerae O139 strain E306. The 16S rRNA gene from the draft genome sequence was further confirmed to be that of V. cholerae by a BLAST search against the NCBI database. Sequence contamination was also assessed using the RAST annotation system.
Future directions
Compared with the epidemic lineages of V. cholerae serogroup O1, our understanding of the genomic properties and diversity of V. cholerae serogroup O139 is very limited. In this study, we sequenced the whole genome of a newly isolated strain of V. cholerae O139. This strain carries an El Tor-specific RS1 element that was found in V. cholerae serogroup O1 and more antibiotic resistance genes than other sequenced strains, which highlights its strong ability to adapt to new environments and its potential to cause new cholera epidemics. Moreover, this genome will be of great interest for future comparative genomics of V. cholerae.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
YY and FL interpreted the sequencing data and prepared the manuscript. NL, JL and RFZ generated the sequencing data. YFH participated in all discussions of data analysis and revised the manuscript. YFH, YY, BLZ and YC were involved in the overall experimental design. All authors have read and approved the manuscript.