Background
Streptococcus pneumoniae (‘the pneumococcus’) is a Gram-positive, α-hemolytic bacterium and a major cause of morbidity and mortality worldwide [1]. Invasive pneumococcal disease (IPD) causes ~1 million deaths in children under 5 years of age every year [1], with the highest IPD burden (~90 % of the global death toll) occurring in low-income countries in Sub-Saharan Africa, Latin America and Asia [1, 2].
Nearly 100 pneumococcal serotypes have been reported to date [3‐9]; they vary geographically in both prevalence and distribution [2] and in their propensity to cause IPD [10]. Serotype 1 ranks among the most prevalent serotypes causing IPD globally, although it is rarely isolated from nasopharyngeal carriage in humans [10, 11]; this gives it a high invasive potential, with an odds ratio of ~10 for causing IPD relative to carriage [10]. Longitudinal studies have reported that serotype 1 is carried for ~9 days, the second shortest human carriage duration reported to date after serotype 33B [12]. Pneumococci undergo genetic recombination, whereby competent isolates acquire external DNA and incorporate it into their chromosomes; this occurs more efficiently during carriage than during invasive disease [13]. The rarity of serotype 1 in carriage has led to the hypothesis that it has a lower rate of recombination because of limited opportunities for genetic exchange during carriage [14]. A recent report identified genetic recombination in a global serotype 1 population [15].
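The invasive odds ratio cited above compares the odds that a serotype is found among IPD isolates with the odds that it is found among carriage isolates, so a value of ~10 means serotype 1 is strongly over-represented in disease relative to carriage. The sketch below is a minimal illustration, assuming a simple 2×2 table and a Woolf (log-scale) 95 % confidence interval; the function name and the counts in the usage example are hypothetical and are not data from this study or from [10].

```python
import math


def invasive_odds_ratio(a, b, c, d):
    """Odds ratio for IPD relative to carriage for one serotype (hypothetical helper).

    a: IPD isolates of the serotype
    b: IPD isolates of all other serotypes
    c: carriage isolates of the serotype
    d: carriage isolates of all other serotypes
    Returns (odds_ratio, lower_95, upper_95) using the Woolf log method.
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 50 of 500 IPD isolates and 10 of 1,000 carriage isolates are serotype 1.
or_, lo, hi = invasive_odds_ratio(a=50, b=450, c=10, d=990)
print(f"OR = {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

With these made-up counts the odds ratio is 11, of the same order as the estimate reported for serotype 1; real estimates depend on how IPD and carriage collections are matched in time and place.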
Multilocus sequence typing (MLST) [16] and phylogenetic analysis of a global collection of serotype 1 isolates showed that serotype 1 isolates cluster into three distinct clades, predominantly associated with continent [17]. European and American isolates clustered into clades A and C, respectively, whilst clade B was associated with African isolates. Recently, using a whole-genome phylogeny, we reported an additional clade, clade D, which consisted of isolates from Asia [15].
Serotype 1 pneumococci cause a high IPD burden, especially in Africa [2], where they are endemic in most countries, particularly within the African meningitis belt [18]. This high burden led to the inclusion of serotype 1 capsular polysaccharide in the pneumococcal conjugate vaccines (PCV) PCV10 [19] and PCV13 [20], which have now been introduced globally. Studies to assess the effectiveness of PCV10 and PCV13 in African populations are currently underway. In Africa, the majority of serotype 1 IPD is caused by the ST217 clone (also known as Sweden1-27 (PMEN27), as defined by the Pneumococcal Molecular Epidemiology Network) [17]. This lineage accounts for over 95 % of serotype 1 IPD in Malawi and 98 % of such cases in South Africa [21‐23] and causes pneumococcal meningitis outbreaks in West Africa [24, 25].
We have previously described the global population structure of multiple serotype 1 STs [15]. In the present study, we collected 226 ST217 serotype 1 isolates from multiple African and Asian countries to gain further insight into the biology of this clone. We used whole-genome sequencing to determine the population structure, genomic diversity, geographic spread and changes in population size through time of the endemic, virulent ST217 clone, and to identify key pneumococcal virulence genes associated with it. Our findings provide further insight into the biological mechanisms that may have driven the success of the ST217 clone, especially in Africa.
Discussion
IPD due to serotype 1 pneumococci in Sub-Saharan Africa is predominantly caused by the endemic ST217 clone. Our pneumococcal population genomic dataset offers a unique opportunity to understand how this clone has evolved and spread across and beyond the continent. Our findings showed that the ST217 clone has evolved into geographically distinct lineages with different characteristics. Previous studies have shown that serotype 1 is highly clonal [17] and exhibits strong phylogeographic structure [15]. These characteristics may allow accurate inference of the recent spread of the ST217 clone between countries, and whole-genome sequencing allowed us to resolve this spread at higher resolution. We identified both short-range transmission between closely located African countries and long-range transmission between West Africa and Southern Africa and between West Africa and Asia. In most instances we were able to establish the direction of spread between countries; in others we could detect potential spread of the clone but could not infer its direction. Nevertheless, our findings show the intracontinental and intercontinental spread of the ST217 clone at high resolution.
The ST217 SCs restricted to different geographical regions may experience different selective pressures. Our findings consistently showed less polymorphism than expected under mutation-drift equilibrium. All the lineages showed negative Tajima's D estimates, which suggests genome-wide selective sweeps and potentially population bottlenecks, such as those expected to accompany periodic epidemic waves [66]. Temporal coalescent divergence dating using isolation years and phylogenies revealed recent emergence of the SCs. The predominantly South Eastern African clade (SC3-SEA) and the Southern African clade (SC1-SA) emerged in the 1980s and the early 1990s, respectively, while the emergence of the West African clade (SC2-WA) could not be determined because this clade lacked a molecular clock signal (i.e. no linear accumulation of mutations with time). Overall, our findings suggest recent emergence and clonal expansion of ST217 in Sub-Saharan African countries within the last century. In the South Eastern African clade, the relative genetic diversity, or effective population size (Neτ), of the lineage increased rapidly from the 1990s until the early 2000s, followed by a decline around 2005. The skyline plot suggests that the population then remained stable until the last sampling point in 2010, but this must be treated with caution because, with relatively scant data in this recent period, the analysis may simply recover the prior [66]. The decline in population size in clade SC3-SEA coincided with the previously reported decline in IPD in Malawi after the scale-up of antiretroviral therapy and cotrimoxazole prophylaxis [67]. In the South African clade, the population size increased slowly from the late 1990s and remained consistently higher than in Malawi and Mozambique from the mid-2000s until 2010, possibly reflecting the higher diversity of the South African population. In addition, the observed mutation rates in both clades were similar to previously published estimates, which further suggests that differences in recombination, rather than in mutation rates, drive variation in genetic diversity between pneumococcal lineages [68].
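Tajima's D compares two estimates of genetic diversity: one from the mean number of pairwise differences (π) and one from the number of segregating sites (S, scaled by a₁). An excess of rare variants, as expected after a selective sweep or during recovery from a bottleneck, pulls π below S/a₁ and makes D negative. The sketch below is a minimal, hypothetical implementation for a toy alignment, assuming equal-length sequences containing only A, C, G and T (no gaps or ambiguity codes); it is not the software used in this study.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (assumes A/C/G/T only, no gaps)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism

    # pi: mean number of pairwise differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(p, q)) for p, q in pairs) / len(pairs)

    # Normalising constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment: one divergent sequence carrying five singleton mutations.
excess_rare = ["AAAAAAAAAA"] * 9 + ["CCCCCAAAAA"]
print(tajimas_d(excess_rare))  # negative: more rare variants than expected
```

Genome-wide analyses usually compute D on SNP alignments with dedicated population-genetics software; the toy version only shows why a sweep-like excess of singletons yields a negative value.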
Serotype 1 pneumococci are rarely detected in carriage, and our study provides an additional explanation for this. We show that certain key pneumococcal loci important for nasopharyngeal colonisation are absent from ST217 isolates, including the pneumococcal pilus operon genes, psrP, iga, zmpB and zmpC. These genes have been implicated in adherence to epithelial surfaces [62‐64, 69‐71]. Of particular interest was the iga gene, which encodes an IgA1 protease that cleaves human IgA1 and facilitates colonisation [62]. To determine the distribution of the absent genes in other pneumococcal lineages, we screened for them in isolates from the Massachusetts population [58]. Overall, the pilus operon genes, psrP, iga, zmpB and zmpC were present in between 5 and 20 % of the dataset. This suggests that, while these genes may indeed promote colonisation, they cannot fully explain the absence of serotype 1 from carriage, because most isolates in this dataset, including lineages carried for longer durations than serotype 1, lacked them. Interestingly, in the Massachusetts dataset, the pilus was associated with highly carried lineages such as serogroup 6, 19F and 35B, while iga was associated with lineages containing serotypes such as 11A, 15A, 19A and 35F and with non-typeable pneumococci. On the other hand, psrP was sporadically distributed across various lineages in the Massachusetts dataset but was completely absent from serotype 1. The majority of the virulence genes, such as pneumolysin (ply), choline-binding proteins such as cbpA, competence genes (comA-E), neuraminidases (nanA and nanB) and various cell surface antigens such as psaA-C, were ubiquitous in serotype 1 [53]. This is unsurprising because serotype 1 isolates are well known for their high invasive capacity and their inability to efficiently colonise the nasopharynx and sustain long durations of carriage. This inability to colonise humans may be primarily a consequence of the characteristics of the serotype 1 polysaccharide capsule, which elicits phagocytic killing and rapid clearance [72], but the absence of genes important for colonisation, such as iga, may also contribute. In turn, this may limit the exposure of serotype 1 to potential DNA donors, causing the observed lower rates of recombination, since recombination occurs efficiently during carriage [13, 73]. However, to validate this hypothesis, further in vitro and in vivo studies are required to determine the biological functions and pathways affected by the absent genes and the role of the capsule in clearance by the immune system.
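Screening a collection for the colonisation-associated genes reduces to counting, for each gene, the fraction of isolates in which it was detected, both across the whole dataset and within each lineage. The sketch below is a minimal illustration, assuming gene calls have already been made (for example from a pan-genome presence/absence table) and stored as one set of gene names per isolate; the function, lineage labels and isolates are hypothetical and are not the Massachusetts data.

```python
from collections import defaultdict


def gene_prevalence(isolates, genes):
    """Fraction of isolates carrying each gene, overall and per lineage.

    isolates: dict mapping isolate ID -> (lineage label, set of genes detected)
    genes: iterable of gene names to screen for
    """
    genes = list(genes)
    # Group the per-isolate gene sets by lineage.
    by_lineage = defaultdict(list)
    for lineage, present in isolates.values():
        by_lineage[lineage].append(present)

    total = len(isolates)
    overall = {
        g: sum(g in present for _, present in isolates.values()) / total for g in genes
    }
    per_lineage = {
        lineage: {g: sum(g in s for s in sets) / len(sets) for g in genes}
        for lineage, sets in by_lineage.items()
    }
    return overall, per_lineage


# Hypothetical toy data; not isolates from the Massachusetts collection.
toy = {
    "iso1": ("serotype 1", {"ply", "cbpA"}),
    "iso2": ("serotype 1", {"ply", "cbpA"}),
    "iso3": ("lineage X", {"ply", "cbpA", "iga", "psrP"}),
    "iso4": ("lineage Y", {"ply", "iga"}),
}
overall, per_lineage = gene_prevalence(toy, ["iga", "psrP", "zmpB", "zmpC", "ply"])
print(overall)
print(per_lineage["serotype 1"])
```

Reporting per-lineage fractions alongside the overall value is what allows a gene to be both rare in a dataset (5 to 20 %) and concentrated in particular well-carried lineages.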
Recombination has previously been associated with the acquisition of antibiotic resistance, so lower rates of recombination may limit the acquisition of antibiotic resistance genes [74]. The South African clade (SC1-SA) contained the least recombination and showed virtually no acquisition of antibiotic resistance conferring elements compared with the other SCs. Low recombination may have driven the lower acquisition of resistance-conferring mobile genetic elements in this lineage, as evidenced by the lower (~27 %) resistance to tetracycline and chloramphenicol, which is among the lowest for this clone in Sub-Saharan Africa [75]. On the other hand, no macrolide resistance conferring elements were identified in any of the SCs, which is reassuring and suggests that macrolides may remain a preferable treatment choice for the foreseeable future in patients infected with this clone. Clades SC2-WA, which predominantly comprised West African isolates, and SC3-SEA, which predominantly comprised isolates from South Eastern African countries such as Malawi, showed higher recombination rates and more antibiotic resistance, particularly to chloramphenicol and tetracycline. Interestingly, resistance to tetracycline was much higher than resistance to chloramphenicol in the West African lineage, despite the presence of only the Tn5253 element, which carries resistance conferring genes for both antibiotics [76]. This observation was consistent with a previous study from The Gambia, which reported the same discrepancy using in vitro phenotypic data [77]. Because resistance to both antibiotics was due to the Tn5253 element alone, this suggested that the tetracycline resistant but chloramphenicol susceptible isolates contained a deletion of the chloramphenicol resistance encoding loci. Indeed, further comparative genomic analysis of these isolates revealed that they harbored a defective Tn5253 element with an intact tetracycline resistance conferring gene (tetM) and a large genomic deletion (~5 kb) spanning the pC194 plasmid, which harbors the chloramphenicol resistance conferring gene (catpC194). We confirmed this deletion by mapping raw reads from the isolates with the putative deletion against an intact Tn5253 reference sequence. However, Asian isolates in the West African clade, which represented intercontinental spread from West Africa, contained an intact Tn5253 element despite having the same genetic background. This further suggests that the deletion of the chloramphenicol resistance encoding loci was restricted to the West African isolates. These findings explain why ST217 isolates from West Africa are more susceptible to chloramphenicol than to tetracycline, as previously reported [77].
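Confirming a deletion by read mapping amounts to looking for a contiguous stretch of the intact reference (here Tn5253) that receives no, or almost no, reads. The sketch below assumes per-position depths have already been computed against the reference (for example with `samtools depth -a`, which also reports zero-depth positions); the function name and thresholds are hypothetical and are not the exact procedure used in this study.

```python
def find_uncovered_regions(depths, max_depth=0, min_length=1000):
    """Return 1-based inclusive (start, end) runs where read depth <= max_depth.

    depths: per-position read depth along the reference, position 1 first
            (for example, the third column of `samtools depth -a` output).
    Only runs of at least min_length bases are reported.
    """
    regions = []
    start = None
    for pos, depth in enumerate(depths, start=1):
        if depth <= max_depth:
            if start is None:
                start = pos  # open a new low-coverage run
        else:
            if start is not None and pos - start >= min_length:
                regions.append((start, pos - 1))  # close a long enough run
            start = None
    # Handle a run that extends to the end of the reference.
    if start is not None and len(depths) - start + 1 >= min_length:
        regions.append((start, len(depths)))
    return regions


# Synthetic depth profile: a 5 kb gap flanked by well-covered sequence.
depths = [30] * 2000 + [0] * 5000 + [25] * 3000
print(find_uncovered_regions(depths))  # [(2001, 7000)]
```

Requiring a minimum run length avoids reporting short dips in coverage caused by repeats or low-complexity sequence as deletions.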
However, an important unanswered question is what may have driven the loss of chloramphenicol resistance in the West African isolates but not in Asian isolates of the same genetic background. Further analysis of previously published serotype 1 STs [15] showed that the deletion of the chloramphenicol resistance gene was not restricted to the ST217 clone. Other closely related STs from West Africa, such as ST303, a single locus variant of the ST217 clone, also showed widespread deletion of the chloramphenicol resistance conferring loci. This suggests either that the deletion is not recent or, if it is recent, that the defective Tn5253 has spread at a high rate, possibly as a consequence of selection in West Africa. During the sampling period, chloramphenicol was widely used in The Gambia [77] and possibly other West African countries, while its use decreased in Southern African countries such as Malawi, where all the resistant isolates harbored an intact Tn5253 element. This suggests that the widespread loss of the chloramphenicol resistance mechanism in West African isolates may not be a consequence of low chloramphenicol usage.
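In MLST, each sequence type (ST) is defined by a profile of allele numbers at seven housekeeping loci (aroE, gdh, gki, recP, spi, xpt and ddl for S. pneumoniae), and two STs are single locus variants (SLVs) when their profiles differ at exactly one locus, as ST303 does relative to ST217. A minimal sketch of that comparison follows; the helper names are hypothetical and the allele numbers are placeholders, not the real ST217 or ST303 profiles.

```python
# The seven housekeeping loci of the S. pneumoniae MLST scheme.
MLST_LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def differing_loci(profile_a, profile_b):
    """Loci at which two allelic profiles (dicts of locus -> allele number) differ."""
    return [locus for locus in MLST_LOCI if profile_a[locus] != profile_b[locus]]


def is_single_locus_variant(profile_a, profile_b):
    """True when two STs differ at exactly one of the seven loci."""
    return len(differing_loci(profile_a, profile_b)) == 1


# Hypothetical allele numbers, not the real ST217 or ST303 profiles.
st_a = dict(zip(MLST_LOCI, (1, 2, 3, 4, 5, 6, 7)))
st_b = {**st_a, "ddl": 99}
st_c = {**st_b, "xpt": 42}
print(is_single_locus_variant(st_a, st_b))  # True
print(is_single_locus_variant(st_a, st_c))  # False (double locus variant)
```

Because SLVs share six of seven alleles, finding the same Tn5253 deletion in ST217 and its SLV ST303 is consistent with the deletion predating their divergence or spreading between closely related backgrounds.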
Acknowledgements
We would like to thank all the clinical and laboratory staff at the collaborating institutions and the sequencing teams at the Wellcome Trust Sanger Institute for their expertise. We are also grateful to the Global Pneumococcal Strain Bank (a PATH-funded collaboration between the US Centers for Disease Control and Prevention (CDC), Emory University, and others) for providing additional isolates for the study.
Authors’ contributions
CC, DBE and SDB conceived and designed the study. DBE, WPH and SDB supervised the study. AVG, MA, JMC, BS, GP, PT, LM, KPK, NF and DBE contributed samples. CC carried out the bioinformatics and statistical analyses, drafted the paper and prepared figures. CC, DBE, WPH and SDB interpreted the results and drafted the manuscript. CC, JC, SRH, CPA, AVG, MS, MY, LB, CE, SG, MDP, FY, SO, MA, JMC, BS, RSH, GP, JP, JC, PT, KPK, LM, NF, AK, WPH, DBE and SDB contributed to the discussions, interpretation of the results and commented on the manuscript. All the authors have read and approved the final manuscript.