Background
Human metapneumovirus (HMPV), a close relative of respiratory syncytial virus (RSV), is a recognized major human pathogen that causes epidemics of respiratory tract illnesses in persons of all ages worldwide [
1,
2]. Discovered in 2001 [
1], HMPV was probably circulating for at least 50 years prior to this date [
1]. Infection with HMPV may manifest as upper or lower tract respiratory illness, similar to that observed with RSV disease although HMPV has a considerably lower individual disease risk and population burden than RSV [
3,
4].
A member of the
Paramyxoviridae family of viruses, HMPV has a negative-sense single-stranded RNA genome, 13.3 kb long, encoding eight proteins [
5]. Three surface proteins F (fusion), G (attachment glycoprotein) and SH (small hydrophobic) are encoded within the HMPV genome [
6] and F and G nucleotide sequences have been largely used to study HMPV genetic variation [
7]. Whilst the G gene shows higher sequence and amino acid diversity [
8‐
11], only the F protein is confirmed to be immunogenic and protective [
6,
12].
In the northern hemisphere, peak HMPV disease occurrence is typically in the winter and spring months of January to May [
13‐
15], while in the southern hemisphere peak prevalence is in the spring period of August to September [
16]. In Kenya, peak HMPV prevalence has been recorded in June-July in the west and November-December in refugee camps in the northeast and northwest of the country [
17,
18].
Worldwide, HMPV prevalence in hospital inpatient or community studies, in children or elderly adults, varies widely from as low as 1.7 % to as high as 17 %, with generally higher prevalence in outpatients compared to inpatients and, also, more in children younger than 5 years compared to older age groups [
7,
13,
14,
16,
19‐
26]. Studies in Kenya report HMPV prevalence between 3 and 6 % in acute respiratory infection cases in inpatient populations [
17,
27‐
29], and 7 to 8.6 % in outpatient populations [
17,
30] but none have provided information on virus genetic characteristics and underlying evolutionary changes over successive epidemic seasons.
HMPV has been divided into two serologically distinct groups, A and B [
1,
31]. Group A generally dominates over group B [
7,
24‐
26,
32] and has been reported to cause more severe disease than group B [
33]. The two groups are further subdivided into subgroups A1, A2, B1 and B2 based on genetic differences in the surface proteins F and G but these do not show clear antigenic differences at least in neutralization assays using anti-sera raised in ferrets [
8]. The A2 subgroup is the most genetically heterogeneous of the four subgroups and some studies have suggested its further sub-division into A2a and A2b sub-lineages based on sequence data [
34,
35]. Based on the F gene, HMPV groups A and B have 84–86 % identity at the nucleotide level and 94–97 % at the amino acid level, whilst within-subgroup similarity is 94–96 % at the nucleotide level and 97–99 % at the amino acid level [
8]. In comparison, the more diverse G protein shows only 50–57 % and 30–37 % similarity for nucleotide and amino acid sequences, respectively, between the two groups A and B [
8]. Variants from both groups, and sometimes from multiple subgroups within the groups, can co-circulate in the same epidemic season [
8,
12,
14,
35,
36].
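The percentage similarities quoted above are pairwise comparisons of aligned sequences. As a minimal sketch of how such a figure can be computed, assuming the two sequences have already been aligned to equal length and that gapped columns are simply skipped (the function name and the gap handling are illustrative assumptions, not the method used in the cited studies):

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns in which either sequence carries a gap ('-') are skipped.
    This is an illustrative convention, not the one used in the cited work.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * matches / compared
```

The same function works for nucleotide and amino acid alignments, so nucleotide-level and amino-acid-level figures for a gene can be obtained by applying it to the two alignments in turn.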
Candidate vaccines targeting the G protein and a subunit vaccine of the F protein have shown promising results although, to date, none is licensed [
37‐
40]. We set out to understand the genetic diversity in the F and G genes in circulating strains in coastal Kenya in relation to seasonal introductions of the virus, to contribute information that may be important for vaccine development and virus infection control. We describe the molecular epidemiology of HMPV in child admissions at a coastal county hospital of Kenya, over a 5-year period, building on previous work in the hospital [
27] in order to elucidate prevalence, circulating strains and genetic diversity in the most at risk paediatric population, contributing information on HMPV persistence and transmission.
Discussion
The epidemiological and evolutionary patterns of circulating strains of HMPV remain poorly documented in most of sub-Saharan Africa. Using an existing framework for childhood pneumonia surveillance at a referral hospital in coastal Kenya, we set out to describe HMPV epidemiology and genetic diversity in this region and compared our findings with contemporaneous global strains deposited in GenBank.
We found that 4.8 % of childhood pneumonia hospital admissions for the period 2007 to 2011 (inclusive) in Kilifi County Hospital (KCH) were HMPV positive. Our result falls within the range of 3.8 to 15 % [
13‐
15,
20,
23,
48] reported in paediatric hospital admissions in other parts of the world. A previous study in Kenya (albeit in a refugee population) identified an HMPV prevalence of 5.7 % [
17].
The HMPV infections in KCH admissions were most common in children <6 months (44 %), and 74 % of all HMPV cases occurred in children under 12 months of age, with 84 % of cases presenting with symptoms of severe pneumonia. Substantial disease burden associated with HMPV in the first year of life has been previously reported [
7,
13,
16,
22,
49], highlighting the most affected age group and providing a guide on the populations to prioritize in future HMPV vaccine administration.
A seasonal pattern of HMPV-positive samples was identified, running from October of one year to April of the next and corresponding to higher temperatures and lower rainfall. This is similar to the seasonal pattern of RSV at the same site [
49,
50]. In Dadaab, a refugee camp 500 km north of Kilifi, peak HMPV prevalence occurs in December [
17], similar to Kilifi. In other parts of the world, seasonality in HMPV prevalence has been previously reported [
14,
15,
17] with peak prevalence either coinciding with the winter season, concurrent with or after the RSV epidemic season [
13,
15,
20]; alternating between winter and spring [
51] or peaking in the late spring-summer months [
52] in the northern hemisphere whilst studies in Australia show peak seasons in spring [
16] which is concurrent with the RSV peak season. In 2010, no HMPV was detected between April and September. Studies in Europe have similarly shown HMPV prevalence varies from year to year [
51,
52].
Three HMPV subgroups (A2, B1 and B2) were found in Kilifi during the study period. A1 was absent, A2 and B2 occurred over the whole surveillance period, and B1 was recorded only in 2007 and 2008, in low numbers. All the samples sequenced for the G gene were A2, reflecting the fact that A2 was the predominant subgroup in Kilifi in every season/year of the study. Examination of global HMPV sequences in GenBank for the period covered in our study showed low representation of A1, possibly explaining its absence from Kilifi. Interestingly, studies have shown A1 to be dominant in the USA [
13] and South Africa [
25] and B1 in an Australian study of inpatient admissions of all ages over a 4 year period [
16]. Genotype B1 was only detected in 2007 and 2008, and was undetectable in the remaining 3 years. Our analysis of comparison sequence data in public databases for the same period showed that B1 was indeed circulating elsewhere in the 3 years in which we did not detect it in Kilifi. A 20-year study in the USA reported sporadic detection of the B1 genotype [
53], which may in part explain the intermittent pattern of occurrence that we observed in Kilifi. Studies covering a longer time period may better resolve the pattern of genotype occurrence. The identification of three subgroups of HMPV was possible using only the F sequence data and not the G sequence data, possibly owing to the larger number of samples successfully sequenced for the F gene.
Co-circulation of multiple lineages of HMPV has been previously reported [
7,
14,
53,
54]. Furthermore, dominant strains may vary in different seasons and locations [
7,
36,
55] and a genotype may dominate in one year and then be replaced by another in the subsequent year [
16,
30,
53]. In Kilifi, our data showed a contrasting scenario, with one subgroup (A2) dominant in all seasons and the other subgroups, especially B2, co-circulating in lower numbers. Long-term surveillance to ascertain whether genotype replacement occurs in the years after the study will be important in determining genotype patterns in Kilifi. The dissimilarity between the distribution of genotypes circulating in Kilifi and the global pattern, compared year by year, supports the hypothesis that HMPV migrates across the world more slowly than other respiratory viruses such as influenza A, allowing localized genotype replacement patterns to exist.
The G gene is the most variable gene in the HMPV genome [
10,
56]. It has been suggested that frequent variation in the G gene may be a strategy to evade the host immune system selective pressure [
57]. We found that, in the region we analyzed, the G protein evolutionary rate was three times higher than the F protein evolutionary rate. Overall, the diversity observed in the G protein sequences was far higher than in the F protein sequences. Although all the G sequences we obtained were of the A2 subgroup, we could classify our A2 viruses into four further clusters based on phylogenetic clustering, bootstrap support and amino acid change patterns.
Amplification and sequencing of the G gene did not succeed for more than half of our HMPV-positive samples. About 60 samples failed at the G PCR amplification stage and a further 30 failed at the nucleotide sequencing stage. In the end we only obtained sequences for subgroup A2. It is possible that this was caused by an insufficient match between our primers and the circulating variants/subgroups, impeding amplification and sequencing. This low recovery rate of HMPV G sequences limited the power of our study to fully describe the genotypes and variants that circulated in Kilifi over the study period. An alternative explanation for the PCR/sequencing failures is RNA degradation, as the study used archived material.
The HMPV F gene was found to be less diverse, which concurs with previous findings and with pneumovirus F protein diversity in general [
8]. Furthermore, 52 % (64/123) of the F gene sequences were 100 % identical to another sequence in the set, whilst only 5.3 % (3/56) of the G gene sequences were identical to each other. This suggests that, to tease out any differences between viruses whose F gene sequences are 100 % identical, sequencing another gene, for instance the G gene, or even the whole genome will be necessary.
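The proportion of sequences that have an exact duplicate elsewhere in a set can be tallied in a few lines. The sketch below is only an illustration under stated assumptions: sequences are taken to be trimmed to the same region and compared as whole strings, and the function name is hypothetical rather than part of the study's pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple


def share_with_identical_partner(seqs: Iterable[str]) -> Tuple[int, int, float]:
    """Count sequences that are 100 % identical to another sequence in the set.

    Returns (number with an identical partner, total sequences, percentage).
    Assumes every sequence covers the same region, so plain string equality
    is a fair test of identity (an assumption of this sketch).
    """
    normalised = [s.upper() for s in seqs]
    if not normalised:
        raise ValueError("no sequences supplied")
    counts = Counter(normalised)
    # A sequence has an identical partner if its string occurs more than once.
    n_shared = sum(c for c in counts.values() if c > 1)
    total = len(normalised)
    return n_shared, total, 100.0 * n_shared / total
```

For example, a toy set of five sequences in which two pairs are duplicates gives four of five sequences (80 %) with an identical partner.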
Whilst no N-glycosylation sites were detected in the sequenced F protein region and only a few amino acid changes were observed, the G protein had more sites where N-glycosylation was lost and also had several amino acid changes, owing to its greater nucleotide sequence diversity. The F portion we sequenced does not encompass the three potential N-glycosylation sites at positions 57, 127 and 353 that have been previously described for HMPV [
58].
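Potential N-glycosylation sites are conventionally predicted from the sequon N-X-S/T, where X is any residue except proline. The following is a minimal sketch of such a scan on an ungapped protein sequence, with a helper that lists sequons present in a reference but absent from a variant; the function names are hypothetical and the prediction tool used in the study is not specified here.

```python
import re
from typing import List

# N-X-S/T sequon, where X is any residue except proline.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of the Asn in each N-X-[S/T] motif.

    Assumes an ungapped amino acid sequence in one-letter code.
    """
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def lost_sequons(reference: str, variant: str) -> List[int]:
    """Positions with a sequon in the reference but not in the variant.

    Assumes both sequences share the same numbering (no indels).
    """
    return sorted(set(find_sequons(reference)) - set(find_sequons(variant)))
```

A motif match only indicates a potential site; whether the asparagine is actually glycosylated cannot be inferred from sequence alone.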
Many of the characteristic epidemiological and evolutionary patterns observed in this study for HMPV mirror the findings previously reported for RSV from the Kilifi population. For both viruses, the highest disease burden is in the paediatric population occurring during early infancy (though HMPV burden is overall smaller) [
49]. Both RSV and HMPV show an annual seasonal pattern with largely overlapping peak activity months [
49,
50] and multiple genotypes occur during epidemics [
50,
59]. Further, we show that, as for RSV, analysis of the G protein encoding region distinguishes the variability of strains occurring across epidemics better than the F [
60]. Nonetheless, a few differences can still be discerned in their patterns. Firstly, group/subgroup temporal dominance or replacement is clearer with RSV than HMPV [
59]. Secondly, the substitution rate observed in HMPV G appeared much higher than estimated for RSV G [
61]. Thirdly, most of the genetic variants in RSV occur over a single epidemic and then disappear, whereas HMPV variants seem to persist for more than a single epidemic before disappearing [
50]. To provide further insights, future studies should undertake whole-genome analysis of these viruses and analyze specimens collected over a longer time period and across multiple sites in Kenya and Africa for a better understanding of the transmission, evolution and persistence mechanisms of these important human pathogens.
There are a number of limitations of this study that should be considered when interpreting the results. Fewer G gene sequences were obtained from the study, relative to F, and this could be the reason that only the A2 variant from the worldwide pool was identified. There was a change through time in the selection of samples for testing based on residency status, which would have resulted in a wider catchment area for 2010-11 than earlier. However, the low level of temporal clustering observed suggests that the samples were drawn from a similar pool of variants. As noted in previous reports from this surveillance [
27,
43,
49], collection of nasal specimens from children with life-threatening features is a continual challenge that could bias estimates of prevalence and variant composition.
Acknowledgements
We thank the clinical and laboratory teams for collection and processing of specimens. Specifically, we would like to thank Anne Bett, Caroline Gitahi, Getrude Ndanu, Clement Lewa, Alexander Gichuki and James Kipkoech for screening the samples for respiratory viruses and Alex Mutuku for help with data analysis. We would like to thank James Berkley for allowing us to use samples collected in 2007 and Mwanajuma Ngama for running the study in the wards. We also thank the parents and guardians of the children for agreeing to participate in this study. This work was supported by the Wellcome Trust, UK [102975], [084633] and [077092]. This article is published with the permission of the Director of KEMRI.