In West and Central Africa, Int-1 genetic differentiation between
A. coluzzii and
A. gambiae is high and comparable to that observed in their sibling species
A. arabiensis (Figure
4). Across this wide geographical region, as previously shown [
23], the two species exhibit low nucleotide/haplotype diversity and are strongly segregated based on two main Int-1 haplotypes (
A. coluzzii-M1 and
A. gambiae-S1, separated by a single C-T mutational step at site Int-1
702) and species-specific rare variants stemming from these. To explain this low polymorphism, a selective sweep centered on a favourable variant in a nearby gene was suggested [
22,
23]. Since the VGSC gene includes mutations conferring
kdr resistance to insecticides, a possible hypothesis is that Int-1 haplotypes are in linkage with these strongly selected alleles. Introgression of
kdr alleles and adjacent genomic regions has repeatedly been reported from different geographic areas [
19,
39]. Indeed, whole genome sequence data have recently shown that over 3 Mb has introgressed from
A. gambiae to
A. coluzzii in Ghana along with the
kdr L1014F mutation [
40]. It is possible that selection on
kdr-associated Int-1 haplotypes may have had a role in reducing diversity, as
kdr resistance has been reported in some of the analysed
A. gambiae populations, particularly from West Africa [
41,
42] [but not in the Far-West region, Pinto
et al., unpublished observations; see discussion below]. However, the close physical proximity of
kdr and Int-1 would make recombination between resistant and susceptible
kdr alleles and their linked Int-1 polymorphisms highly unlikely over a short timescale. Moreover, the
A. coluzzii samples analyzed came from populations where
kdr alleles were either absent or present at low to moderate frequencies (
i.e. populations from Benin, 40.0%; Nigeria, 19.5%; and Cameroon, 6.3% [
42]), and where these alleles are thus likely to have introgressed very recently [
19,
22,
39]. Recent genomic studies have given additional hints for understanding the reduction in genetic variation at Int-1 and its linkage disequilibrium with markers on the physically unlinked X-centromeric region defining the two species. In fact, as already mentioned, the VGSC gene is located within the chromosome-2 “genomic island” of highest divergence between
A. coluzzii and
A. gambiae. Under the “speciation island” [
10] scenario, it can be hypothesized that a hitchhiking effect on Int-1 has occurred due to diversifying selection on a chromosome-2 “island” gene participating in the building-up of pre-mating barriers, or conferring differential ecological adaptation and niche segregation between
A. gambiae and
A. coluzzii. Alternatively, under the “incidental island” scenario [
2,
13,
16], substitutions at Int-1
702 may have become fixed after species splitting and accumulated little genetic differentiation due to reduced recombination in the chromosome-2 centromeric region. Interestingly, however, the association between chromosome-X and −2 “islands” is observed neither in Rwanda (Additional file
2: Table S2) nor in Tanzania [
41]. In these East African sites, both M1 and S1 haplotypes were found segregating in
A. gambiae populations. Thus, if Int-1
C represents the ancestral allele in the
A. gambiae complex (Figure
1), then Int-1
C/T may be considered an ancestral polymorphism retained in
A. gambiae populations from East Africa (where
A. coluzzii is absent), which became fixed in westward sympatric areas after the splitting of the two species.
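The low haplotype diversity referred to above is commonly summarized with Nei's haplotype (gene) diversity, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i in a sample of n sequences. The following Python sketch shows the calculation. It is an illustration only, not the analysis pipeline used in this study; the function name and the example sample are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    H = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the sample
    frequency of haplotype i and n is the number of sampled sequences.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample (not data from this study): one common haplotype
# plus a single rare variant gives a low diversity value.
sample = ["M1"] * 9 + ["M2"]
h = haplotype_diversity(sample)
```

A sample carrying a single haplotype gives H = 0, whereas a sample in which every sequence is different gives H = 1.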
In the Far-West region, a strong reduction of inter-specific genetic divergence between
A. gambiae and
A. coluzzii is found, as indicated by the lower F_ST observed in this region (0.14) than in the rest of the range (0.51-0.69) (Figure
4), and a preferential introgression of M1-related Int-1 haplotypes (“typical” of
A. coluzzii) into
A. gambiae is observed. These data are consistent with previous studies showing weak association between chromosome-X and −2 centromeric regions and occurrence of asymmetric introgression from
A. coluzzii into
A. gambiae in the westernmost extreme of their range [
17-
21]. Furthermore, 15 exclusive Far-West haplotypes were inferred through PHASE [
32] and found interspersed and connected to M1 and S1 geographically widespread variants and to the Far-West-specific M3 (Figure
2). The presence of such private haplotypes might indicate that selective pressures on the chromosome-2 centromere observed in Central African populations (Table
1) are relaxed in the Far-West. Note that, although recombination along the centromeric 500-bp Int-1 fragment analyzed would normally be considered minimal, some reduction in accuracy might occur when reconstructing haplotypes using the PHASE algorithm in the Far-West region, where linkage disequilibrium (LD) along the 2L-centromere is known to be lower than in the rest of the species range [
19]. However, the PHASE results are supported by summary statistics (Table
1) also indicating extreme Int-1 diversity and recent introgression events in Far-West populations of both species.
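To illustrate how the contrast in inter-specific differentiation discussed above translates into allele frequencies, the following Python sketch computes Wright's F_ST at a single biallelic site (such as the C/T site Int-1 702) from the frequency of one allele in each of two populations. It is a minimal sketch under stated assumptions: the two populations are weighted equally, the function names and the input frequencies are hypothetical, and this is not necessarily the estimator used to obtain the values reported in Figure 4.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)  # total (pooled) heterozygosity
    if h_t == 0.0:
        return 0.0  # site is monomorphic overall; no differentiation
    # mean within-population heterozygosity
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies (not data from this study).
strongly_segregated = fst_biallelic(0.95, 0.05)
weakly_segregated = fst_biallelic(0.70, 0.40)
```

When the two populations are fixed for alternative alleles the function returns 1.0, and when they share identical frequencies it returns 0.0; intermediate values reflect partial sharing of alleles, as expected where introgression occurs.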
There are contrasting possible explanations for the remarkable Int-1 polymorphism observed in the Far-West region. Under the
“speciation island” hypothesis [
10], relaxation of diversifying selection on a key isolating trait in the chromosome-2 centromeric “island” (of which Int-1 is a part) may have contributed to weakening pre-mating barriers between
A. gambiae and
A. coluzzii and promoting a higher rate of gene flow in the Far-West region. This hypothesis, however, contrasts with data from other West and Central African areas, where introgression from
A. gambiae to
A. coluzzii of a
kdr-related genomic portion in linkage with Int-1 does not produce an increase in hybridization rates [
21]. Alternatively, the genomic region linked to Int-1 may not be related to speciation [
2,
16] and the observed pattern in the Far-West region could be attributed to a relaxation of purifying selection operating separately within each species on adaptive genetic traits not directly (or only weakly) involved in reproductive isolation. Under this hypothesis, increased Int-1 polymorphism in the Far-West region might be the consequence of an increased recombination rate within the 2L-centromeric “island” (and Int-1) following disruption of linked (background) selection. Resolving these competing hypotheses requires assessing the role of hybridization in the extent of linkage disruption throughout the 2L “island” and understanding whether and how this might affect association with traits critical to speciation.
Finally, the frequency and distribution of Int-1 haplotypes within
A. coluzzii across its range provide some hints about further intra-specific geographical patterns (Figure
3). In fact, populations from the West and Far-West regions are characterized by the exclusive presence of haplotype M5, not observed in those from Central Africa. This is consistent with results obtained by other nuclear markers (e.g. microsatellites) showing a macro-geographic subdivision into two distinct West and Central African genetic clusters, corresponding to the forest-savannah biome transition, which may have acted as an ecological barrier to gene flow [
26,
43,
44]. Moreover, the high frequency of the Far-West exclusive M3-haplotype (separated from the major and widespread M1-haplotype by 5 mutational steps) allows speculation that a founder effect (followed by either selection or drift) affected
A. coluzzii populations colonizing this region in the past. This last point merits further investigation through a multi-locus approach at a wider genome scale to shed light on the genetic characteristics of the source populations that gave rise to the
A. gambiae/
A. coluzzii hybrid zone.