Introduction

Located at the northernmost bulge of Africa between Europe and the Middle East, Tunisia is a Mediterranean country bounded on the north and the east by an extensive coastline (1300 km).1, 2 From this strategic location, Tunisia has been a crossroads of human migrations since the Paleolithic period, leading to population admixture that has increased during more than 3000 years of history.3 As a result, modern Tunisian populations display high intra- and inter-population genetic diversity. Although explored by different disciplines, the peopling of North Africa, and of Tunisia in particular, remains a subject of continuous debate. Approximately 8000 years before the common era, Tunisia was characterized by the Capsian culture, which developed in situ in the Maghreb region of Northwest Africa and subsequently experienced a Neolithic transition in its later stages.4, 5

In historical times, because of its location on the main maritime routes of the Mediterranean, Tunisia has been settled successively by many diverse populations, including Phoenicians, Romans, Vandals and Byzantines. By the end of the seventh century, Muslim armies from the Arabian Peninsula invaded North Africa and reached the region known today as Tunisia. In the late tenth century, Tunisia also experienced an important movement of Arab populations, mostly Bedouin.6, 7 Therefore, it is likely that migrations and admixture processes played a pivotal role in shaping the peopling of Tunisia, particularly in coastal cities. In addition, because of its numerous small, isolated indigenous populations, Tunisia represents an interesting region in which to explore interpopulation relationships. Indeed, European, Near Eastern and sub-Saharan contributions to contemporary Tunisian populations during prehistoric and historic times have been diverse.

Today, Tunisia has a population of about 11 million, represented by Berbers, Arabs, Andalusians, Jews, Europeans and people of sub-Saharan origin. Tunisian cosmopolitan populations are situated in coastal locations. One such city is Sousse, founded in the eleventh century B.C. as Hadrumetum by the Phoenicians; it developed into an important center within the Carthaginian dominion. Sousse soon became the most important trading post on the North African coastline. Throughout its history, Hadrumetum came under the control of a number of major cultures, including the Vandals after the fall of the Roman Empire in the sixth century and later the Byzantines, who renamed it Hunerikopolis and Justinianopolis, respectively. The city became one of the most important Byzantine bases in North Africa. In the seventh century, the Arabs conquered the city, renaming it Susa and introducing the Islamic religion and the Arabic language.8 The city became a prosperous seaport during the Islamic Aghlabid Dynasty, which occupied and controlled Northern Africa for several centuries.

Subsequently, Sousse was invaded by the Normans of Sicily in the twelfth century9 followed by the Spanish. During the sixteenth century, Sousse received additional, but limited, contributions from the Ottoman Turks. Later, the city came under the control of the French, who renamed it once again, giving it its current name of Sousse.10 Considering this complex history, the expectation is that the genetic landscape of Sousse has been shaped by varying degrees of influence from the above-listed invaders, who occupied the region in different periods of its history. In addition, Sousse is the most ancient settlement in Tunisia with uninterrupted habitation since its foundation by the Phoenicians.11 Towns such as Carthage and Utique, also established by the Phoenicians,12, 13 were destroyed and their populations dispersed. Berber groups, on the other hand, have been rather isolated, since their towns and villages were not located on the coast.14 Studies on Tunisian cosmopolitan populations often examine Tunis, which was founded later than Sousse and does not offer a comparably prolonged, continuous antiquity. Hence, the Sousse population represents a unique case study in population genetics. Despite its historical importance and strategic location, only one molecular genetics study has been performed on this anthropologically interesting population.15

Over the past decade, the field of human population genetics has expanded our understanding of the origins, evolutionary history and migration patterns of human populations. Y-chromosome polymorphisms are ideally suited for studies of the patrilineal influences in highly admixed groups16, 17 such as the Sousse population of the present study, which resides in North Africa, a region characterized by a complex history of demographic events. Previous studies of North African populations have uncovered diverse genetic compositions18, 19, 20 as well as high frequencies of two specific North African haplogroups (E-M81 and E-M78). Our previous investigation based on Y-chromosome polymorphisms21 in different Tunisian ethnic groups (Andalusian, Cosmopolitan Arab and three Berber-speaking groups) indicated a high frequency of the autochthonous and most common North African haplogroup E-M81 (71%). The findings published by Ennafaa et al.22 also demonstrate that the E-M81 lineage is the most abundant (48.2%) in all Tunisian collections studied (Berbers from Bou Omran and Bou Saad located in southern Tunisia and Berbers and Arabs from the Jerba Island in South Tunisia). In general, all studies of North African populations18, 19, 20, 23 agree that the North African-specific lineage E-M81 is frequent in Berber-speaking groups.19, 20, 23 This observation suggests that the Middle Eastern influence during the Neolithic and the Arab domination had a greater cultural than genetic impact in North Africa. In fact, the presence of the Near Eastern lineage J-M267, detected in the Tunisian samples at frequencies ranging from 17 to 30%,21, 22 is consistent with differential levels of paternal gene flow from the Near East that would have accompanied the Islamic invasion and previous dispersals from the Middle East during the Neolithic. In addition, a sub-Saharan African component is also detected in these populations, as attested by haplogroups A, B, E-M96, E-M2 and E-M35 (16.3%), resulting, at least partially, from slavery or migration in recent times.

In spite of these studies, the current understanding of Y-chromosome variation in Tunisia is limited compared with what is known of mitochondrial DNA diversity. In addition, coastal regions, such as Sousse, have been underrepresented in most studies of Tunisian genetic variation. Previous studies based on Y-chromosome-single-nucleotide polymorphism (Y-SNP) data from Tunisia21, 22 have focused mainly on the Berber groups and the Andalusians. The present study represents the first one of its kind to examine the coastal population of Sousse involving a considerable sample size and comprehensive phylogenetic analyses.

In this study, we complement our previous Y-chromosome-short tandem repeat (Y-STR) data based on 17 Y-STR loci15 by performing, for the first time, a high-resolution analysis based on 51 Y-SNPs in the Sousse population. The aim of the current investigation is to elucidate the paternal genetic structure throughout Tunisia and provide information on the complex pattern of migration and admixture expected from historical data. Sousse, as the most ancient cosmopolitan region in Tunisia, may reveal most of the genetic contributions that have influenced the present-day Tunisian populace.

Materials and methods

Population studied

A total of 220 unrelated healthy males from the region of Sousse were examined using Y-SNPs. The samples were drawn from the set previously typed for Y-STRs,15 and therefore the same selection criteria and DNA extraction procedure were employed.

Y-SNP genotyping

Fifty-one Y-SNP markers were genotyped in a hierarchical manner using standard methods, including polymerase chain reaction-restriction fragment length polymorphism, the YAP polymorphic Alu insertion, allele-specific polymerase chain reaction (for primer sequences and polymerase chain reaction conditions see Supplementary Table S6) and direct sequencing.24 Initially, all the samples were analyzed for an insertion polymorphism, the YAP (a Y-specific polymorphic Alu insertion) or PAI, that defines E lineages.

For the remaining SNPs, the following typing scheme was used:

  • Samples carrying the Alu insertion were genotyped for the biallelic markers M96, M2, M215, M329, M75, M35, M78, M81 and M123. Samples derived for M81 were tested for M107, M165 and M183, and individuals derived for M78 were tested for V12, M224, V13, V22 and V65.

  • Samples without the Alu insertion were hierarchically screened for the SNPs M304, P58 and M410. Next, M367, M368, M369, L147.1, L174, L222.2 and L65.2 were typed in samples carrying the P58 mutation and lacking M410. All the individuals derived for the marker M410 were tested for Page55, M67, M530/L24, M158 and DYS4457.

  • Afterwards, M69, M522, P207, M213, V88, M168, M216, M214, M91, M198, M42, M9, M258, M20, M201, M370, M269, M174 and M184 were screened in the remaining individuals, until all samples were assigned to a specific terminal subhaplogroup.
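The hierarchical typing logic described in the list above can be illustrated with a minimal sketch. This is not the authors' laboratory workflow: the dictionary `NEXT_MARKERS`, the function `assign_terminal` and the callback `is_derived` are hypothetical names, and the hierarchy is flattened so that every marker listed after a parent in the text is treated as a direct, mutually exclusive child. The real phylogeny (Figure 1) is more deeply nested.

```python
# Hypothetical, flattened marker hierarchy: each parent maps to the markers
# that the text says were typed next. The real phylogeny has deeper nesting.
NEXT_MARKERS = {
    "M304": ["P58", "M410"],
    "M410": ["Page55", "M67", "M530/L24", "M158", "DYS4457"],
    "M81": ["M107", "M165", "M183"],
    "M78": ["V12", "M224", "V13", "V22", "V65"],
}


def assign_terminal(root, is_derived):
    """Descend from `root`, typing child markers until none is derived.

    `is_derived(marker)` stands in for a genotyping assay and returns True
    when the sample carries the derived allele. Returns (terminal, tested),
    or (None, [root]) when the sample is ancestral at the root marker.
    """
    tested = [root]
    if not is_derived(root):
        return None, tested
    current = root
    while True:
        derived_child = None
        for child in NEXT_MARKERS.get(current, []):
            tested.append(child)
            if is_derived(child):
                derived_child = child
                break  # assumption: siblings are mutually exclusive here
        if derived_child is None:
            return current, tested
        current = derived_child


# Toy sample (invented): derived at M304, M410 and M530/L24 only.
sample = {"M304", "M410", "M530/L24"}
terminal, tested = assign_terminal("M304", lambda marker: marker in sample)
```

The point of the hierarchical design is that only markers on the path to the terminal branch, plus their siblings, need to be typed, rather than all 51 SNPs in every sample.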

The phylogenetic relationship of these markers and the haplogroups that they define are indicated in Figure 1. The Y-SNP haplogroup nomenclature follows the recommendation originally proposed by Karafet et al.25 and revised by additional lineage information as established by subsequent scientific teams.26, 27

Figure 1

Hierarchical phylogenetic relationships of Y-chromosome haplogroups and their percentages in Sousse.

Statistical analyses

A total of 23 key geographically targeted reference populations previously typed for both Y-STR and Y-SNP markers and published were utilized for the phylogenetic analyses (Supplementary Table S1). They correspond to 17 countries in North Africa, the Near East and Europe as well as some sub-Saharan African countries (Supplementary Figure S1).

Frequency distribution maps for haplogroups (contour maps) were obtained using the Surfer v8.07 software (Golden Software, Golden, CO, USA, http://www.goldensoftware.com). Previously published data on African and Arabian populations were included to build the grid (Supplementary Table S2). We employed the ordinary Kriging procedure with default settings for interpolating frequency values. Kriging variance was used to compute the confidence interval limits of the estimated values. Kriged values falling outside the confidence interval limits were not considered in the analyses. To prevent the Kriging interpolation from performing unrealistic gradient estimation, boundaries of large uninhabitable areas were defined (see Supplementary Figure S1 for the location of the observed values).

Phylogenetic relationships among the microsatellite haplotypes belonging to haplogroups E-M81, E-M78, J-M267, J-M172 and R-M207 were inferred through median-joining networks using the NETWORK 4.6.0.0 (Fluxus Engineering, Suffolk, UK)28 and Network Publisher version 1.2.0.0 (Fluxus Engineering, www.fluxusengineering.com) software, with weighting based on the inverse of the microsatellite variances. The networks generated are based on the following 10 Y-STR loci: DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439. These loci are common to all relevant reference populations (Supplementary Table S2) included in the analyses. DYS385 a/b were excluded from the analyses since the multiplex amplification reactions do not allow for discrimination between the two loci. DYS389II was scored by subtracting the DYS389I allele from the total fragment.29

Times to the most recent common ancestor for the most diagnostic Y haplogroups were estimated by calculating the mean STR variance as proposed by Kayser et al.30 using a mean STR mutation rate of 0.00069 per generation and a generation time of 25 years.31
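As an illustration of the variance-based dating step, the following minimal sketch converts mean STR variance into an age estimate. It assumes a simple stepwise mutation model in which the per-locus variance in repeat number grows linearly with time (variance ≈ mutation rate × generations), and it takes the variance about the sample mean. The exact estimator used by the authors may differ, the function names are hypothetical, and the haplotypes are invented toy data.

```python
from statistics import pvariance

MUTATION_RATE = 0.00069  # mean STR mutation rate per generation (from the text)
GENERATION_YEARS = 25    # generation time in years (from the text)


def mean_str_variance(haplotypes):
    """Average, over loci, of the population variance in repeat number."""
    loci = list(zip(*haplotypes))  # one tuple of allele sizes per locus
    return sum(pvariance(alleles) for alleles in loci) / len(loci)


def tmrca_years(haplotypes, mu=MUTATION_RATE, gen_years=GENERATION_YEARS):
    """Age estimate in years, assuming variance = mu * generations."""
    generations = mean_str_variance(haplotypes) / mu
    return generations * gen_years


# Toy data (invented): four haplotypes typed at two loci.
toy_haplotypes = [(14, 10), (14, 11), (15, 10), (13, 10)]
toy_variance = mean_str_variance(toy_haplotypes)
toy_age = tmrca_years(toy_haplotypes)
```

Because the estimate scales inversely with the mutation rate, the choice of rate and generation time affects the result as much as the observed variance does, which is one reason the reported ages carry wide error ranges.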

Phylogenetic relationships

The genetic relationships among Sousse and the reference populations were analyzed with two approaches. The first approach was based on genetic distances (using Rst) calculated with the Arlequin program v3.5 (University of Bern, Bern, Switzerland)32 and displayed in a multidimensional scaling (MDS) graph using the STATISTICA 8 package (Statistica, SAS/STAT Software, Cary, NC, USA; http://www.statsoft.com). Genetic distances were estimated at the level of haplotypes based on the ten aforementioned Y-STR loci (DYS19, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439). The second approach was based on relative haplogroup frequencies, normalized within populations and centered but without variance normalization, which were used to construct a principal component analysis (PCA) plot. To study the population genetic structure of North African and, specifically, of Tunisian populations, we performed a series of hierarchical analyses of molecular variance, pooling populations according to geographical, linguistic and ethnic criteria. Two sets of analyses of molecular variance, one based on Y-STR and the other on Y-SNP data, were performed using the Arlequin software ver. 3.5.32
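The PCA preprocessing described above (frequencies summing to one within each population, columns centered, no scaling to unit variance) can be sketched as follows. This is an illustrative sketch only: it extracts just the first component by power iteration on the covariance matrix, which is not necessarily how the authors' software computes it. The three-population frequency table is invented and all function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each haplogroup's mean frequency; no variance scaling."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of the centered haplogroup columns."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Leading eigenvalue and eigenvector by power iteration."""
    p = len(cov)
    v = [1.0 / (k + 1) for k in range(p)]  # arbitrary non-uniform start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v


# Toy table (invented): rows are populations, columns are haplogroups.
freqs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.3, 0.6],
]
centered = center_columns(freqs)
cov = covariance(centered)
eigenvalue, axis = first_component(cov)
scores = [sum(x * a for x, a in zip(row, axis)) for row in centered]
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Skipping variance normalization means that common haplogroups with large frequency differences between populations dominate the leading components, whereas rare haplogroups contribute little.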

Results

Y-chromosome lineage diversity

The partitioning of paternal lineages and their frequencies are displayed in a hierarchical phylogeny in Figure 1. The 220 Sousse Y-chromosomes represent 24 different haplogroups, the majority of them belonging to haplogroups E and J, which together account for 90% of our dataset (E, 56%; J, 34%) (Supplementary Table S3). The predominant E sublineage, also commonly found in other North African populations,18, 19, 20, 21, 22 is E-M81. All E-M81-derived chromosomes are in subhaplogroup E-M183 (44.55%). Besides the common E-M81 lineage, traces of other lineages within the major E-M215 haplogroup were detected, including E-M35, E-M78 and E-M123. These lineages are also found at various frequencies throughout North and East Africa. Haplogroup E-M78, which has a wide distribution including Europe, the Near East and North Africa,20 is mostly represented by subhaplogroup E-V65 (4.09%), with the exception of two individuals who belong to subhaplogroups E-V13 and E-V22 (0.45% each). Lineage E-M123, on the other hand, is detected at low frequency (1.82%). It is interesting that the E-M2 clade, which is particularly frequent in sub-Saharan Africa33, 34 and present in some North African populations most likely as a result of sub-Saharan migration,18, 19, 35 is not detected in the Sousse samples. The second most prevalent major lineage detected in our dataset is J-M304, encompassing 34% of the total male population. Both of its main branches, J-P58 and J-M172, were observed. Haplogroup J-P58, which is thought to have originated in eastern Anatolia,36 is largely represented by subhaplogroup J-L222.2 (23.64%). Within the J-M172 clade, the majority of males are represented by the J-L24 subclade (5.45%). Furthermore, Y-haplogroup R-M207 is found in 12 Sousse males, of whom three individuals are either R-M198 (0.45%) or R-V88 (0.91%). Also present were haplogroups G-M201, L-M20, T-M184 and A-M91, although each of these accounted for only 0.45% of the entire sample set.
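The percentages reported above correspond to whole-number counts out of 220 males. The short sketch below makes that arithmetic explicit. Apart from the sample size and the counts stated in the text (for example, 12 J-L24 individuals), the counts are back-calculated from the reported percentages and should be read as assumptions, not as values taken from Supplementary Table S3.

```python
SAMPLE_SIZE = 220  # number of Sousse males stated in the text

# Counts inferred from the reported percentages (assumed, not from the
# supplementary tables), except where the text gives a count directly.
counts = {
    "E-M183": 98,
    "J-L222.2": 52,
    "J-L24": 12,
    "E-V65": 9,
    "E-M123": 4,
    "R-V88": 2,
    "E-V13": 1,
}


def percent(count, total=SAMPLE_SIZE):
    """Frequency as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {haplogroup: percent(n) for haplogroup, n in counts.items()}
```

With this sample size, a single individual corresponds to roughly 0.45% of the dataset, which is why several rare haplogroups share exactly that frequency.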

High-resolution contour maps based on the frequency distributions of informative haplogroups were generated. The maps indicate distinct west–east genetic clines in North Africa. The frequency of haplogroup E-M81 (Figure 2a), for example, is much higher in northwestern Africa, particularly in Tunisia, and moderate in the Near East and Europe. On the other hand, the frequencies of haplogroups E-M78 and E-M123 are much higher in northeast Africa, exhibiting a focal point of extreme frequencies in Egypt-Palestine (Figures 2b and c, respectively). The J-M267 lineage is prevalent in all North African and Levantine groups (Figure 2d). In North Africa, J-M267 exhibits the highest frequency in Andalusians from Zaghouan. It is also found at relatively high frequency in the Levantine samples. The J-P58 sublineage contour map is depicted in Figure 2e.

Figure 2

Y-chromosomal variation in the Iberian, North African and Levant regions. Each panel shows a different haplogroup: E-M81 (a), E-M78 (b), E-M123 (c), J-M267 (d) and J-P58 (e). The intensity of shades is proportional to the values of interpolated haplogroup frequencies.

Phylogeographic analyses of prominent haplogroups

To examine Y-chromosomal relationships in more detail, network analyses were performed using the 10 Y-STR markers common to all populations. The projection illustrating Y-STR haplotype variation identified in 489 Y-chromosomes belonging to the E-M81 haplogroup is presented in Figure 3a. It is interesting to note that the E-M81 mutation was not detected in the Egyptian population analyzed in this study. The star-shaped E1b1b1b-M81 network is centered on the most frequent and widespread haplotype, seen in all North African and European populations. In addition, several other shared haplotypes were observed, with the majority of these present in North African populations. The sharing of haplotypes suggests relatedness among the samples. On the other hand, a greater number (51) of North African-specific clades are also noted, further attesting to the high level of diversity observed throughout the region. It is noteworthy that, of all Tunisian populations analyzed, Sousse possesses several unique haplotypes (13/27), indicating a high level of microsatellite diversity. The overall haplotype diversity (HD) and the phylogenetic Mean Pairwise Distance (MPD) values within haplogroup E-M81 are 0.8362±0.0151 and 2.1126±1.1799, respectively. The coalescence time estimate for the E-M81 network is 5.7±3.9 kya.
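For readers unfamiliar with the two diversity statistics quoted above, the following sketch shows one common way to compute them. HD is taken here as Nei's unbiased haplotype diversity, n/(n−1) × (1 − Σp²), and MPD as the mean number of loci at which two haplotypes differ. Both definitions are assumptions about what the software reports, the function names are hypothetical, and the haplotypes are invented toy data.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def mean_pairwise_differences(haplotypes):
    """Mean number of loci that differ, averaged over all sample pairs."""
    pairs = list(combinations(haplotypes, 2))
    diffs = [sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs]
    return sum(diffs) / len(pairs)


# Toy data (invented): four chromosomes typed at two Y-STR loci.
toy = [(14, 10), (14, 10), (14, 11), (15, 11)]
toy_hd = haplotype_diversity(toy)
toy_mpd = mean_pairwise_differences(toy)
```

HD depends only on how many distinct haplotypes there are and how evenly they are distributed, whereas MPD also reflects how different the haplotypes are from one another, so the two values can rank haplogroups differently.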

Figure 3

Median-joining networks based on Y-chromosome-short tandem repeat haplotypes within haplogroups E-M81 (a), E-M78 (b), J-M267 (c), J-M172 (d) and R-M207 (e) of North African, Middle Eastern, European and sub-Saharan populations. The circle sizes are proportional to the haplotype frequencies. The smallest area is equivalent to one individual. Branch lengths are proportional to the number of mutational steps separating two haplotypes. A full color version of this figure is available at the Journal of Human Genetics journal online.

The topology of the E-M78 network (Figure 3b) exhibits greater diversity compared with that of E-M81. The E-M78 haplogroup is mostly observed in the Near Eastern populations. Notably, with the exception of the Sousse population, this haplogroup was more frequent in northeastern (Libya, 12.7%; Egypt, 33%) than in northwestern Africa. Also of interest, Sousse, in contrast to the other Tunisian populations, possesses the highest number of Y-STR E-M78 haplotypes (18/13, respectively), three of them unique and the ten remaining shared among Europeans, Near Easterners and North Africans (Tunisian Andalusians from Zaghouan, Libyans, Moroccans and Egyptians). This haplotype sharing may be indicative of gene flow involving these regions. The diversity values within haplogroup E-M78 are higher than for E-M81 (0.9899±0.0017 and 4.1310±2.03985 for HD and MPD, respectively). The age estimate for E1b1b1a-M78 is 12.2±9.2 kya.

The J-M267 network includes 500 haplotypes, most of them from Near Easterners (Figure 3c). The most abundant haplotype consists of 106 individuals, including Libyans (42), Soussians (21) and Near Easterners (21). The 22 remaining haplotypes include people from Algeria, Morocco and the Tunisian ethnic groups. In this J-M267 projection, extensive haplotype sharing between the Sousse population and different North African and Near Eastern groups is observed, as well as the presence of 10 haplotypes unique to Sousse males. Diversity estimates within haplogroup J-M267 are 0.9439±0.0073 and 2.8360±1.4977 for HD and MPD, respectively. The age estimate for J1-M267 is 7.6±5.2 kya.

The J-M172 network (Figure 3d) is mostly represented by Near Eastern samples (73.9%) with a moderate number of European individuals (18.5%) and only a few North African haplotypes (11.7%). Within North African populations, the Sousse population exhibits the highest number of haplotypes (18). A total of 41 haplotypes have been detected in North African populations. The J-M172 topology illustrates limited haplotype sharing, mostly between Near Easterners and Europeans. Furthermore, numerous Near Eastern- and European-specific haplotypes with highly diverse distribution were noted. Diversity estimates within haplogroup J-M172 are 0.9932±0.0011 and 4.6173±2.2714 for HD and MPD, respectively, and its age estimate is 15.8±7.1 kya.

The network projection for individuals harboring the R-M207 mutation is provided in Figure 3e. Haplogroup R-M207 was frequent in European groups and less abundant in Near Easterners. This clade is sporadically detected in sub-Saharan and North African populations. With regard to Tunisian collections, R-M207 is found only in the Sousse and Andalusian samples. Diversity values within haplogroup R-M207 are 0.9933±0.0013 and 4.8067±2.3515 for HD and MPD, respectively.

Genetic structure

To explore the genetic relationships among the populations of Sousse, Tunisia as a whole and the 23 other geographically targeted populations obtained from the literature (Supplementary Table S1), we performed a PCA based on haplogroup frequencies (Supplementary Table S2). The first two components of the PCA account for 55.18% of the variation and reveal distinct geographical partitioning (Figure 4a). The North African populations form a cluster located in the upper-left portion of the plot, except for the Egyptians, who lie closest to Palestinians in the lower right. This cluster of North African populations is defined by the predominance of haplogroup E-M81. In contrast, Egyptians are characterized by a high frequency of the E-M78 haplogroup and the absence of the E-M81 lineage. Within the North African cluster, the Sousse sample is close to the Cosmopolitan and Andalusian Tunisian groups and the general populations from neighboring states, namely Libya, Algeria and Morocco. The Tunisian Berber collections form a tight conglomerate isolated in the upper-left corner of the graph. The sub-Saharan Africans, who possess higher frequencies of haplogroups B-M96, A-M91, E-M2 and E*-M96, lie close to each other in the lower left of the plot. The PCA also illustrates the genetic affinity of Levantine populations to Europeans, especially Italians. In fact, both groups present relatively high frequencies of J-M172 and share some other lineages, in particular I-M170 and T-M70.

Figure 4

(a) Principal component analysis of Y-chromosomal haplogroup frequency data from the investigated and reference populations. (b) Multidimensional scaling plot based on Rst genetic distances of the same populations used in the principal component analysis. The acronyms are the same as those used in Supplementary Table S1. A full color version of this figure is available at the Journal of Human Genetics journal online.

The populations were also contrasted using Rst estimates (Supplementary Table S4) computed from the 10-locus Y-STR haplotypes and plotted using multidimensional scaling (MDS) (Figure 4b). The MDS illustrates the same general features as the PCA, including the North African cluster lying close to Near Easterners, the Sousse population partitioning similarly relative to the other groups, and the Egyptians' affinity to the Palestinians. Yet in the MDS, the Near Easterners and Tuareg segregate towards sub-Saharan Africans, who, as in the PCA graph, aggregate in the lower-left corner of the plot, distant from all other populations.

To further explore genetic structure in our samples, several separate analyses of molecular variance were performed using both Y-STR haplotype and Y-SNP haplogroup data (Supplementary Table S5). Significant genetic heterogeneity was revealed when all populations were considered as a single group. When North Africa was compared with each of the three geographical regions separately (Near East, Europe and sub-Saharan Africa), the only non-significant genetic differentiation was relative to the Near East.

Discussion

This study characterizes the Y-chromosome DNA diversity of the Sousse population using high-resolution biallelic and STR markers coupled with a wide sampling coverage of reference populations in order to provide a comprehensive understanding of the paternal genetic substructure of the Tunisian people and their origins in the context of the North African genetic landscape. Our results indicate that the frequency of the E-M81 haplogroup in the region of Sousse is lower than in Berber groups; it reaches a value of 45%, comparable with the values detected in the Cosmopolitan population from Tunis and in Andalusians.21 This frequency range of 36–45% of E-M81 in these Cosmopolitan and Andalusian Tunisian populations is consistent with a strong common Berber background that gives the typical profile to North African populations, which aggregate in the same clusters in the PCA and MDS analyses. However, depending on their Berber or Cosmopolitan status, North African populations can be classified into subclusters. This observation is in agreement with the analyses of molecular variance, which indicate significant differences for both SNP and STR marker systems (17.65, P < 0.05 and 1.64, P < 0.05, respectively) between Cosmopolitans and Berbers. In all of these parameters, the Sousse population exhibits characteristics of a highly genetically diverse North African population. Among the various Tunisian communities, haplogroup E-M81 is more prevalent in Berbers,21, 22 especially those inhabiting remote villages, where it was found to be fixed in two independent Berber populations (Chenini-Douiret and Jradou).21 Similarly, previous phylogeographic analyses of the E-M81 lineage in other North African Berber-speaking groups have shown its high frequency (80% in the Mozabites of Algeria; 65%–73% in Berbers from Morocco19, 20, 23).

The E-M81 lineage exhibits a star-like network structure (Figure 3a), which suggests an ancient evolution. This network exhibits apparent rapid expansion at some point that may be explained by loss of diversity due to genetic drift. Indeed, most of the STR haplotypes belonging to the E-M81 haplogroup are shared among various North African communities without obvious genetic structure relative to geography. Based on these phylogeographic considerations, the E-M81 haplogroup is likely to represent an important paternal founder lineage of North African people.

In our previous work performed on Tunisian populations, this haplogroup was dated to 7.4±5.5 kya in the Neolithic, comparable with the age estimated in this study (5.7±3.9 kya), and by Arredi and collaborators.19 However, considering the high level of genetic drift typically experienced by uniparental marker systems such as the Y-chromosome, it is possible that the E-M81 haplogroup had a more ancient genesis in North Africa. Indeed, according to Keita,37 E-M81 most likely emerged from the sub-Saharan clade E-M35 either in the Maghreb region, or possibly as far south as the Horn of Africa. This conjecture is consistent with the creation of sub-Saharan mitochondrial DNA markers that date to 20 kya38 and probably contemporary with the Ibero-Maurusian civilization during the Paleolithic.

Along with the high prevalence of E-M81 in Sousse, we observed the E-M78 mutation at low frequency (5%). In contrast to the above-mentioned E-M81 haplogroup, E-M78 has a wide geographic distribution with its highest frequency observed in the Egypto-Palestinian area. It is detected at lower frequencies in Northwest Africa and is particularly observed in Andalusians and Cosmopolitans rather than Berbers. This suggests an east to west gene flow with greater penetration into the Cosmopolitan populations of North Africa.

In addition to the indigenous components, gene flow from the Middle East is also evident in certain Tunisian populations from the presence of the J-M304 haplogroup. In fact, J-M304 is the predominant Y-chromosome lineage in the Middle East. This lineage splits into two main subclades, J-M267 and J-M172. The frequency and diversity of J-M267 are highest in the Middle East. In Tunisia, with the exception of three Berber communities, J-M267 is detected with frequencies ranging from 5% to 48%. In Sousse, J-M267 is observed at a level (26%) comparable with that found in Cosmopolitans from Tunis (28%). The network of the J-M267 lineage shows a star-like topology with no geographic structuring. This feature supports a demic expansion from ancestral haplotypes currently shared by Maghreban and Middle Eastern populations, subsequent to migrations from the east. J-M267 and its subhaplogroup J-P58 are widely distributed in North Africa but display higher frequencies in the east, with the exception of some Tunisian regions.

Gene flow from the Middle East should be considered according to different migration waves during prehistory, antiquity and more recent times. In prehistoric and historical times, Middle Eastern contributions to North African populations could have occurred during the Neolithic period around 8000 BP as part of the Capsian civilization developments that were introduced in North Africa along with agriculture. The presence of the J-M267 haplogroup with a long local evolution in the Berbers of Sened in the nominal Capsian region is congruent with this hypothesis. However, its presence in the Sousse population at a relatively high frequency may be indicative of additional secondary regional gene flow events.

Undoubtedly, the Muslim expansion of the seventh century into North Africa and the subsequent massive Bedouin migration during the eleventh century contributed considerably to the east-to-west gene flow into North Africa. The J-M267 haplogroup, with frequencies as high as 40% in Saudi Arabia,39 could have been introduced with the Arab migration waves into North Africa during historical times. Its presence in North Africa at frequencies around 30%21, 35, 40 and its absence in Berbers, excluding the Sened (see above), may be explained by the Arab conquest. Hence, it could be considered an indicator of Arab gene flow into these populations. Furthermore, the relatively high frequency of the J-M267 haplogroup in Andalusians from Tunisia may be the result of the strong relationship between Andalusia in southern Spain and Northwest Africa as well as the Middle East during the Umayyad dynasty, and the subsequent expatriation of Muslims from Iberia.

The Sousse population and the other Tunisian groups already studied differ appreciably in the composition of their Middle Eastern paternal lineages. Sousse, in particular, possesses a relatively high frequency of haplogroup J-M172 (~9%), which is absent in all of the remaining Tunisian populations with the exception of Andalusians (~3%). Haplogroup J-M172 has been associated with population movements in the Fertile Crescent during the Neolithic Agricultural revolution. Today, it is very frequent in the Levant, Anatolia and Iran41, 42 and its recent spread in the Mediterranean is believed to have been facilitated by the maritime trading culture of the Phoenicians (1550–300 BC). According to Zalloua and collaborators,43 evidence of Phoenician influence in Tunisia is apparent from the presence of the J-M172 Y-chromosome haplogroup in coastal regions considered as areas of Phoenician contact (versus inland). In Sousse, the J-M172 lineage is exclusively represented by its J-M410 clade, of which the J-L24 mutation is the most prevalent (12 individuals). The remaining seven individuals belong to the following subclades: J-M410 (three individuals), J-Page55 (two individuals) and J-DYS4457 (one individual). In fact, J-L24, the most frequent subclade, is also the most widespread subclade of J2a, with a geographic distribution ranging from the Middle East to Europe, North Africa and South Asia.

The J-M172 haplogroup associated with the Phoenician expansion is distributed throughout the Mediterranean basin and Asia. It is thought that the Phoenicians originated in what is today coastal Lebanon and subsequently founded and settled several city-states in the Mediterranean, including in North Africa. In Tunisia, their population at the end of their dominion was estimated to be 100 000 compared with 500 000 Berbers.44 It is important to note that although the most famous city founded by the Phoenicians was Carthage, they also established the settlements of Utique and Sousse.13 Interestingly, Sousse is the only Phoenician town in Tunisia that has been continuously inhabited since its foundation8 and it is the only population where the J-M172 Phoenician paternal marker is detected. In Sousse, J-M172 dates to 6.9±5.8 kya. This result is consistent with J-M172 having been introduced into the Sousse population as an already diversified haplogroup and subsequently maintained locally. This interpretation is congruent with the network analysis of this haplogroup. The J-M172 haplogroup is also observed in Andalusians. Andalusians are expected to have received a direct Middle Eastern contribution in the Iberian Peninsula45 before their expatriation to North Africa after the fall of Granada to Christian forces in 1492 A.D.

A more recent potential Middle Eastern genetic contribution to the North African gene pool may be associated with the expansion of the Ottoman Empire. Sousse may also have been specifically impacted by the Turkish occupation of North Africa. Yet, the unique presence of J-M172 in Sousse and its absence from other Tunisian regions that were under Ottoman influence argue for J-M172 in Sousse as a Phoenician signal. Further, Sousse exhibits another haplogroup, T-M184, that is not detected in any other North African population, attesting again to the Phoenician contribution to that population. Haplogroup T-M184 is more common today in East Africa and it is thought to signal the spread of agriculture from the Fertile Crescent. Indeed, the oldest subclades and the greatest diversity of T-M184 are found in the Middle East, especially within the Fertile Crescent. Yet, T-M184 could also have been dispersed throughout the Mediterranean basin by the Phoenicians (1200–800 before the common era).

In this study, we have also detected other haplogroups in Sousse, including R-M207, G-M201, L-M20 and A-M91, that have not been reported in other Tunisian communities, attesting to the genetic heterogeneity of this region. Haplogroup R is mainly represented by R-M207 in Sousse (nine individuals). Its distribution spans all continents, and it is fairly common in Europe, South Asia and Central Asia.46, 47 It also occurs in the Caucasus, the Middle East and some parts of sub-Saharan Africa.41, 48 In addition, its two subclades, R1a and R1b, are observed at low frequencies (0.45 and 0.91%, respectively) in Sousse. R1a is most prevalent in Eastern Europe while R1b is most frequent among Western Europeans.49

Although present at very low frequency (0.45%), haplogroup A-M91, the oldest human haplogroup, is notably found in Sousse but nowhere else in North Africa. This haplogroup is mainly found in Eastern and Southern Africa. It is common among the Khoisan people, including the Bushmen, and it is considered an original ancestral haplogroup.

In conclusion, the analysis of admixed populations represents a unique opportunity to examine the impact of multiple migrations into a region. Within a historical context, both population isolation and admixture have had a considerable impact on the Tunisian population structure. The wide range of paternal lineages present in our Sousse population indicates a diverse origin. Indeed, the genetic structure of paternal lineages observed in Sousse is more diverse than that of any other studied Tunisian population and largely reflects the influence of successive migrations since its foundation by the Phoenicians.