Background
Approximately 70 mosquito species of the genus Anopheles can transmit parasites such as Plasmodium species and Wuchereria bancrofti, the agents of malaria and Bancroftian lymphatic filariasis, respectively. Anopheles mosquitoes therefore constitute a major public health concern [1, 2].
Traditional morphological identification using dichotomous keys is the first step towards Anopheles vector species identification [3]. However, it requires technical skills and comprehensive training. It is also difficult for damaged specimens, new species, cryptic species, species with overlapping characteristics and cases of intraspecies morphological variation [4]. To overcome biased interpretations of species distributions and bionomics, molecular identification has been proposed as a complementary tool [5]. The most commonly targeted region for Anopheles species identification is the rDNA internal transcribed spacer 2 (rDNA ITS2). However, specific primers are often required for species identification, as is the case for the Sundaicus complex [6]. In addition, multiple gene sequences are often needed for unambiguous identification, especially because of the limited availability of molecular reference databases [3, 7, 8].
Protein profiling using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a promising tool for arthropod identification [8, 9]. Several teams have built in-house databases to identify adult Anopheles species by their MALDI-TOF spectra. Some used the legs to minimize the amount of material taken from voucher specimens [10–14], whereas others used the cephalothorax [15, 16]. Consequently, there is no consensus on the optimal anatomic part to use. The protein repertoire of arthropods has been shown to vary according to body compartment [8, 17]. A standardized and optimized protocol is therefore needed to determine which body part produces the most reproducible and specific mass spectrum protein profile [8, 9]. In addition, it is important to evaluate the influence of the geographic origin of specimens on identification results, as it may lead to protein variability [10, 16].
The aim of this study was to determine which anatomic part of adult Anopheles mosquitoes, both males and females, is best suited for the identification of field specimens. A mass spectral library (MSL) was generated using different body parts of male and female mosquitoes obtained from reference centres. The MSL was evaluated using two independent panels of field-collected specimens from Mali and Guinea. Geographic variability was tested using several databases containing additional specimens from Mali and Senegal.
Discussion
This study provides new insight into the use of MALDI-TOF MS for Anopheles species identification. It assessed which body part is best suited for identification and the impact of the geographic origin of the specimens, using two independent panels from different mosquito populations and four databases.
Differences in mass spectrum protein profiles and reproducibility levels were observed between body parts of Anopheles species from the MSL and the panels. The spectra from the legs exhibited the smallest number of high-intensity peaks, showing that the protein content was less diverse than for the head and thorax. Previous studies concluded that legs provided sufficient protein material to give reproducible and specific mass spectra [10–14]. However, a recent study reported that using fewer than four legs could compromise the MALDI-TOF MS identification of mosquito species, showing that at least four legs are required to obtain sufficient protein material [25]. In addition, one of the previous studies observed that the quality of leg spectra from field-caught Anopheles was lower than that from colony specimens, with decreased intensity [14]. This suggests possible protein degradation in the legs of field-caught specimens. As previously mentioned [16], the fragility of the legs, which break easily and can be lost during collection, transportation, storage or processing, may lead to partial or total loss of the protein content. Indeed, a study showed that legs were prone to degradation during trapping, with modification of protein profiles and a decrease in identification log(scores) as the trapping duration increased, even after 24 h of trapping [25]. Similarly, in this study, disparities were observed between spectra from laboratory-reared specimens and from field-caught specimens for every anatomic part. Field-caught specimens showed lower reproducibility levels. In addition, the duration of storage also seems to have affected the reproducibility of the mass spectra. Indeed, the spectra of colony specimens of An. arabiensis obtained after the shortest storage duration (3 weeks at −20 °C) had high reproducibility levels for every anatomic part, contrary to the other colony specimens of the MSL. Compared to the legs and thorax, the head provided the highest reproducibility of mass spectra, regardless of the origin of the specimens (colony or field) and of the storage conditions. This was consistent with the higher identification log(scores) obtained using the head. Therefore, the head protein content could be less prone to degradation and more robust than that of the other body parts.
This study revealed that Anopheles thorax spectra from engorged field-caught specimens dissected after frozen storage were negatively affected by the blood meal, unlike those of the head and the legs. Two previous studies [15, 16] used the cephalothorax, as it gave a stronger mass spectrometry signal than the legs and provided the minimum concentration of 0.2 mg/mL raw protein recommended by Steinmann et al. [26]. However, the majority of specimens included were laboratory-raised from field-collected larvae and were non-engorged. One of the two studies also included resting females caught by aspiration and potentially blood-fed [15]. The number of peaks from specimens caught by aspiration was lower than that from laboratory-raised specimens, and sometimes no peaks were observed. The authors thus postulated that the abdominal blood content somehow negatively influenced the frozen preservation of the engorged specimens. Similarly, for MALDI-TOF identification of sand fly species, the thorax of engorged specimens was contaminated by blood during its separation from the abdomen after frozen storage [27–29]. Here, visually engorged Anopheles displayed specific patterns in thorax protein profiles, and the mass spectra reproducibility level of field-caught specimens was lower than that of laboratory-reared ones. These protein patterns probably correspond to a haemoglobin signal, modified by the blood digestion process and/or frozen storage. Precise identification of these proteins would require other proteomic tools such as LC/MS. In contrast, Vega Rua et al. [30] observed highly reproducible thorax spectra of Aedes sp. and Culex sp., both laboratory-reared and field-caught, after frozen storage at −20 °C for a few months to one year. However, the authors included only non-engorged female mosquitoes. Other field parameters can affect the Anopheles protein content and lead to heterogeneity of mass spectra between laboratory-reared and field-caught specimens. For instance, seasonal fluctuations in temperature can modify the phenotype. In a field population of Anopheles merus captured in South Africa, the mean wing length decreased by 19.6% in summer [31]. This illustrates the benefit of including a high diversity of field-caught Anopheles in validation panels and spectral databases.
Using the initial MSL, which did not contain specimens of the same origin and storage conditions as the panels, the proportion of interpretable (LSV ≥ 1.7) and correct identifications was significantly higher using the head than using the thorax or the legs. However, this proportion of correct identifications remained low (64%) when querying panels A+B. Using the head, the proportion of specimens with an LSV < 1.7 was 16%, whereas greater proportions were observed for the thorax (58%) and the legs (33%). In contrast, using database 2, which contained specimens of the same origin and storage conditions as panel A, the results were significantly improved. Indeed, the legs provided high proportions of correct identifications, comparable to the head (96% and 98%, respectively), in agreement with the previous studies using the legs, which also included specimens of the same origin as the panels in the databases [10–14]. Using database 4, which contained additional specimens from Senegal, the difference between the proportions of correct identifications using legs and head was larger (96% and 92%, respectively) but remained non-significant. Nevertheless, comparing the results according to the number of deposits of protein extracts revealed significant differences between legs and head. The head performed better than the legs, as it did not require the deposit of multiple spots to optimize the log(score) results. Indeed, considering the highest scoring spectrum, the legs required four replicates of protein extract, whereas for the head a single replicate provided results almost equivalent to those of four replicates, at LSV thresholds of 1.7 and 1.8. These results are concordant with a recent study on mosquitoes [25], which observed that, using the legs, the LSVs were improved when three spots of each sample were deposited onto the target plate compared with only one spot. Therefore, using the head represents a concrete improvement for the routine use of MALDI-TOF for Anopheles identification, as it speeds up analysis by decreasing the number of deposits. For the thorax, the proportion of correct identifications significantly increased (81%) using panel A against database 2, but remained lower than for the other body parts. A database combining thorax, head and legs spectra may improve the identification results, especially when only legs are used for database queries. Indeed, 46.7% of the leg spectra of panel A cross-matched with the head or thorax spectra of database 1. The same observation was made using panel A against database 2 (41.3% of cross-matching using the legs). This result was consistent with Vega Rua et al. [30], who recommended creating a double database with thorax and legs to improve the identification of specimens with missing or damaged legs. For database querying, they also recommended the use of both the thorax and legs for double checking of mosquito species identification. Here, when querying a database without specimens of the same origin as the tested ones, the head alone was superior to “thorax + legs” or other combinations of body parts.
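The comparison of deposits described above (highest scoring spectrum among one or several replicate spots, then an LSV threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: the names `Spot`, `best_spot`, `classify` and `proportion_correct` are hypothetical, and the LSVs in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """One deposit of protein extract queried against the database (hypothetical type)."""
    lsv: float     # log score value of the best database match for this spot
    matched: str   # species name of that best match


def best_spot(spots, n_replicates):
    """Return the highest scoring spot among the first n_replicates deposits."""
    return max(spots[:n_replicates], key=lambda s: s.lsv)


def classify(spots, true_species, n_replicates=4, threshold=1.7):
    """Classify one specimen as 'correct', 'incorrect' or 'uninterpretable'."""
    top = best_spot(spots, n_replicates)
    if top.lsv < threshold:
        return "uninterpretable"
    return "correct" if top.matched == true_species else "incorrect"


def proportion_correct(specimens, n_replicates=4, threshold=1.7):
    """Fraction of specimens with an interpretable and correct identification."""
    outcomes = [classify(spots, truth, n_replicates, threshold)
                for truth, spots in specimens]
    return outcomes.count("correct") / len(outcomes)


# Made-up values for illustration only; these are not data from the study.
specimens = [
    ("An. gambiae", [Spot(1.55, "An. gambiae"), Spot(1.92, "An. gambiae"),
                     Spot(1.40, "An. arabiensis"), Spot(1.81, "An. gambiae")]),
    ("An. gambiae", [Spot(2.10, "An. gambiae"), Spot(2.05, "An. gambiae"),
                     Spot(1.98, "An. gambiae"), Spot(2.12, "An. gambiae")]),
]

print(proportion_correct(specimens, n_replicates=1))  # one deposit per specimen
print(proportion_correct(specimens, n_replicates=4))  # four deposits per specimen
```

In this toy example the first specimen is uninterpretable with a single deposit but correctly identified once four deposits are considered, which is the pattern reported here for the legs; a body part whose first deposit already exceeds the threshold gains nothing from additional spots.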
The value of including Anopheles specimens from the geographic area to be investigated has been confirmed. This reflects a great heterogeneity of mass spectrum protein profiles between the Anopheles specimens of the initial MSL and the specimens of the panels. As all the previous studies included in their databases specimens of the same origin as the specimens to be tested, they did not reveal the importance of this approach to the same extent. However, heterogeneity of mass spectra was also reported when comparing mosquito species [10, 11, 16] or sand fly species [22, 32, 33] from various geographical origins, and between reared and field mosquito spectra [10, 15]. Within a given species, the observation of biomarkers specific to colony specimens and to field specimens [10, 15] suggests great variability in protein content due to phenotypic distinctness in relation to the genetic diversity of Anopheles, influenced by environmental settings, evolutionary history adaptation, demographic history or genetic drift. However, clustering analyses indicate that the experimental conditions also seem to have a great impact on mass spectrum protein profiles. In this study, the mass spectrum protein profiles of An. gambiae from Kenya, Mali and Guinea were compared, and the spectra did not cluster exclusively according to geographical origin in a dendrogram. Similarly, a study reported that specimens from the same Anopheles species and colony were split into different groups of a dendrogram [15]. Therefore, variability of mass spectra may also result from the method of storage or from other experimental conditions such as trapping method or trapping duration [25], and the quality of protein extraction and homogenization [34]. In addition, although the precise age of the colonies was unknown, it probably did not affect the results, as there was no clear clustering of the spectra from colony specimens. These variations between findings may pose a challenge for the practical use of MALDI-TOF MS for mosquito identification and may complicate the creation of large international databases, in contrast to bacteria or fungi. Region-specific mass spectrum databases will have to be produced. Moreover, substantial standardization efforts will be necessary, such as the use of internal biomarkers, as previously suggested [10, 15].
Most identification errors consisted of mismatches between the cryptic species An. gambiae and An. arabiensis, which are well described in the Gambiae complex [10, 11, 15]. The identification of cryptic species seems to be even more sensitive to the experimental conditions and to the species composition of the database. Indeed, adding specimens of the same origin as the panel to the database significantly decreased the mismatches between An. arabiensis and An. gambiae. This is not surprising, as a previous study observed only four identical biomarkers between laboratory-reared and field-caught An. arabiensis specimens [10]. However, the addition of closely related species to the databases, such as field-caught An. arabiensis specimens from Senegal in databases 3 and 4, increased identification errors using the head and the legs, although not significantly. Between field-caught An. arabiensis and An. gambiae, identification mismatches have been reported using the legs, even with an LSV > 2 [10]. The authors reported 19 identical peak masses between field-caught An. gambiae and An. arabiensis for leg spectra, explaining the mismatches. They pointed out the limitations of the usual bioinformatic tools in clearly distinguishing between cryptic species. Similarly, another study showed that the cryptic species of the Gambiae complex, including An. arabiensis and An. gambiae, did not segregate into well-defined clusters in a dendrogram [15]. Using the cephalothorax, the presence of biomarkers specific to each species of the Gambiae complex allowed classification of mass spectra using machine learning methods, opening the door to new approaches.
A limitation of the study is that some results may have been affected by the use of various storage methods and storage durations. Indeed, some specimens were preserved dry frozen and analysed several months or years later, whereas other specimens were stored at ambient temperature and analysed within a few weeks. However, as these various storage conditions have been shown to preserve the quality of spectra, the results were most likely only partially affected [8, 25]. Another limitation is that only one field-caught species was tested in the panels, the dominant species An. gambiae. Further studies using larger databases and panels exhibiting more species diversity are required, especially to improve the resolution of MALDI-TOF MS for closely related species. MALDI-TOF MS should be a good alternative to molecular methods for eco-epidemiological studies of Anopheles vectors when taxonomic resolution is adequate. The technique does not require much training, in contrast to the morphological identification of Anopheles. In addition, one hundred specimens can be analysed by MALDI-TOF MS in a few hours, whereas molecular methods require several steps of analysis, from DNA extraction to sequence editing and assignment. Once the MALDI-TOF MS instrument is acquired, which is a major investment ($200,000 for a complete system), the method requires only inexpensive consumables, and the cost is estimated at $1–2 per sample. It may be useful in areas where entomological experts are not available, for damaged specimens and to distinguish cryptic species. As with DNA sequence databases, wide use of MALDI-TOF MS databases requires accessibility through online applications, as previously remarked [8, 9, 22, 32]. Such online platforms have already been proposed for fungi [23] and Leishmania species [35]. Therefore, we plan to share an MSL dedicated to Anopheles species identification via an online platform that is currently being set up, following a suggestion by Schaffner et al. for mosquito surveillance [36].
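The cost figures quoted in this paragraph ($200,000 for a complete system and $1–2 of consumables per sample) can be combined into a rough average cost per sample. The sketch below is only an arithmetic illustration under simplifying assumptions that are not from the study: the function name `cost_per_sample` is hypothetical, consumables are set to the midpoint of $1.50, and maintenance, personnel and instrument lifetime are ignored.

```python
def cost_per_sample(n_samples, instrument_cost=200_000.0, consumable_cost=1.5):
    """Average cost per sample when the instrument price is spread over n_samples.

    Assumptions (not from the study): consumables at the midpoint of the
    $1-2 range; no maintenance, personnel or depreciation costs.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return instrument_cost / n_samples + consumable_cost


# Average cost for increasing numbers of specimens processed on one instrument.
for n in (1_000, 10_000, 100_000):
    print(n, round(cost_per_sample(n), 2))
```

Under these assumptions the average cost approaches the consumable cost only once the instrument is shared across a large number of specimens, for example within a platform used for several organisms.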