Introduction

Thalassemia is considered one of the most common genetic disorders in the world, with a high frequency in tropical and sub-tropical areas such as Mediterranean countries, the Indian subcontinent, the Middle East, North African and Southeast Asia1, 2. Thalassemia is classified into two major types, namely, α- and β-thalassemia, according to defects in these globin genes3. Mutations or deletions of globin genes cause abnormal haemoglobin formation, resulting in asymptomatic to severe anaemia. A foetus with α-thalassemia major will die in utero or shortly after birth, which compromises the health of the mother. β-thalassemia major patients experience severe anaemia and serious complications that include liver damage, cardiac disease4 and endocrine dysfunction5; these patients will die before 5 years of age if not treated with regular transfusions, iron chelation therapy or haematopoietic stem cell transplantation6. Overall, thalassemia major patients impose a considerable burden on their families and health authorities.

Presently, carrier screening, molecular diagnostics, genetic counselling, and prenatal diagnosis are employed to prevent the occurrence of thalassemia major, and these prevention programmes have had great success, leading to a decline in the birth rate of thalassemia major in some countries7. Surveys on thalassemia in mainland China began in the 1980s, and an understanding of the epidemiological characteristics of the disease provides important information for prevention. Previous literature indicates a high frequency of thalassemia in the population of southern China, mainly south of the Yangtze River, particularly in the provinces of Yunnan, Guangdong, Guangxi, Fujian and Sichuan. The Zeng study8, which was performed in many laboratories in 1987, calculated the nationwide prevalence of α-thalassemia and β-thalassemia to be 2.64% and 0.66%, respectively.

In recent years, large-scale surveys for thalassemia have been conducted in different parts of China, and the prevalence is still high9, 10. However, due to a lack of a comprehensive system for data collection and analysis, there are no data on the precise frequency and distribution patterns, overall burden, and trends of α- and β-thalassemia at a national scale. Furthermore, with economic improvement and population migration, thalassemia is spreading to parts of China that are north of the Yangtze River11. The current epidemiological characteristics of thalassemia are not completely understood. Therefore, we conducted a systematic review of recent evidence from regional population-based surveys on thalassemia to obtain a comprehensive picture of the disease in mainland China (excluding Hong Kong, Taiwan and Macao). The purpose of this study was to summarize the overall reappraisal of the prevalence of thalassemia in mainland China and to explore the epidemiological characteristics of thalassemia. The results provide a comprehensive view of thalassemia’s prevalence in China and may contribute to its control and management where it is prevalent.

Results

Study identification

Our preliminary literature search identified a total of 1,534 potentially relevant studies (34 on Pubmed, 15 on Embase, 165 on CBM Database, 1173 on CNKI, 108 on WanFang database, 28 on Chongqing VIP database, and the rest 11 on reference lists). After screening the titles and abstracts of these studies, we excluded 1,380 that were obvious irrelevant (n = 1274) or duplicated in the databases (n = 106). The remaining 154 studies were retrieved for a full-text assessment, and as a result, 138 were excluded because they (i) did not provide data for the prevalence calculation (n = 25), (ii) were based on special populations (n = 18) or special areas (n = 32), (iii) were duplicates (n = 16) or studies that were contained in another study (n = 12), or (iv) were reviews (n = 35). Ultimately, 16 studies were identified to be suitable for this meta-analysis6, 8, 12,13,14,15,16,17,18,19,20,21,22,23,24,25 (Fig. 1).

Figure 1
figure 1

Flow chart of the selection process for the included studies.

Characteristics and quality assessment of the included studies

The general individual characteristics of the included 16 studies are summarized in Table 1. The survey dates ranged from 1981 to 2012, with an age range from 0 to 64 years. These studies covered 12 provinces (Fujian, Guangdong, Yunnan, Sichuan, Guizhou, Zhejiang, Jiangsu, Hunan, Hubei, Jiangxi, Liaoning, and Xinjiang), 1 municipality (Chongqing) and 1 autonomous region (Guangxi) of mainland China. The prevalence of α-thalassemia, β-thalassemia and α + β-thalassemia ranged from 1.20~19.87%, 0.53~6.84% and 0.08~1.22%, respectively. More details are presented in Table 1 and Supplementary Table 1 (Table S1).

Table 1 The characteristics of the included studies.

In Table S2, we show the detailed score of the 16 included studies. A total of five items with a maximum of 10 scores was used to analyse the quality of the identified studies. The results indicated that all the included studies were eligible; one study obtained full marks, one study obtained a score of 7, 10 studies scored an 8 and the remaining 4 studies scored a 9.

Epidemiology of thalassemia

Overall prevalence of α-thalassemia

A total of 12 studies identified 7,696 α-thalassemia cases in 84,598 subjects, with an overall prevalence of 7.88% (95%CI: 5.54~10.23) (Fig. 2, Table 2). The prevalence of α-thalassemia fluctuated annually, first declining sharply and then increasing, and this cycle continued (Fig. 3). The highest prevalence was found in Guangxi (14.13%, 95%CI: 11.12~17.13), and the second highest in Guangdong (9.46%, 95%CI: 4.00~14.92); the lowest prevalence was found in Shanghai (0.25%, 95%CI: 0.00~0.50). We analysed differences in geographical distribution across mainland China using a colour-coded map divided into four sections with different colours to indicate the highest to lowest prevalence. As shown in the map, the majority of the mainland China regions in which the epidemiologic survey was conducted were located in the south. More details are shown in Fig. 4.

Figure 2
figure 2

The pooled prevalence of thalassemia in mainland China.

Table 2 Summary of the prevalence estimation for thalassemia.
Figure 3
figure 3

Analysis of the thalassemia prevalence by year.

Figure 4
figure 4

Distribution of the prevalence of α-thalassemia in different regions of mainland China.

Gene frequencies of α-thalassemia subtypes

The subtypes of α-thalassemia that we analysed in this study were --SEA, -α3.7, -α4.2, αCSα, αWSα and αQSα, and their gene frequency was 2.54% (95%CI: 1.57~3.51), 1.59% (95%CI: 0.93~2.24), 0.54% (95%CI: 0.31~0.78), 0.24% (95%CI: 0.16~0.32), 0.26% (95%CI: 0.17~0.36) and 0.06% (95%CI: 0.02~0.10), respectively (Fig. 5, Table 2).

Figure 5
figure 5

Gene frequencies of α-thalassemia subtypes.

Overall prevalence of β-thalassemia

The pooled prevalence of β-thalassemia (2.21%, 95%CI: 1.93~2.48) (Fig. 2, Table 2) was estimated on the basis of 13 related studies. Fluctuations in the prevalence were observed when we assessed the prevalence of β-thalassemia by year (Fig. 3). The highest prevalence was in Guangxi (4.91%, 95%CI: 2.67~7.15) and the lowest in Xinjiang (0.02%, 95%CI: 0.01~0.03). As shown in the colour-coded map in Fig. 6, prior epidemiological surveys of thalassemia had not been conducted in most zones.

Figure 6
figure 6

Distribution of the prevalence of β-thalassemia in different regions of mainland China.

Gene frequencies of β-thalassemia subtypes

The estimated frequencies of the six subtypes of β-thalassemia were 0.93% (95%CI: 0.54~1.32) for CD41/42, 0.25% (95%CI: 0.11~0.40) for IVS-2-654, 0.07% (95%CI: 0.03~0.12) for CD71/72, 0.07% (95%CI: 0.03~0.10) for CD26, 0.25% (95%CI: 0.11~0.38) for -28 and 0.48% (95%CI: 0.30~0.65) for CD17 (Fig. 7, Table 2).

Figure 7
figure 7

Gene frequencies of β-thalassemia subtypes.

Overall prevalence of α + β-thalassemia

The prevalence of α + β-thalassemia of 0.48% (Fig. 2, Table 2) was based on 6 available studies, though this prevalence exhibited large fluctuations over time (Fig. 3).

Meta-regression

A meta-regression was performed to explore the sources of heterogeneity. In the present study, we considered several potential factors, including total sample size, quality score, diagnostic method, age range, survey date, location, and sampling method. Ultimately, none of these factors was identified as a source of heterogeneity for α-thalassemia (all p > 0.05). However, diagnostic method (p < 0.001) and survey date (NA, p = 0.02) were identified as a source of heterogeneity for β-thalassemia. Tables S3 and S4 present the results of this meta-regression.

Sensitivity analysis

The pooled prevalence of α-thalassemia changed significantly when six single studies (Xu et al.17, Yao et al.16, Yin et al.14, Xiong et al.18, Pan et al.6, and Zeng et al.8) were sequentially omitted. In addition, removing the study by Zeng et al.8 contributed to the marked change in the pooled prevalence of β-thalassemia. Similarly, removing each of the studies by Xu et al.17 and Yin et al.14 altered the overall prevalence of α + β-thalassemia. More information is listed in Table S5.

Publication bias

All three types of thalassemia presented the asymmetric shape of the funnel plots. Egger’s tests showed a significant value in α-thalassemia (p = 0.049), β-thalassemia (p < 0.001) but α + β-thalassemia (p = 0.27) (Table S6).

Discussion

To the best of our knowledge, this study is the first meta-analysis of epidemiological studies on the prevalence of thalassemia in mainland China. Our results indicated that the pooled prevalence of α-, β- and α + β-thalassemia was 7.88%, 2.21% and 0.48%, respectively. The geographic distribution of thalassemia showed that the prevalence was highest in the south of China and decreased from south to north.

Thalassemia is a genetic disease for which it is possible to detect carriers using haematological indices rather than DNA analysis26. Increased HbA2 levels in peripheral venous blood is the most important feature for identifying heterozygous β-thalassemia27. For β-thalassemia screening, people with increased Hb A2 levels (HbA2 > 3.5%) are diagnosed with β-thalassemia7. Therefore, three studies regarding β-thalassemia (Zeng et al. in 19878, Ma et al.20 and Liu et al.21) screening without gene analysis were included in our meta-analysis. In determining the prevalence of α-thalassemia, cord blood samples were quantified for Hb Bart’s when genetic analysis was not widely applied to α-thalassemia diagnosis. Pan et al.6 employed haemoglobin electrophoresis for cord blood and gene analysis simultaneously and found that cases of α-thalassemia, including heterozygous α-thalassemia, were unlikely to be missed when using a 2% cut-off of Hb Bart’s. Zeng et al.8 in which 12,821 samples of cord blood from new-borns were screened by electrophoresis, was the only study included on α-thalassemia that lacked gene analysis. Although the limitations of the experiment technology at the time prevented an assignment of thalassemia mutation subtypes, the study provides very reliable data on the nationwide incidence. Although gene analysis began to be widely used in the 2000s to ascertain thalassemia mutations in individuals, haematological indices still play a key role in diagnosis.

The 643,580 research subjects included in the 16 studies examined can generally be divided into neonates, children and adults. Although the age ranges of the different studies varied widely, subgroup analysis based on age was not performed because thalassemia is an inherited disease, and the carrying rates of different age groups are consistent in the same area. Five included studies6, 8, 12, 23, 24 in which the reported cases were only neonates attempted to determine the prevalence of α-thalassemia. The neonates were randomly selected such that the rates of thalassemia were representative and could be compared with the results for children or adults.

Surveys on thalassemia in China began in the 1980s. In 1987, Zeng calculated that the nationwide incidence of α-thalassemia and β-thalassemia was 2.64% and 0.66%, respectively8. Compared to the findings of Zeng, our meta-analysis revealed a higher prevalence of α-thalassemia (7.88%) and β-thalassemia (2.21%) in China. One explanation for the large variation in the reported prevalence is that in the previous study, a low number of samples were collected for thalassemia screening in provinces with a high incidence, such as Guangxi, where only 350 samples were collected, which made the total incidence lower. Overlooking some carriers of silent α-thalassemia mutations, such as αWSα/αα and αCSα/αα18, 28, 29, may be another reason. In addition, the prevalence of α + β-thalassemiain mainland China was found to be 0.48%, and our study confirmed that the double heterozygosity of α- and β-thalassemia is not rare in areas where thalassemia is common. Because there are no significant haematological differences between these double heterozygotes and β-thalassemia, it is noted to do α-thalassemia gene analysis for β-thalassemia carriers.

In our study, we generated GIS maps to provide information for 14 provincial regions and illustrate trends in the geographic distribution of thalassemia. Although the majority of the regions in which epidemiologic surveys in mainland China have been conducted are located in the south, large areas have no epidemiological survey data for thalassemia and most are located in the north. Moreover, some provinces in southern China, such as Guizhou and Hunan, have a high prevalence of β-thalassemia but scant data of α-thalassemia. With industrialization over the past 20 years and the availability of jobs in the developed areas of mainland China, many people from the southwest region have migrated to cities in the north. Indeed, population mobility and migration have resulted in a significantly increasing thalassemia prevalence on other continents, such as in Europe and North America30, 31. However, due to the lack of regional data in provinces in northern China with large population mobility, changes in epidemiological characteristics of thalassemia in those provinces remain unclear. Therefore, we suggest high-quality surveys should be conducted in those areas that lack data for thalassemia prevention.

Based on the present meta-analysis, the most common α-thalassemia mutationin mainland China is --SEA. The high gene frequency of --SEA indicates that the health burden resulting from Hb H diseases and Hb Bart’s hydrops fetalis may be serious in mainland China. In addition, non-deletional α-thalassemia is not rare. αWSα, which is rather rare in other parts of the world32, is the most prevalent non-deletion type of α-thalassemia, with a gene frequency of 0.26%. Several studies on different populations have suggested that the non-deletion types of Hb H disease(--/αTα) are usually more severe than the deletion types (--/-α), with greater anaemia, jaundice, splenomegaly and early anaemic symptoms, and a higher proportion of patients who require blood transfusion and splenectomy28, 33. Therefore, the non-deletion types of α-thalassemia should be included in thalassemia prenatal diagnosis.

To explore the sources of heterogeneity, meta-regression was performed, and the diagnostic method (p < 0.001) and survey date (p = 0.02) were identified as potential sources. To mitigate heterogeneity, subgroup analysis of diagnostic methods for β-thalassemia was performed. However, heterogeneity was still high within subgroup based on diagnostic methods (Table 2). Nonetheless, it has been reported that heterogeneity cannot be avoided in a meta-analysis34, especially in those which based on epidemiological surveys35.

Our findings showed that the results were unstable when removing some individual studies one at a time. When reviewing these studies in detail, we found that the total sample size may play a role in changing the consistency of the results. A large or small sample size with a relatively higher or lower prevalence more easily caused alterations of the results. Given the limited data, we could not identify other factors to verify the robustness of the presented results.

Publication bias existed in our study, even though we comprehensively and systematically searched related studies. However, only studies published in Chinese and/or English were used, which may be a potential factor for publication bias. Insufficient data in the included studies may have also affected the results.

Several other limitations of this study should also be considered. First, because epidemiological studies on thalassemia have only been conducted in 14 provinces in China, we did not obtain good epidemiological or demographic data from all provinces. Second, the epidemiological data available mainly focuses on southern China, particularly in regions of minority nationalities, which could impact the results of our study. Third, the strategies and methodology may also have influenced the estimation of the prevalence of thalassemia. Indeed, some carriers of silent α-thalassemia mutations may not be identified by haematological indices without gene analysis. Because of these limitations, caution should be exercised in interpreting the results and in prescribing direct policy recommendations based on this meta-analysis alone. Nevertheless, our meta-analysis covered and combined most of the available epidemiologic data to generate a reasonably precise estimate of the prevalence of thalassemia.

We conducted the first meta-analysis of the prevalence of thalassemia in mainland China from 1981 to 2015, revealing the epidemiological characteristics of thalassemia. These results show that the overall prevalence of the disease is still high. Individuals in southern China have a higher risk of getting a severe form of thalassemia than those in other regions of China. In the future, epidemic research in northern China and comprehensive measures for epidemic prevention and control in southern China are needed to combat the heavy burden of thalassemia in China.

Methods

Study identification

We performed this meta-analysis on the basis of a systematic and comprehensive search of research on the prevalence of thalassemia in mainland China. Six electronic databases, including the Chinese National Knowledge Infrastructure database (CNKI), the WanFang database, the Chongqing VIP database, the Chinese Biological Medical Literature database (CBM), PubMed, and EMbase, were used for the identification of related studies from their establishment to January 1, 2016. The following key words were used when searching the Chinese databases: ‘thalassemia’; ‘prevalence’; and ‘epidemiology’. In addition, the key words ‘China’ and ‘meta-analysis’ were used in the English databases. We also retrieved the reference lists so that we did not overlook a related study.

Selection criteria

We used the studies for this meta-analysis that met the following criteria:

  1. (i)

    Studies were cross-sectional and conducted in mainland China (not including Hong Kong, Macao, and Taiwan);

  2. (ii)

    Studies stated the prevalence of thalassemia or the related available data (the number of the participants and the number of the thalassemia patients) to calculate the prevalence of thalassemia;

  3. (iii)

    Studies were published in Chinese and/or English; and

  4. (iv)

    Studies were based on epidemiological surveys in general populations.

We excluded the studies that met any of the following criteria:

  1. (i)

    Studies did not provide the relevant data for the prevalence of thalassemia;

  2. (ii)

    Studies were based on the special populations (e.g., the elderly, women, or workers) or special areas (e.g., schools, factories, or earthquake areas); and

  3. (iii)

    Duplicate studies or studies that were contained within another study.

The selection of the studies was performed by two authors independently. When the authors disagreed and could not reach an agreement after discussion, a third author was involved to reach a consensus.

Data extraction and quality assessment of the included studies

After reaching a consensus on the included studies, the data were extracted and entered into an Excel spreadsheet, including the author, publication year, survey date, age range, location, sampling method, diagnostic method, total sample size, the number of individuals in each gender (males and females), the number of patients (including α-, β- and α + β-thalassemia), and the number and gene frequency of the subtypes (--SEA, -α3.7, -α4.2, αCSα, αWSα and αQSα of α-thalassemia; CD41/42, IVS-2-654, CD71/72, CD26, -28 and CD17 of β-thalassemia). The quality of the included studies was assessed by the 5 items and listed in the study by Li et al.36 according to the “Strengthening the Reporting of Observational Studies in Epidemiology” (STROBE) guidelines37. Each item was divided into 3 different levels with different scores (high risk or unclear = 0, moderate risk = 1, and low risk = 2).

Two authors finished the work independently and discussed the issues when disagreements occurred. If these authors could not reach a consensus, another author assisted in making the final decision.

Statistical analysis

The present meta-analysis was conducted using Stata version 12.0 (Stata Corporation, College Station, TX, USA). The DeSimonian and Laird method was used to estimate prevalence, 95% confidence intervals (95%CI), and the proportion of α- or β-thalassemia subtypes. Prevalence is expressed as a percentage; if the number of the thalassemia patients was 0, we assigned a value of ‘0.01’ to retain all useful data when conducting calculations. Additionally, for the study conducted in multiple regions of mainland China, the data for each region were extracted independently for later analysis. For example, Zeng et al. conducted a multi-region study in mainland China, and we extracted the available data in corresponding single regions. ESRI ArcGIS 10.0 version for desktop (http://www.esri.com/software/arcgis/arcgis-for-desktop) was used to assess differences in geographic distribution. Heterogeneity was analysed using Cochran’s x 2-based Q test and I2 statistics (which ranged from 0 to 100%). Heterogeneity was considered to be moderate or high at p < 0.1 or I2 50%, and a random-effects model (the DeSimonian and Laird method) was selected for the meta-analysis. Otherwise, a Mantel-Haenszel fixed-effects model was used. Meta-regression was conducted to analyse the sources of heterogeneity. Sensitivity analysis was performed to assess the effects of single study on the consistency of the results after excluding the included studies sequentially. To evaluate publication bias, funnel plots and an Egger’s test were used.