Background
While significant progress has been made to halt and reverse tuberculosis (TB) cases and deaths globally, the burden of TB remains enormous, with the World Health Organization (WHO) reporting an estimated 10 million incident cases every year [1]. Huge challenges remain in the fight against TB, particularly in low- and middle-income countries (LMICs) [1,2]. With a TB incidence rate of over 781 per 100,000 population, and 60% of incident TB cases co-infected with HIV, South Africa remains one of the world’s top six high TB and HIV burden countries [1]. Molecular epidemiological studies have reported that much of the burden of TB disease in South Africa is due to ongoing transmission [3,4]. Traditional TB molecular epidemiology studies have sought to distinguish disease due to recent Mycobacterium tuberculosis (Mtb) infection or transmission from disease due to reactivation of latent infection [5–7]. TB cases with identical strains clustered in a given time and place are often considered to be part of a common transmission chain [3,8]. Thus, clustering is often used as a proxy for recent transmission [2,9,10]. Studies from various settings have reported varying findings on risk factors for clustering, such as age, immigrant status, HIV infection, homelessness, alcoholism, intravenous drug use, social mixing and treatment failure [11–15]. The importance of these factors differs across studies, particularly between high-income [16] and low-income country contexts [17,18]. There remains a need to further explore and understand the factors driving Mtb transmission in poor socio-economic communities with a high burden of both TB and HIV. The identification of such risk factors could inform targeted control measures and interventions aimed at interrupting TB transmission chains and reducing TB incidence, in line with the WHO’s End TB Strategy [19]. In this study, we aimed to investigate how social, economic and composite factors were related to community TB transmission (clustering vs. non-clustering) in a setting with a high burden of both TB and HIV.
Methods
We conducted a post hoc analysis of data from a cross-sectional study among TB cases resident in a peri-urban township in Cape Town, South Africa, from 2006 to 2010. This community had a population of 13,180 people in 2006, which grew to 16,851 in 2010. Approximately 1 in every 4 adults in this community was HIV-infected as of 2008 [3,20]. In the same year, TB case notifications were as high as 2000 per 100,000, despite the presence of a functional primary care TB facility and increasing antiretroviral therapy (ART) coverage [21]. High rates of TB transmission have previously been reported in this community [22].
Eligible TB clients attending the community TB clinic were identified and informed about the study. Inclusion criteria were TB disease notified from 2006 to the end of 2010, residency in the study community, and willingness to provide written informed consent. Clinical and demographic data were extracted from the TB registers and clinical folders. TB and socio-economic data were collected using interviewer-administered questionnaires that were translated into the participant’s local language. The questionnaires captured data on TB history, TB contacts, sexual history, and socio-economic characteristics such as occupation, income level, educational level and living conditions.
HIV testing and counseling (and referral for treatment, where required) was conducted according to the national HIV guidelines [23]. Sputum specimens were obtained from TB suspects in accordance with the national TB testing, diagnostic and treatment guidelines [24]. Mycobacteriological tests, including microscopy and culture, were performed on the sputum specimens as described elsewhere [25].
Mtb isolates from participants were analysed using IS6110-based restriction fragment length polymorphism (RFLP) [26], performed at the Public Health Research Institute (PHRI) Tuberculosis Centre Laboratory, New Jersey. Based on the genotyping data, strains were classified using standard software and tools [27]. Previous analysis of the Mtb strains showed that the dominant strain families in the study population were the W-Beijing (29% of participants) and CC-related strains (24%) [28].
Definitions
A strain was defined as a genetic variant of an isolate [29]. A unique strain was an isolate with an RFLP pattern that occurred in only one participant within the study dataset and was designated as a non-clustered strain. A cluster was defined as a specific strain (RFLP pattern) detected in more than one individual within the study population. Strains from dually infected participants were analyzed as individual samples (n = 2). Retreatment TB cases caused by the same strain as the patient’s previous TB episode were presumed to be due to relapse and were excluded from analysis. Strains with fewer than 6 copies of IS6110 (low-copy strains) are known to be poorly differentiated by the RFLP technique and were therefore excluded from further analysis [29].
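The clustering definitions above can be expressed as a short classification routine. The sketch below is illustrative only: it is not the software used in the study [27], and the tuple layout, the function name `classify_isolates` and the toy data are all hypothetical. It drops low-copy strains, counts the distinct participants carrying each RFLP pattern, and labels each isolate as clustered when its pattern is shared by more than one participant; relapse cases are assumed to have been removed beforehand.

```python
from collections import defaultdict

# From the text: strains with fewer than 6 IS6110 copies are excluded.
MIN_IS6110_COPIES = 6


def classify_isolates(isolates):
    """Label each analysable isolate as "clustered" or "unique".

    `isolates` holds (isolate_id, participant_id, rflp_pattern, is6110_copies)
    tuples. This record layout is hypothetical, and relapse cases are assumed
    to have been removed beforehand.
    """
    # Exclude low-copy strains, which RFLP differentiates poorly.
    kept = [iso for iso in isolates if iso[3] >= MIN_IS6110_COPIES]

    # Collect the distinct participants carrying each RFLP pattern.
    carriers = defaultdict(set)
    for _, participant_id, pattern, _ in kept:
        carriers[pattern].add(participant_id)

    # A pattern found in more than one participant forms a cluster. Labels are
    # assigned per isolate, so a dually infected participant contributes two.
    return {
        isolate_id: "clustered" if len(carriers[pattern]) > 1 else "unique"
        for isolate_id, _, pattern, _ in kept
    }


# Toy data (invented): participant p3 is dually infected with patterns B and D.
example = [
    ("i1", "p1", "A", 10),
    ("i2", "p2", "A", 10),
    ("i3", "p3", "B", 9),
    ("i4", "p4", "C", 4),  # low-copy strain, excluded
    ("i5", "p3", "D", 8),
    ("i6", "p6", "D", 8),
]
labels = classify_isolates(example)
print(labels)
```

Counting distinct participants rather than isolates keeps two samples from the same person from forming a spurious cluster, matching the requirement that a cluster span different individuals.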
Composite scores were generated for economic and social risk factors. Variables for inclusion in the composite scores were decided prior to analysis but finalized based on assessment for collinearity. Education level, employment status, income level, electricity access, having a toilet in the house, and number of rooms used for sleeping (a surrogate for house size) were all classified as economic factors and made up the composite economic score (maximum score 11). The type of house was strongly correlated with electricity supply to the house (variance inflation factor [VIF]: 9.8) and was therefore not included in the composite score. Each variable was assigned a value ranging from 0 to 4 (depending on the number of categories in the variable), with a higher score corresponding to higher economic status. For example, education was scored 0 for no formal education and 4 if a participant had tertiary education; a score of 0 was given if there was no electricity in the participant’s house and 1 if the house had electricity. The following factors were incorporated in the social score, with a maximum score of 9: alcohol consumption in the past 12 months, shebeen (informal tavern) patronage in the past 12 months, meeting regularly with a group, regular use of a minibus taxi, number of new sexual partners within the past 6 months, number of houses on the residential plot and number of occupants living in the same house. It is also notable that, while the majority of participants who reported visiting shebeens also consumed alcohol, a proportion visited shebeens for social or other reasons besides alcohol consumption. Furthermore, not all alcohol consumption occurs on shebeen premises. Given the weak collinearity between alcohol consumption and shebeen patronage (VIF: 2.2), we chose to keep both variables in the social score.
Each variable was assigned a value of 0, 1 or 2 (depending on the number of categories in the variable), with a higher score corresponding to greater social interaction. Both the economic and social scores were divided into binary variables at the median (to generate a “low” and “high” economic and social score).
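The scoring and median split described above can be sketched as follows, under stated assumptions. Only three of the six economic variables are shown, so the maximum here is 6 rather than 11; apart from the education endpoints (0 = no formal education, 4 = tertiary) and electricity (0/1), which come from the text, every category label and code is invented. The names `ECONOMIC_SCORING`, `composite_score` and `median_split` are hypothetical, and the text does not say how scores equal to the median were assigned, so placing ties in the "low" group is an assumption. A social score would be built the same way with 0, 1 or 2 codes.

```python
from statistics import median

# Hypothetical scoring maps. The text gives only the education endpoints
# (0 = no formal education, 4 = tertiary) and electricity (0/1); the other
# categories and codes here are invented for illustration.
ECONOMIC_SCORING = {
    "education": {"none": 0, "primary": 1, "some_secondary": 2,
                  "completed_secondary": 3, "tertiary": 4},
    "electricity": {"no": 0, "yes": 1},
    "indoor_toilet": {"no": 0, "yes": 1},
}


def composite_score(answers, scoring):
    """Sum the per-variable codes for one participant (higher = higher status)."""
    return sum(scoring[variable][answers[variable]] for variable in scoring)


def median_split(scores):
    """Dichotomise scores at the sample median; ties go to "low" (an assumption)."""
    cut = median(scores)
    return ["high" if score > cut else "low" for score in scores]


# Invented participants, not study data.
participants = [
    {"education": "tertiary", "electricity": "yes", "indoor_toilet": "yes"},
    {"education": "none", "electricity": "no", "indoor_toilet": "no"},
    {"education": "completed_secondary", "electricity": "yes", "indoor_toilet": "no"},
    {"education": "primary", "electricity": "yes", "indoor_toilet": "yes"},
]
scores = [composite_score(p, ECONOMIC_SCORING) for p in participants]
groups = median_split(scores)
print(scores, groups)
```

With an even number of participants, `statistics.median` averages the two middle scores (here 3 and 4, giving 3.5), so no score falls exactly on the cut in this example.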
Additional relevant risk factors not classifiable as social or economic risk factors included in the analyses were: a history of TB contacts, recent death in family, tobacco smoking, period of residence in the same house and in the community, history of mine work, history of imprisonment and time spent outside study community.
Our analysis was restricted to adult participants (≥15 years of age) with both socio-economic questionnaire data and an RFLP-based Mtb genotype available. We excluded children (n = 12) on the presumption that the social and economic behaviors of children differ from those of adults.
Statistical analysis
Data were analysed using Stata 15.0 (StataCorp, College Station, Texas). Bivariable analyses were performed using chi-squared and Wilcoxon signed rank tests, as appropriate, to explore baseline differences in socio-economic and traditional TB risk factors between clustered and non-clustered participants. Univariable logistic regression models were used to estimate odds ratios for the associations between stratified risk factors (such as income categories) and clustering. Multivariable logistic regression models were developed to determine associations between TB transmission (clustering), the social and economic scores and the other specified risk factors. Variance inflation factors were calculated to assess collinearity between risk factors in the multivariable regression models.
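The study's analyses were run in Stata; the Python below is only an illustration of two quantities used above, and the helper names `pearson_r`, `pairwise_vif` and `odds_ratio_wald` are hypothetical. Because the VIFs in the Definitions section are quoted for pairs of variables, the sketch assumes the two-predictor case, where VIF = 1 / (1 − r²); with more predictors, each variable's VIF instead uses the R² from regressing it on all the others. For a single binary exposure, the odds ratio from a univariable logistic regression equals the cross-product ratio ad/bc of the 2×2 table, and the Wald interval uses the standard error sqrt(1/a + 1/b + 1/c + 1/d) on the log scale.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_vif(x, y):
    """Two-predictor VIF, 1 / (1 - r^2); undefined if x and y are perfectly collinear."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table with no zero cells.

    a = exposed & clustered, b = exposed & non-clustered,
    c = unexposed & clustered, d = unexposed & non-clustered.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented example data, not study results.
vif = pairwise_vif([1, 2, 3, 4], [1, 2, 3, 5])
or_, lo, hi = odds_ratio_wald(20, 40, 30, 30)
print(round(vif, 2), round(or_, 2), round(lo, 2), round(hi, 2))
```

Under this two-variable reading, the VIF of 9.8 reported for house type and electricity supply would correspond to r² ≈ 0.90 (1 − 1/9.8), and the VIF of 2.2 for alcohol consumption and shebeen patronage to r² ≈ 0.55.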
Discussion
The role of socio-economic factors in TB transmission remains a pertinent question in many high-burden communities. In this study, based in a high TB burden community of generally low socio-economic status, we explored associations between socio-economic risk factors and Mtb strain clustering. Prolonged residence within this community was strongly associated with TB transmission. Despite the high degree of homogeneity in the demographic characteristics of the study population at baseline, a higher proportion of clustered than non-clustered cases had lower economic scores, although this difference was not statistically significant.
We analyzed economic risk factors for transmission, both individually and by creating a composite economic score. We observed a significant negative association between TB transmission and the number of household rooms used for sleeping. Participants who reported having more than 3 rooms for sleeping were less likely to be part of a transmission cluster. This association may reflect less close indoor contact time, particularly during lengthy overnight periods, and hence a reduced risk of TB transmission for those living in more spacious or less crowded houses. Moreover, a trend towards individuals with lower income being more likely to be part of a TB transmission cluster was also noted. The number of participants earning salaries in the higher income category (>R5000 [approximately US$350] per month) was very small, which may have reduced our power to show a statistically significant association; further investigation of this finding is warranted. Taken individually, the remaining economic factors did not yield any strong statistical associations with TB transmission. Lower composite economic scores were noted in a higher proportion of clustered cases, although this was not statistically significant. Our findings are in agreement with other researchers who have reported that poor socio-economic conditions may predispose to TB transmission [15,30,31]. Furthermore, given the setting of a low-income community, these findings may hint at a “sliding-scale effect of poverty” even in such communities, with individuals at the lower end of the economic scale being at potentially greater risk of acquiring TB infection. The factors linked to economic status, which in turn may explain this association, are complex and may include poor nutritional status, poor living conditions and poor health status, among other related and potential underlying factors [10]. The questionnaire administered in this study did not enable us to explore these complexities in detail, which may in part explain the lack of statistical associations. Our findings are in general agreement with other studies that have reported a socio-economic gradient between countries, within countries and even within communities [12,30].
To quantify social interaction and its possible associations with TB transmission, we created a composite social score. We found no overall association between TB transmission and the composite social score. However, we identified other individual-level factors associated with transmission. Specifically, both a longer stay in the same house and a longer duration of residence in the community were associated with belonging to a TB transmission cluster. These associations may reflect prolonged and persistent exposure to Mtb in a community with a high burden of TB disease, with a higher effective contact rate and thus an increasing chance of acquiring TB infection for participants living in the community for longer periods of time. Although an intuitive finding, to our knowledge this is the first study to show that prolonged residence within a high-burden TB community with high rates of ongoing TB transmission [22] is associated with an increased risk of being part of a TB transmission cluster. A weak association was also noted between alcohol consumption in the past year and belonging to a transmission cluster; although we did not quantify alcohol consumption, there are plausible biological as well as social rationales for this finding.
While our results identified potential epidemiological links between TB transmission and socio-economic risk factors, we were surprised by the paucity of associations with many of the risk factors investigated, and with the composite social and economic scores. However, a study by Mathema et al. among South African gold miners also could not establish any risk factors for TB transmission, and this finding was posited to be due to a universally high risk of disease in that population [32]. Our findings point to a similar scenario, with difficulty identifying specific transmission risk factors in a generally low socio-economic community with exceptionally high TB disease and transmission rates [33,34]. Some historical studies have reported the role of crowding and poor living conditions in the risk of TB transmission within households, and Andrews et al. have further suggested that targeted interventions among the poor may be one of the most effective ways to reduce TB transmission [35], an approach that would be supported by our findings in this study.
While the inference of recent TB transmission from clustered strains has a number of recognized limitations [29], our interpretation is strengthened by supporting evidence of high Mtb transmission rates in the community [22], the notable diversity of circulating strains [28], the study duration and the discriminatory power of RFLP [29]. Potential limitations of our study include information bias due to missing data. Firstly, participation in the study was voluntary, although recruitment was excellent, with over 90% of eligible patients enrolled in the questionnaire component of the study. Secondly, we were not able to obtain genotyping data for all enrolled patients. We have previously reported few significant differences between patients with and without RFLP data [28]: of note, multidrug-resistant TB (MDR-TB) patients were more likely, and patients who had died were less likely, to have RFLP data. However, there was no statistically significant difference between patients with and without RFLP data in terms of age, gender, new versus retreatment TB, or HIV or ART status [28]. Missing genotype data, as well as the recognized limitations to the discriminatory power of RFLP [29], may also have resulted in misclassification of apparently unique strains, with an underestimation of clustering. Another potential limitation is that our sample size of 505 strains may have lacked the power to detect small differences, which could explain the non-significant trends for some of the risk factors analyzed in this study. In addition, the composite socio-economic scores used have not been validated. Further work to confirm these findings in larger studies across different populations could bring more definitive insights into the social and economic factors linked with TB transmission, which would help guide national policy in high-burden settings.