Introduction
Tuberculosis (TB) is commonly transmitted outside of the home in community-based settings during social contact between infectious cases and susceptible community members [1, 2]. Prior studies have used questionnaires to identify epidemiologic links between unrelated TB cases and have found spatial links between cases in community settings [3–6]. Identifying spatial areas of transmission is important for contact tracing and infection control, as epidemiologic links between TB cases are often unclear [7, 8]. Moreover, targeting infection control in geographic hotspots of TB transmission may reduce overall levels of community transmission [9].
Activity spaces are used in epidemiologic studies to represent the geographic spaces in which people spend their time during regular daily activities [10–12]. Past studies of human activity spaces have relied predominantly on place-tracing questionnaires to delineate activity spaces [3–5, 13]. More recent approaches leverage global positioning systems (GPS) [10, 14, 15]. The widespread availability of GPS technology has facilitated fine-scale study of human movement patterns, and GPS data are less prone to the recall and measurement errors common in retrospective place-tracing interviews [16–19].
The goals of this study were to compare the activity spaces of multidrug-resistant tuberculosis (MDRTB) cases and healthy (TB-free) community controls, to locate areas of activity space overlap among genetically clustered cases as sites of potential transmission, and to quantify the association between activity space overlap and genetic clustering of MDR Mycobacterium tuberculosis (Mtb) strains.
Methods
Study design and setting
Study participants were recruited from the areas of Callao and Lima Sur, located to the north and south of Lima, Peru, respectively. These two regions report the greatest proportion of incident MDRTB cases in Peru [20]. Callao has an area of 147 km² with nearly all one million residents living in urban areas. Lima Sur encompasses 11 districts with a total area of 852 km² and 1.5 million residents.
Between February 2016 and May 2017, patients were recruited from a completed parent study that enrolled a household-based cohort of MDRTB patients between 2010 and 2013 [21]. Sputum samples from cases were taken at diagnosis and processed on liquid microscopic observation drug susceptibility assays (MODS) and solid Ogawa media. Aliquots of positive sputum samples were reserved for DNA extraction and genotyping via whole genome sequencing [22]. Sequencing for the single nucleotide polymorphism (SNP) calling analysis was performed on an Illumina HiSeq2000 with paired-end reads of 100 bp [23]. A pairwise matrix of the number of SNP differences between the Mtb strains of MDRTB cases was assembled; cases whose strains differed by ≤5 SNPs were considered genetically clustered [24]. This threshold was our working definition for MDRTB transmission. Genetically clustered pairs from the same household were excluded. Where a case was genetically clustered with multiple cases from the same household, only one member of that household was enrolled.
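The clustering rule described above can be illustrated with a minimal Python sketch. It is not the study's pipeline: the case identifiers, household labels, SNP counts and the function name `clustered_pairs` are all hypothetical, and only the ≤5 SNP threshold and the same-household exclusion come from the text.

```python
from itertools import combinations

SNP_THRESHOLD = 5  # threshold stated in the text: <= 5 SNP differences

def clustered_pairs(snp_distance, household, threshold=SNP_THRESHOLD):
    """Return case pairs within the SNP threshold, skipping same-household pairs.

    snp_distance: dict mapping frozenset({case_a, case_b}) -> SNP difference count
    household:    dict mapping case id -> household id
    """
    pairs = []
    for a, b in combinations(sorted(household), 2):
        d = snp_distance.get(frozenset((a, b)))
        if d is None:
            continue  # no sequence comparison available for this pair
        if d <= threshold and household[a] != household[b]:
            pairs.append((a, b))
    return pairs

# Hypothetical toy data (not study data)
household = {"C1": "H1", "C2": "H2", "C3": "H2", "C4": "H3"}
snp_distance = {
    frozenset(("C1", "C2")): 3,
    frozenset(("C1", "C3")): 12,
    frozenset(("C1", "C4")): 5,
    frozenset(("C2", "C3")): 0,   # same household, so excluded
    frozenset(("C2", "C4")): 40,
    frozenset(("C3", "C4")): 6,
}
pairs = clustered_pairs(snp_distance, household)
print(pairs)
```

The sketch treats the threshold as inclusive, matching the "≤5" wording, and leaves the choice of which household member to enrol to a separate step.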
We enrolled community controls, who verbally confirmed that they had never received a TB diagnosis or TB treatment, as a comparison sample. Controls included community health workers and nurses who worked at community health posts serving case neighborhoods. Additional controls were referred by community health workers and sourced from churches, restaurants, communal kitchens and education centers located in case neighborhoods. Controls were frequency matched to cases on age (±5 years), sex (male/female) and study region (Callao/Lima Sur) to ensure comparability across these variables.
Informed consent was obtained from participants prior to data collection. The study protocol, consent forms and data collection instruments were reviewed and approved by the Institutional Committee of Ethics for Humans at La Universidad Peruana Cayetano Heredia.
Data sources and measurements
Questionnaires were used to collect demographic information from participants during face-to-face interviews. We used Qstarz BT-Q1000XT (Qstarz International, Taipei, Taiwan) GPS loggers to gather data on participants’ movements over seven days of observation. The units were configured to log participants’ locations (i.e., geocoordinates) every minute. Consenting participants were given a GPS logger and instructed to keep it powered on, carry it with them at all times, and recharge it nightly. Study nurses called participants every other day during the 7-day data collection period to remind them to carry their GPS loggers.
Constructing activity spaces
Spatial ecologists have developed a suite of methods that use GPS technology to study wildlife movement patterns and to delineate areas of regular space use called ‘home ranges’, which are analogous to activity spaces in human research [25]. These methods account for non-uniform space use by representing home ranges (i.e., activity spaces) as spatial probability density functions of space use called utilization distributions (UDs) [26]. Instead of assuming that space use is uniform across an activity space, UDs highlight areas of concentrated activity with probability contours and therefore represent space use more faithfully. In this study, we use home ranges to represent participant activity spaces.
Kernel density estimation (KDE) was used to construct participant activity spaces [25, 27]. We used iterative visualizations of GPS kernel densities generated with a Gaussian kernel and bandwidths between 100 m and 1200 m to identify UDs that provided adequate smoothing of raw GPS locations while highlighting distinct “peaks” of areas where locations were concentrated. The final chosen bandwidth was 950 m.
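To make the estimation step concrete, the following standard-library Python sketch evaluates a bivariate Gaussian KDE on a regular grid and rescales it into a discrete UD. It is an illustration only: the coordinates, grid extent, 250 m cell size and function names are assumptions, the points are assumed to be in a projected coordinate system measured in metres, and only the Gaussian kernel and the 950 m bandwidth come from the text.

```python
import math

def gaussian_kde_grid(points, bandwidth, xs, ys):
    """Evaluate a bivariate Gaussian KDE at each grid cell centre.

    points:    list of (x, y) locations in metres (projected coordinates)
    bandwidth: smoothing parameter h in metres
    xs, ys:    cell-centre coordinates along each axis
    Returns a list of rows (one per y value) of densities per square metre.
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for px, py in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid

def normalise_ud(density):
    """Rescale cell values so that they sum to 1 (a discrete UD)."""
    s = sum(sum(row) for row in density)
    return [[v / s for v in row] for row in density]

# Hypothetical GPS fixes: three near one place, one about 3 km away
points = [(0, 0), (100, 50), (-50, 80), (3000, 0)]
xs = list(range(-4000, 7001, 250))   # assumed 250 m grid cells
ys = list(range(-4000, 4001, 250))
ud = normalise_ud(gaussian_kde_grid(points, 950.0, xs, ys))

# Locate the grid cell with the highest estimated use
peak_value, peak_xy = max(
    (v, (xs[i], ys[j])) for j, row in enumerate(ud) for i, v in enumerate(row)
)
print(peak_xy)
```

Rerunning the sketch with smaller or larger bandwidths shows the trade-off described above: small values leave isolated spikes around each fix, while large values merge distinct peaks.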
Home range (i.e., activity space) sizes were estimated at the 50, 95 and 99% contours of each participant’s UD. These percentages correspond to the smallest home range area encompassing 50, 95 and 99% of a participant’s GPS locations. The 95% contour is the standard used in home range studies, while the 50% contour is considered the “core area” of activity [28, 29]. The 99% contour is the most inclusive contour, containing areas of sparse activity.
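One common way to turn a gridded UD into a home range size is to rank cells by estimated use and accumulate them until the chosen share of the distribution is reached. The Python sketch below follows that approach under stated assumptions: the UD values, the 0.25 km² cell area and the function name `home_range_area` are hypothetical, and the contour is defined on the UD volume rather than on raw GPS fixes.

```python
def home_range_area(ud, cell_area_km2, level):
    """Area of the smallest set of grid cells whose UD values sum to >= level.

    ud:            list of rows of cell probabilities that sum to 1
    cell_area_km2: area of one grid cell in square kilometres
    level:         contour level as a proportion (e.g. 0.50, 0.95, 0.99)
    """
    values = sorted((v for row in ud for v in row), reverse=True)
    cumulative = 0.0
    n_cells = 0
    for v in values:
        if cumulative >= level:
            break
        cumulative += v
        n_cells += 1
    return n_cells * cell_area_km2

# Hypothetical 2 x 3 UD whose cells sum to 1
ud = [
    [0.5, 0.25, 0.125],
    [0.0625, 0.03125, 0.03125],
]
areas = {level: home_range_area(ud, 0.25, level) for level in (0.50, 0.95, 0.99)}
print(areas)
```

In this toy grid the 50% contour is a single heavily used cell, whereas the 99% contour takes in every cell, including those of sparse use.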
Measuring spatial overlap
After estimating the UDs to represent each participant’s activity space, we calculated the utilization distribution overlap index (UDOI) to quantify the amount of activity space overlap between participants [26]. The UDOI is estimated as the cumulative sum of the cell-by-cell product of the two participants’ UDs, multiplied by the area over which their home ranges intersect [26, 28]. The UDOI of two participants is high when their GPS locations are concentrated within the same space [26]. The UDOI equals 0 when there is no spatial overlap and 1 when both UDs are uniformly distributed and overlap completely; it can take on values > 1 if the UDs are non-uniformly distributed and have a high degree of overlap. We estimated the UDOIs at each home range contour level (50, 95, and 99%) to examine the magnitude of association between MDRTB transmission and activity space overlap. We created UDs for each participant and estimated the UDOIs for all pairs of participants using the ‘adehabitatHR’ package [30] in R (version 3.6.1, The R Foundation).
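The index can be sketched on a grid as the overlap area multiplied by the summed cell-by-cell product of two density surfaces. The Python example below is a simplified illustration rather than a reimplementation of the R package used in the study: the 2 × 2 grids, the unit cell area, the boolean home range masks and the function name `udoi` are all assumptions.

```python
def udoi(density_a, density_b, in_range_a, in_range_b, cell_area):
    """Grid approximation of the utilization distribution overlap index.

    density_a, density_b: rows of UD densities (probability per unit area)
    in_range_a, in_range_b: rows of booleans marking each home range
    cell_area: area of one grid cell, in the same units as the densities
    """
    overlap_cells = 0
    integral = 0.0
    for row_a, row_b, mask_a, mask_b in zip(density_a, density_b, in_range_a, in_range_b):
        for da, db, ia, ib in zip(row_a, row_b, mask_a, mask_b):
            integral += da * db * cell_area   # cell-by-cell product of the UDs
            if ia and ib:
                overlap_cells += 1            # cell lies in both home ranges
    return overlap_cells * cell_area * integral

everywhere = [[True, True], [True, True]]

# Two identical uniform UDs over four unit cells: complete overlap
uniform = [[0.25, 0.25], [0.25, 0.25]]
u_full = udoi(uniform, uniform, everywhere, everywhere, 1.0)

# Two UDs confined to different rows: no overlap
top = [[0.5, 0.5], [0.0, 0.0]]
bottom = [[0.0, 0.0], [0.5, 0.5]]
top_mask = [[True, True], [False, False]]
bottom_mask = [[False, False], [True, True]]
u_none = udoi(top, bottom, top_mask, bottom_mask, 1.0)

# Two identical UDs concentrated in the same cell: index exceeds 1
peaked = [[0.625, 0.125], [0.125, 0.125]]
u_peaked = udoi(peaked, peaked, everywhere, everywhere, 1.0)

print(u_full, u_none, u_peaked)
```

The three toy cases mirror the interpretation given above: no shared space gives 0, identical uniform use gives 1, and shared concentrated use gives a value above 1.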
Statistical analyses
T-tests, Wilcoxon rank sum tests (when appropriate), and chi-squared tests were used to compare cases and controls by demographics, home range size and mean UDOI.
We compared the mean pairwise UDOI of cases and controls to determine the degree of spatial overlap within and between groups. The mean UDOIs of case dyads (i.e., case-case pairs), control dyads (i.e., control-control pairs), and case-control dyads (i.e., case-control pairs) were evaluated. Bonferroni-adjusted P-values were reported to account for multiple comparisons.
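For reference, the Bonferroni adjustment multiplies each raw P-value by the number of comparisons and caps the result at 1. The short Python sketch below uses hypothetical P-values for three dyad comparisons; neither the values nor the number of comparisons is taken from the study.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P-values: min(1, p * number of tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw P-values for three pairwise dyad comparisons
raw = [0.01, 0.04, 0.50]
adjusted = bonferroni(raw)
print(adjusted)
```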
We used logistic regression to estimate the odds of being genetically clustered as a function of spatial overlap (among cases). Linear regression was used to assess the relationship between the UDOI of case pairs and the degree of genetic strain similarity (i.e., SNP differences). SNP difference values were log10-transformed and UDOI values were logit-transformed to meet normality assumptions for modelling.
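The two transformations and a simple least-squares fit can be sketched in Python as follows. Several points are assumptions: the toy values are hypothetical, the choice of logit UDOI as the response and log10 SNP difference as the predictor is made only for illustration, and the text does not state how SNP differences of 0 or UDOI values outside the open interval (0, 1) were handled, so the sketch simply rejects them.

```python
import math

def logit(u):
    """Logit transform; defined only for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("logit is defined only for values strictly between 0 and 1")
    return math.log(u / (1.0 - u))

def log10_snp(d):
    """Base-10 log of a SNP difference count; undefined for counts of 0."""
    if d <= 0:
        raise ValueError("log10 is defined only for positive SNP differences")
    return math.log10(d)

def ols_fit(x, y):
    """Slope and intercept of a simple least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical case pairs: larger SNP differences paired with lower overlap
snp_differences = [1, 10, 100, 1000]
udoi_values = [0.5, 0.2, 0.1, 0.05]

x = [log10_snp(d) for d in snp_differences]
y = [logit(u) for u in udoi_values]
slope, intercept = ols_fit(x, y)
print(slope, intercept)
```

A negative slope in this toy example corresponds to the pattern discussed in the paper, where more closely related strains (fewer SNP differences) go with greater activity space overlap.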
Discussion
In this study, we compared the activity spaces of MDRTB cases to healthy community controls. We found that cases had significantly smaller activity spaces than controls, that the activity space overlap was greatest among genetically clustered cases, and that there was a statistically significant association between activity space overlap and Mtb strain genetic similarity. These findings suggest that MDRTB contact, exposure and transmission may be occurring among cases in relatively small, overlapping activity spaces in community settings.
The demonstration of high overlap amongst genetically clustered cases and lower overlap between cases and controls in the dyad analysis suggests that spatial segregation between the groups may be occurring, as found in a previous study [31]. Spatial segregation of MDRTB cases may support the use of spatially targeted screening interventions to improve local control and indirectly reduce MDRTB prevalence [32]. The high spatial overlap among cases may explain why spatial clustering of MDRTB genotypes has been previously found in localized hotspots in Lima [2]. The association between Mtb genetic similarity and spatial overlap was observed at all home range contours but was particularly pronounced at the smaller 50% home range contour (median of 5 km²), suggesting that transmission of MDRTB may be occurring at very local levels near case residences. Yang et al. found that genetic similarity of Mtb strains of MDRTB case pairs in China increased as their residential proximity increased [33]. Warren et al. demonstrated MDRTB spillover from a prison in Lima to the surrounding population, which suggests a mechanism whereby local transmission may be occurring in the community [34]. Moreover, clustered cases tended to have substantially smaller activity spaces than non-clustered cases, which suggests that movement and spatial overlap in small geographic hotspots may be driving MDRTB transmission in this population. Focusing infection control in those areas of high overlap may reduce MDRTB transmission and result in community-wide benefits, as suggested by Dowdy et al. [9]. It is important to recognize, however, that drivers of transmission hotspots are likely to vary between rural and urban environments. Nelson et al. demonstrated that transmission of extensively drug resistant TB (XDRTB) was likely occurring far away from cases’ residences in rural South Africa. The authors suggest that cyclical rural-to-urban migration for work was an important determinant of transmission (Nelson et al. 2018). A shift of XDRTB transmission towards workplaces in urban centers in Durban was also highlighted by Peterson et al. [13].
To date, studies investigating transmission sites for TB have relied on questionnaires about frequented locations [6], which generally underestimate spatial mobility and are subject to information bias [16, 18]. This study uses GPS tracking to obtain objective spatial information on participants and does not rely on participants’ recall. Activity spaces of individuals, calculated using GPS logging data, were shown to be larger than those derived from geotagging venues reported in questionnaires [35]. Moreover, place-tracing questionnaires often focus only on community venues and do not take into account routes travelled between them [35]. This suggests that GPS methods capture more spatial information than questionnaires. This methodology is easily reproducible and demonstrates the utility of GPS tracking in combination with whole genome sequencing to identify potential transmission sites.
There are several limitations to our study that should be noted. Firstly, it is likely that selection bias was introduced through our non-random selection of controls. As this was an exploratory study, our sample of controls was convenience-based and often consisted of health workers or their family and friends. As a result, controls generally had a higher level of education and income and were not necessarily representative of the general population. Moreover, given the higher socioeconomic status (SES) of controls, it is likely that controls did not live in the same neighborhoods or frequent the same shops and venues as cases, which may have resulted in the observed low UDOI between case and control dyads (but the relatively higher UDOI among control dyads). We attempted to address this source of bias by matching controls on age, sex and study region, but this was not sufficient to control for confounding caused by SES. Our healthy controls were also recruited on the basis that they had never been diagnosed with or treated for TB. However, this was not confirmed by medical records and may represent another potential source of bias. Our analysis of genetic clusters relied on a SNP difference threshold to determine which cases were genetically clustered, representative of recent transmission [24]. While a small SNP difference between two M. tuberculosis strains is generally regarded as evidence of transmission, the appropriate clustering threshold depends on the environment and setting [36]. While GPS monitoring is considered to be more precise than structured interviews at identifying activity locations, GPS locations alone do not provide context for locations (i.e., types of places or reasons for visiting areas) that interviews could elicit; combining the two forms of data collection is preferable [15, 16]. On average, study participants provided slightly less than seven days of GPS location data, so these movement patterns might not be representative of typical activity. However, prior studies have found that human movement patterns tend to be regular and stable, particularly in urban settings where routines are structured and people tend to spend significant amounts of time in a few regularly visited locations [18, 25, 37].
The small sample size was another limitation to the study. This was mainly due to the large numbers of potential participants who had moved or had unfortunately died. The findings of this study can therefore only be regarded as exploratory in nature.
A larger prospective study would be useful to confirm our findings and determine whether the differences between cases and controls, in terms of activity space size and overlap likelihood at all activity space contour sizes, are statistically meaningful. Potential sources of bias could be reduced through the recruitment of confirmed TB-free controls from cases’ neighbourhoods. A detailed investigation into community venues within overlapping activity spaces of genetically-clustered participants is essential to isolate specific areas where transmission is occurring. Additionally, follow-up questionnaires conducted alongside GPS telemetry would be useful to characterise the context of visits to community venues. Data collection could be made simpler and more effective by using alternative sources of GPS data, such as Google Maps location history on participants’ smartphones. Data on the movement patterns of TB patients during the period of transmission itself may provide greater insight into specific locations where transmission may have occurred.