We constructed a new wealth index that excludes drinking water from the household assets. The default asset scores and wealth quintiles in DHS datasets include drinking water supply as an input. Because drinking water is the dependent variable of this study, it was excluded when constructing the wealth quintiles. An asset score for each household was constructed using Principal Components Analysis in Stata 14.0 [34]. All surveyed households were ranked by asset score and divided into five subsets, or wealth quintiles. The first quintile included the poorest 20% of households and the fifth quintile included the wealthiest 20%. Following the approach used by Jia et al. [35], five subsets of clusters with GPS points, representing the five quintiles, were created. Each subset included all the clusters that contained at least one household in the corresponding quintile. Because each cluster was represented by a single GPS point and households in a single DHS cluster may fall in different quintiles, the subsets of clusters were not mutually exclusive.
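The ranking and cluster-subset steps above can be illustrated with a minimal sketch. It assumes the PCA asset scores are already available; the household identifiers, cluster labels, and scores are hypothetical, and the ranking here is unweighted, which is a simplification rather than a description of the procedure used in Stata.

```python
from collections import defaultdict


def assign_quintiles(scores):
    """Rank households by asset score and return {household_id: quintile 1..5}.

    Quintile 1 is the poorest fifth, quintile 5 the wealthiest fifth.
    Unweighted ranking is an assumption made for illustration.
    """
    ranked = sorted(scores, key=scores.get)  # poorest first
    n = len(ranked)
    return {hh: (rank * 5) // n + 1 for rank, hh in enumerate(ranked)}


def clusters_by_quintile(quintile_of, cluster_of):
    """Return {quintile: set of clusters with at least one household in it}."""
    subsets = defaultdict(set)
    for hh, quintile in quintile_of.items():
        subsets[quintile].add(cluster_of[hh])
    return dict(subsets)


# Hypothetical asset scores (e.g. first principal component) for 10 households.
asset_score = {
    "h1": -1.9, "h2": 0.4, "h3": -0.7, "h4": 1.8, "h5": -1.1,
    "h6": 0.9, "h7": -0.2, "h8": 2.5, "h9": 0.1, "h10": -0.5,
}
# Hypothetical assignment of households to three DHS clusters.
cluster_of = {
    "h1": "A", "h2": "A", "h3": "A", "h4": "A",
    "h5": "B", "h6": "B", "h7": "B",
    "h8": "C", "h9": "C", "h10": "C",
}

quintile_of = assign_quintiles(asset_score)
subsets = clusters_by_quintile(quintile_of, cluster_of)
for quintile in sorted(subsets):
    print(quintile, sorted(subsets[quintile]))
```

In this toy example cluster A appears in four of the five subsets, which shows why the subsets are not mutually exclusive.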
The raw coverage rate of unimproved water in each cluster was calculated as a proportion: the number of households using any unimproved water source divided by the total number of households in the cluster, for the overall population and for each quintile. The calculation accounted for the survey design and sampling weights. Differences in raw coverage rates among the sampled clusters were tested using one-way ANOVA.
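As a rough illustration of the raw coverage rate, the sketch below computes a weighted proportion per cluster. How the weights entered the original calculation is not stated, so the weighted ratio used here is an assumption, and the records are hypothetical.

```python
def cluster_coverage(households):
    """Weighted proportion of households using unimproved water, per cluster.

    households: iterable of (cluster_id, sampling_weight, uses_unimproved).
    The weighted ratio is an assumed form of the weight adjustment.
    """
    numerator = {}
    denominator = {}
    for cluster, weight, unimproved in households:
        denominator[cluster] = denominator.get(cluster, 0.0) + weight
        if unimproved:
            numerator[cluster] = numerator.get(cluster, 0.0) + weight
    return {c: numerator.get(c, 0.0) / denominator[c] for c in denominator}


# Hypothetical household records: (cluster, weight, uses unimproved source).
records = [
    ("A", 1.0, True), ("A", 1.0, False), ("A", 2.0, True),
    ("B", 0.5, False), ("B", 1.5, False),
    ("C", 1.0, True), ("C", 3.0, False),
]

raw_rate = cluster_coverage(records)
for cluster in sorted(raw_rate):
    print(cluster, raw_rate[cluster])
```

The same function can be applied to the households of a single quintile to obtain quintile-specific rates.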
A spatially smoothed rate was calculated to stabilize the raw rates. To perform the smoothing, Thiessen polygons were first created; these divide the study area into sub-areas, each enclosing all locations closer to its central point than to any other point [36]. Spatial smoothing was then used to produce, for each cluster, an estimate corresponding to its raw coverage rate from the neighboring clusters defined by the Thiessen polygons. For this study, first-order Queen contiguity was applied as the spatial smoothing rule. Under Queen contiguity, all polygons sharing a common edge or a common vertex with the target Thiessen polygon are considered neighbors. The difference between the spatially smoothed and raw coverage rates, for the overall population and for each quintile, was calculated by subtracting the raw coverage rate from the spatially smoothed coverage rate.
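The smoothing step can be sketched as follows. The exact smoothing formula is not given in the text, so this example assumes a window-average spatial rate smoother in which each cluster's events and households are pooled with those of its Queen-contiguity neighbors, including the cluster itself. The neighbor lists and counts are hypothetical; in practice the neighbors would be derived from the Thiessen polygons.

```python
def spatial_rate_smooth(events, population, neighbors):
    """Pool events and population over each cluster and its neighbors.

    events, population: {cluster_id: count}
    neighbors: {cluster_id: list of contiguous cluster_ids}
    Including the cluster itself in the window is an assumption.
    """
    smoothed = {}
    for cluster in events:
        window = [cluster] + list(neighbors.get(cluster, []))
        pooled_events = sum(events[k] for k in window)
        pooled_population = sum(population[k] for k in window)
        smoothed[cluster] = pooled_events / pooled_population
    return smoothed


# Hypothetical counts: households with unimproved water and total households.
events = {"A": 8, "B": 2, "C": 5, "D": 0}
population = {"A": 10, "B": 10, "C": 20, "D": 5}
# Hypothetical first-order Queen contiguity between Thiessen polygons.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

raw = {c: events[c] / population[c] for c in events}
smoothed = spatial_rate_smooth(events, population, neighbors)
# Difference as defined in the text: smoothed minus raw.
difference = {c: smoothed[c] - raw[c] for c in events}
for cluster in sorted(events):
    print(cluster, raw[cluster], smoothed[cluster], difference[cluster])
```

Clusters with extreme raw rates, such as A and D in this example, are pulled toward the rates of their neighbors, which is the stabilizing effect the smoothing is meant to provide.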
Spatial autocorrelation analysis was performed after joining the raw and spatially smoothed coverage data to the geographic coordinates using the DHS cluster identification code. The null hypothesis was complete spatial randomness in the distribution of unimproved water in the study sites. Global spatial autocorrelation was used to analyze whether the pattern of unimproved water coverage is clustered, dispersed, or random across the study areas. The Global Moran’s I measures spatial autocorrelation based on feature locations and attribute values. For a set of features with an associated attribute, Global Moran’s I evaluates whether the pattern expressed is clustered, dispersed, or random. When the z score or p value indicates statistical significance, a positive Moran’s I index value indicates a tendency toward clustering, while a negative Moran’s I index value indicates a tendency toward dispersion. Because the global spatial autocorrelation technique provides a single quantitative value for the whole dataset, it cannot identify local clusters of high or low coverage. Thus, local spatial autocorrelation analysis was applied to detect local clusters where the global autocorrelation was positive. Local Moran’s I was used to calculate a test statistic for each location and to identify clusters of high and low coverage. A random permutation procedure (RPP) with 999 permutations was used to generate reference distributions. The distribution of the test statistics was evaluated against the theoretical or randomly generated reference distribution. Local Moran’s I was calculated for both raw and spatially smoothed rates. Both the global and local spatial autocorrelation statistics were calculated using GeoDa [37].
For POU water treatment, households that reported using an adequate water treatment method (chlorination, boiling, filtration, or SODIS) were coded as yes (1), and all other households as no (0), i.e., households that only let the water stand and settle, strained it through cloth, or never used any treatment option. Descriptive statistics and logistic regression were used to assess the factors associated with household POU treatment. A multivariable logistic regression was run to identify factors associated with POU water treatment practices, including variables with a p value < 0.25 in the bivariate analysis.
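As an illustration of the Global Moran’s I statistic and the permutation-based reference distribution used in the spatial autocorrelation analysis, a minimal sketch is given below. It is not the GeoDa implementation. It assumes row-standardized contiguity weights and a one-sided pseudo p value for positive autocorrelation, and the cluster identifiers, rates, and neighbor lists are hypothetical.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights (an assumption).

    values: {cluster_id: coverage rate}
    neighbors: {cluster_id: list of neighboring cluster_ids}
    """
    ids = list(values)
    n = len(ids)
    mean = sum(values.values()) / n
    z = {i: values[i] - mean for i in ids}
    cross_product = 0.0
    weight_sum = 0.0
    for i in ids:
        nbrs = neighbors.get(i, [])
        if not nbrs:
            continue  # clusters without neighbors contribute nothing
        w = 1.0 / len(nbrs)  # row standardization
        for j in nbrs:
            cross_product += w * z[i] * z[j]
            weight_sum += w
    variance_term = sum(v * v for v in z.values())
    return (n / weight_sum) * (cross_product / variance_term)


def permutation_test(values, neighbors, permutations=999, seed=0):
    """Return observed I and a one-sided pseudo p value from random shuffles."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    ids = list(values)
    shuffled_values = [values[i] for i in ids]
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled_values)
        permuted = dict(zip(ids, shuffled_values))
        if morans_i(permuted, neighbors) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical rates for four clusters arranged in a line: c1 - c2 - c3 - c4.
rates = {"c1": 1.0, "c2": 2.0, "c3": 3.0, "c4": 4.0}
chain = {"c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2", "c4"], "c4": ["c3"]}

observed_i, pseudo_p = permutation_test(rates, chain)
print(observed_i, pseudo_p)
```

With only four clusters the reference distribution is very coarse, so the example shows the mechanics rather than a meaningful significance test. The local statistic follows the same permutation logic but is evaluated for each cluster separately.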