Background
Overweight and obesity are severe problems worldwide, causing a number of diseases such as type 2 diabetes and thus reducing life expectancy and quality of life [1, 2]. In Germany, for example, the prevalence of overweight or obesity among adults was 54.0% in 2014/2015 according to the German Health Update (GEDA) study of the Robert Koch Institute (a national public health institute in Germany), with men being affected more often than women [3]. According to Mader et al., other personal characteristics associated with obesity besides gender were low education and higher age [4]. In addition, several German cohort studies have shown that the average weight in middle-aged populations increased slightly during recent years [5].
Obesity has become a major public health concern, and recent studies describe regional heterogeneity [6, 7]. In obesity-related research, the term “obesogenic environment” describes environmental influences such as green space or fast food restaurants on the development of obesity [8, 9], a topic that has been investigated intensively in the past [10]. Several approaches have been developed to analyze the effect of the environment on the risk of obesity. Examples include the assessment of obesogenicity via questionnaires [11] and via data visualization tools for obesity policy [12].
Geographic modeling approaches have been used to characterize the accumulation of environmental factors. Common techniques include kernel density estimation (KDE), a method that estimates a continuous risk surface from point locations [13, 14], as well as hot spot mapping [15] and further geographic information system (GIS) methods [16]. These methods can be used to develop obesity risk scores [17].
Online geocoding services offer researchers low-cost geographic data that can be downloaded and used for spatial statistical analyses. Their validity has been investigated in the past, with reasonable results regarding the completeness of environmental factors and the positional accuracy of their coordinates [18, 19]. Therefore, they offer a rich database on which geographic tools can be built. However, geocoding services such as Google Maps make data available only to a limited extent. In contrast, geodata from OpenStreetMap (OSM) contain geographic information provided by volunteers and thus are less restricted [20]. In a recent study, we performed an extensive literature search to identify obesity-related environmental factors [18]. Furthermore, we operationalized these factors and downloaded the corresponding points of interest (POIs).
The present study extends this approach by developing and testing the spatial obesity risk score (SORS) based on data from OSM. The SORS calculates the obesity risk for each geographic point in a given region based on the local density of positive and/or negative obesity-related environmental factors. In our study, we developed a methodological framework for risk score estimation using KDE and tested the influence of five KDE parameters on the SORS values: (1) bandwidth, (2) edge correction via the size of the download area, (3) number of grid points, (4) risk interpolation method, and (5) weighting scheme of the environmental factors.
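The general idea of a density-based score can be illustrated with a minimal sketch. It assumes a Gaussian kernel, unnormalized kernel intensities (sums of kernels), and a sign convention in which higher values indicate higher risk; the function names, coordinates, and bandwidth are hypothetical and do not reproduce the implementation or data used in the study.

```python
import math

def kernel_intensity(points, x, y, bandwidth):
    """Sum of 2-D Gaussian kernels centred on the POIs, evaluated at (x, y)."""
    norm = 2.0 * math.pi * bandwidth ** 2
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2)) / norm
    return total

def score_surface(risk_pois, protective_pois, xs, ys, bandwidth):
    """Score on a grid: risk intensity minus protective intensity.

    The sign convention (higher = more obesogenic) is an assumption
    made for this illustration only.
    """
    return [[kernel_intensity(risk_pois, x, y, bandwidth)
             - kernel_intensity(protective_pois, x, y, bandwidth)
             for x in xs] for y in ys]

# Hypothetical POI coordinates in metres (not data from the study).
fast_food = [(200.0, 200.0), (250.0, 220.0)]
parks = [(800.0, 800.0)]
xs = [0.0, 250.0, 500.0, 750.0, 1000.0]
ys = [0.0, 250.0, 500.0, 750.0, 1000.0]

surface = score_surface(fast_food, parks, xs, ys, bandwidth=200.0)
print(surface[1][1] > 0)  # grid point near the fast food outlets
print(surface[3][3] < 0)  # grid point near the park
```

In this toy setting, grid points close to the hypothetical fast food outlets receive positive values and grid points close to the park receive negative values.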
Discussion
We developed the SORS based on KDE using freely available data from online geocoding services. We tested several parameters that could potentially influence the final score values. Our tests showed that the SORS depended on the choice of bandwidth and on the amount of edge correction applied to the KDE; the latter, however, in only one of the two study areas. In contrast, the interpolation method, the number of grid points, and an alternative weighting scenario had only a small influence on the results.
The SORS was calculated by taking the difference of the positive and the negative kernel density surface. We followed a similar approach to that in the work of Jones-Smith et al. [34]. They estimated correlations of their score with obesity. In contrast, the aim of our study was to investigate the effect of parameter variation on the robustness of our score. Furthermore, we covered an extensive list of obesogenic and protective environmental factors that expanded the approach of a food score to a more comprehensive measure, also including the physical activity environment. Some alternative approaches using KDE were based on the division of the kernel density surfaces [54]. A major drawback of these quotients is that a denominator at or near zero produces values approaching infinity and thus instability. Our difference-based approach to the SORS avoids this problem and therefore does not require adjustments that correct for the instability.
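The instability of quotient-based scores can be shown with a short numerical sketch. The density values are hypothetical, and the choice of the protective density as the denominator is an assumption made only for illustration.

```python
def difference_score(risk_density, protective_density):
    """Difference-based score: stays bounded when one density vanishes."""
    return risk_density - protective_density

def ratio_score(risk_density, protective_density):
    """Quotient-based score: grows without bound as the denominator shrinks."""
    return risk_density / protective_density

# Hypothetical densities at locations increasingly far from any protective POI.
risk = 0.5
for protective in (1e-2, 1e-6, 1e-12):
    print(protective,
          difference_score(risk, protective),
          ratio_score(risk, protective))
```

As the protective density approaches zero, the difference converges to the risk density, whereas the quotient increases by orders of magnitude and is undefined at exactly zero.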
Previous studies used various approaches to estimate risk scores based on kernel techniques, both in obesity-related research areas and elsewhere. Fitzpatrick and colleagues [55], for example, developed the keeping score based on KDE to characterize crime patterns, which has often been used by the police. Crime heat maps can be generated with this technique. This approach is based on the locations of past events instead of geolocated environmental factors, and the authors assumed that the pattern of these historic events would be maintained in the future.
Some studies created kernel density surfaces based on POIs and extracted density estimates from these surfaces in order to investigate the association with weight status. Rundle et al. (2009) analyzed the effects of environmental factors on body mass index (BMI). The results of a KDE analysis of healthy and unhealthy food outlets were used to classify the neighborhood environment of each individual within the study based on a quintile approach [56]. Furthermore, walkability, land use mix, and population density were considered. These variables could not be implemented in our study because of the chosen POI approach with OSM data.
The five chosen SORS parameters (bandwidth, edge correction, grid points, interpolation, and weights) have also been investigated in the literature. Laraia et al. (2017) used business software and ArcGIS to geocode the information from the study data [57]. As in our analysis, several bandwidths were tested within their KDE approach, and bandwidth was found to be a sensitive model parameter. Similarly, we found a fundamental influence of bandwidth on the results.
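Why the bandwidth is such a sensitive parameter can be seen in a small sketch, assuming a Gaussian kernel and hypothetical POI locations along a line. The ratio of the kernel intensity inside a POI cluster to the intensity far from it serves here as a simple, illustrative measure of contrast; it is not a quantity used in the study.

```python
import math

def kernel_intensity(points, x, y, bandwidth):
    """Sum of 2-D Gaussian kernels centred on the POIs, evaluated at (x, y)."""
    norm = 2.0 * math.pi * bandwidth ** 2
    return sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2)) / norm
        for px, py in points
    )

# Hypothetical POIs in metres: a small cluster and one isolated POI.
pois = [(0.0, 0.0), (50.0, 0.0), (1000.0, 0.0)]

def contrast(bandwidth):
    """Intensity inside the cluster divided by intensity midway between POIs."""
    near = kernel_intensity(pois, 25.0, 0.0, bandwidth)
    far = kernel_intensity(pois, 500.0, 0.0, bandwidth)
    return near / far

bandwidths = [100.0, 300.0, 1000.0]
contrasts = [contrast(h) for h in bandwidths]
for h, c in zip(bandwidths, contrasts):
    print(h, c)
```

With a small bandwidth the surface is sharply peaked around the cluster, whereas a large bandwidth smooths the surface until the cluster is hardly distinguishable from its surroundings.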
Effects at the edge of the study area were estimated in a simulation study concerning cluster models for food outlets [58]. Estimates at the boundaries were biased, and the authors concluded that edge effects should be corrected in studies considering measures of availability and accessibility. This underlines the importance of edge correction, which was also a major topic in our study. In addition, extending the study area has proven to be a valuable edge correction method.
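Edge correction by extending the download area can be sketched as follows, assuming a Gaussian kernel, a square study area, and hypothetical POI coordinates. The helper `pois_within` and the buffer size are illustrative assumptions and not part of the study's code.

```python
import math

def kernel_intensity(points, x, y, bandwidth):
    """Sum of 2-D Gaussian kernels centred on the POIs, evaluated at (x, y)."""
    norm = 2.0 * math.pi * bandwidth ** 2
    return sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2)) / norm
        for px, py in points
    )

def pois_within(pois, xmin, ymin, xmax, ymax, buffer=0.0):
    """Keep the POIs inside the rectangle, optionally enlarged by a buffer."""
    return [(x, y) for x, y in pois
            if xmin - buffer <= x <= xmax + buffer
            and ymin - buffer <= y <= ymax + buffer]

# Hypothetical POIs in metres; two of them lie just outside the study area.
all_pois = [(-150.0, 500.0), (-50.0, 450.0), (100.0, 500.0),
            (500.0, 500.0), (900.0, 500.0)]
area = (0.0, 0.0, 1000.0, 1000.0)
h = 200.0

clipped = pois_within(all_pois, *area)                 # no edge correction
extended = pois_within(all_pois, *area, buffer=400.0)  # extended download area

edge_clipped = kernel_intensity(clipped, 0.0, 500.0, h)
edge_extended = kernel_intensity(extended, 0.0, 500.0, h)
centre_clipped = kernel_intensity(clipped, 500.0, 500.0, h)
centre_extended = kernel_intensity(extended, 500.0, 500.0, h)

print(edge_clipped, edge_extended)      # boundary point: clearly underestimated
print(centre_clipped, centre_extended)  # interior point: almost unchanged
```

In this example the intensity at the boundary is strongly underestimated when POIs outside the study area are ignored, while the estimate at the centre hardly changes.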
Finding the optimal number of grid points was also discussed in the literature. Some authors suggested that a choice between 100 and 500 grid points gives reasonable results [59]. In our analysis, we chose 25 × 25 points for the minimum bounding rectangle, i.e., 625 grid points, and applied an additional amount of edge correction for the base case. In addition, we performed some adjustments to preserve the distance between the grid points for the edge correction scenarios. In this case, the number of grid points was extended proportionally to the amount of edge correction applied, i.e., to the amount of study area extension. This made it possible to analyze grid point and edge effects separately. The number of grid points in our base case and sensitivity analysis was chosen in accordance with the default grid sizes implemented in KDE packages.
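The adjustment that preserves the grid spacing under study area extension can be written as a short worked example. The side length of the bounding rectangle (12,000 m, assumed square) and the buffer (2,000 m) are hypothetical values chosen so that the arithmetic is easy to follow.

```python
def grid_axis(start, stop, n):
    """n equally spaced grid coordinates from start to stop (inclusive)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

def extended_axis(start, stop, n, buffer):
    """Extend the axis by `buffer` on both sides while keeping the spacing."""
    step = (stop - start) / (n - 1)
    extra = round(buffer / step)  # additional grid points per side
    return [start + i * step for i in range(-extra, n + extra)]

# Hypothetical bounding rectangle of 12,000 m with 25 points per axis,
# i.e., a spacing of 500 m; the buffer adds 4 points on each side.
base = grid_axis(0.0, 12000.0, 25)
extended = extended_axis(0.0, 12000.0, 25, buffer=2000.0)

print(len(base) ** 2)      # 625 grid points in the base grid
print(len(extended) ** 2)  # 1089 grid points in the extended grid
```

Because the spacing of 500 m is identical in both grids, differences between the scenarios can be attributed to the extension of the area rather than to a change in grid resolution.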
An inverse distance weighting method was applied in the past in KDE of homicide locations as a parameter of area safety [60]. This method can be used to estimate effects at specific locations. We used such an inverse distance method in our model as an alternative to the automatic interpolation function of the base case. As a further common method, linear interpolation has been applied in the literature [61]. The “interp.surface” function applied to our SORS model is based on bilinear weights.
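The two interpolation methods can be compared in a minimal sketch. Both functions are generic textbook versions written for this illustration; they are not the “interp.surface” implementation, and the use of all grid points with a power of 2 in the inverse distance variant is an assumption.

```python
import bisect

def bilinear(xs, ys, z, x, y):
    """Bilinear interpolation on a regular grid; z[i][j] belongs to (xs[i], ys[j])."""
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect.bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * z[i][j] + tx * (1 - ty) * z[i + 1][j]
            + (1 - tx) * ty * z[i][j + 1] + tx * ty * z[i + 1][j + 1])

def inverse_distance(xs, ys, z, x, y, power=2.0):
    """Inverse distance weighting over all grid points."""
    num = den = 0.0
    for i, gx in enumerate(xs):
        for j, gy in enumerate(ys):
            d2 = (x - gx) ** 2 + (y - gy) ** 2
            if d2 == 0.0:
                return z[i][j]  # exactly on a grid point
            w = d2 ** (-power / 2.0)
            num += w * z[i][j]
            den += w
    return num / den

# Hypothetical score surface z = x + y on the unit square.
xs = [0.0, 1.0]
ys = [0.0, 1.0]
z = [[0.0, 1.0], [1.0, 2.0]]

print(bilinear(xs, ys, z, 0.25, 0.5))          # reproduces the plane: 0.75
print(inverse_distance(xs, ys, z, 0.25, 0.5))  # close to, but not equal to, 0.75
```

On a linear surface, bilinear interpolation reproduces the true value, whereas inverse distance weighting yields a nearby but slightly different value; such small differences are in line with the limited influence of the interpolation method on the score.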
It was challenging to find a suitable weighting scheme applicable within our analysis. For the base case, we assumed that each factor has the same positive or negative weight, although this may not hold in reality. Additionally, we tested an example from the literature [34]. We found that double weighting of supermarkets and physical activity facilities had little effect on the results. Because several weighting methods are possible for spatial POIs, further alternatives need to be tested in future studies.
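A weighting scheme can be incorporated by multiplying each kernel by a category-specific weight. The following sketch assumes a Gaussian kernel and a score defined as risk intensity minus protective intensity; the category names, coordinates, and weights are hypothetical.

```python
import math

def weighted_intensity(pois, weights, x, y, bandwidth):
    """Weighted sum of Gaussian kernels; pois holds (category, px, py) tuples."""
    norm = 2.0 * math.pi * bandwidth ** 2
    total = 0.0
    for category, px, py in pois:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += weights[category] * math.exp(-d2 / (2.0 * bandwidth ** 2)) / norm
    return total

def score(risk, protective, weights, x, y, bandwidth):
    """Risk minus protective intensity (sign convention assumed for illustration)."""
    return (weighted_intensity(risk, weights, x, y, bandwidth)
            - weighted_intensity(protective, weights, x, y, bandwidth))

# Hypothetical POIs in metres.
protective = [("supermarket", 100.0, 100.0), ("park", 300.0, 100.0),
              ("sports_facility", 150.0, 250.0)]
risk = [("fast_food", 200.0, 200.0)]

equal = {"supermarket": 1.0, "park": 1.0, "sports_facility": 1.0, "fast_food": 1.0}
doubled = dict(equal, supermarket=2.0, sports_facility=2.0)

score_equal = score(risk, protective, equal, 200.0, 150.0, 200.0)
score_doubled = score(risk, protective, doubled, 200.0, 150.0, 200.0)
print(score_equal, score_doubled)
```

Under the assumed sign convention, doubling the weights of protective categories lowers the score at every location; how strongly the resulting map changes depends on how these POIs are distributed relative to the others.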
Finally, the SORS was graphically compared to a risk score that was derived from incremental intensities of inhomogeneous spatial point processes. Although the methodology applied here changed from KDE-based to intensity-based estimations, similar visual patterns could be derived from the two score approaches for protective patterns, which further underlines the robustness of our chosen algorithm.
Implications of the SORS on obesity-related research and policy
The SORS is a helpful tool to understand the spatial distribution of health-related harmful environmental factors in relation to health-promoting environmental factors. Risk score maps allow for an overall intuitive view of summarized structures, which can be a valuable help in obesity-related research and also in policy. Although the actual use of those structures by individuals may differ in reality, the score nevertheless provides a simplified composite measure of the environment and can be extended to a more comprehensive tool accounting for several health dimensions that affect individuals simultaneously.
Strengths and limitations
Our study has several strengths. The automated processing of data and the automated testing of several important KDE parameters make it possible to repeat the risk score estimation for other areas efficiently, provided that the spatial data points and the shape files of the city or town boundaries have been downloaded beforehand. This enables the user to describe, compare, and monitor (if done repeatedly) risk scores as well as the influence of relevant risk score parameters within several areas of interest, within other regions worldwide, and also on a larger geographic scale. For example, the analysis could be performed for a whole country in order to identify national inequalities regarding environmental obesity risks or to guide and prioritize prevention efforts that concentrate on the food and the physical activity environment. To achieve this on a regional scale, the download area simply has to be enlarged for the subsequent data download from OSM. The data files would be of a manageable size, as only a small number of features are important for this kind of analysis. For Augsburg, i.e., the larger of our two study areas, the data file size was 8 MB. For larger areas, e.g., for Germany, other portals such as Geofabrik should be used. In this case, no query process is needed, and the data files are directly ready for download. The data size for Germany, for example, would be 3.1 GB in this case [62]. Furthermore, using so-called planet OSM files, disk space of around one terabyte (compressed 89 GB) or less is required [63].
We integrated uncertainty into our analysis by performing a spatial bootstrap. Subsequently, we used the samples directly for the evaluation of our method. This allowed us to assess the stability of the score values against POI variations and helped us to compare deterministic parameter scenarios based on the ANOVA F statistic. First, the impact of each parameter on the score results could be assessed. Second, the values of the F statistic could be used to find optimal parameter combinations for the SORS.
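The combination of resampling and the F statistic can be sketched as follows. The sketch assumes a simple bootstrap that resamples POIs with replacement, a Gaussian kernel, and a standard one-way ANOVA across bandwidth scenarios; the spatial bootstrap used in the study may differ in its details, and all names and values are hypothetical.

```python
import math
import random

def kernel_intensity(points, x, y, bandwidth):
    """Sum of 2-D Gaussian kernels centred on the POIs, evaluated at (x, y)."""
    norm = 2.0 * math.pi * bandwidth ** 2
    return sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2)) / norm
        for px, py in points
    )

def bootstrap_values(pois, x, y, bandwidth, n_boot, rng):
    """Kernel intensity at (x, y) for n_boot resamples of the POIs."""
    values = []
    for _ in range(n_boot):
        sample = [rng.choice(pois) for _ in pois]  # resample with replacement
        values.append(kernel_intensity(sample, x, y, bandwidth))
    return values

def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical POIs in metres and three bandwidth scenarios.
pois = [(100.0, 100.0), (150.0, 300.0), (400.0, 250.0),
        (600.0, 600.0), (800.0, 200.0), (900.0, 900.0)]
rng = random.Random(42)
groups = [bootstrap_values(pois, 500.0, 500.0, h, 200, rng)
          for h in (200.0, 400.0, 800.0)]

f_stat = anova_f(groups)
print(f_stat)
```

A large F statistic indicates that the variation between parameter scenarios is large compared with the bootstrap variation within each scenario, i.e., that the parameter has a strong influence on the score.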
We checked the robustness of the score and repeated our analysis several times for a given area. Results were qualitatively equivalent, i.e., for each given parameter variation, the repeated analysis could be used to rank the scenarios in the same order.
However, there are also some limitations regarding the study. First, some of the environmental factors discovered during the literature search could not be implemented based on spatial POIs, especially complex constructs such as land use mix or walkability.
Second, the categorization of positive and negative obesogenic factors was based on pre-existing literature, and it is not known whether POIs categorized as “positive” or “negative” are really positively or negatively associated with obesity-related health (behavior). Further studies could compare the SORS with external data sources, such as walk scores in a given region, in order to test these associations [64].
Third, as the content of OSM is generated by users, it is necessary to assess the data quality within validation studies. In our previous work, we calculated sensitivity, specificity, and positive predictive values for OSM and compared the results with the corresponding values for Google Maps [18]. It became evident that both geocoding services performed adequately. OSM had higher positive predictive values but lower sensitivities than Google Maps.