Datasets and study sample
This study used 5 years of data from the Korea National Health and Nutrition Examination Survey (KNHANES, 2010–2014). These surveys are conducted annually by the Korea Centers for Disease Control and Prevention and collect data from the general, non-institutionalized population through a stratified, multistage probability sampling design. They are nationally representative and contain abundant information on South Koreans’ demographics, socioeconomic status, health, and lifestyles. Of the 41,102 individuals in the 5 years of data, 22,456 were women, of whom 16,877 were aged 25 years or above and had most likely completed their formal education [23]. After excluding 486 women who were pregnant or breastfeeding at the interview date, because either status is likely to affect body weight, we retained 16,391 women. Our study sample comprised the 14,577 women with complete information (88.9% of the 16,391 eligible women). We considered this restriction acceptable because there was no evidence of statistical differences in important demographic characteristics between the groups with and without complete information (p-value = 0.187 for age and 0.275 for residential area). All KNHANES participants provided written consent to participate in the survey and for their personal data to be used. The data we used are publicly available, and the institutional review board of our organization provided ethical approval for our study.
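The sample-selection arithmetic can be checked with a short sketch. The counts are those reported in the text; the variable names are ours and purely illustrative.

```python
# Counts reported in the text for each selection step.
all_participants = 41_102
women = 22_456
women_25_plus = 16_877
pregnant_or_breastfeeding = 486
complete_cases = 14_577

# Women remaining after the pregnancy/breastfeeding exclusion.
eligible = women_25_plus - pregnant_or_breastfeeding

# Share of eligible women retained in the complete-case study sample.
retained_share = complete_cases / eligible

print(eligible)                 # 16391
print(f"{retained_share:.1%}")  # 88.9%
```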
Measures and variables
First, we calculated each participant’s body mass index (BMI) from their height and body weight, which were measured through physical examinations and are available in the KNHANES. In accordance with the guidelines proposed by the World Health Organization (WHO), and considering that Asians generally have lower BMIs on average [24], we defined obesity as a BMI of 25 kg/m² or higher and coded our dependent variable as one if a participant was obese and zero otherwise.
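A minimal sketch of the outcome coding follows. It assumes weight is recorded in kilograms and height in centimetres; the function names are hypothetical and not taken from the KNHANES files.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 (assumes height is given in centimetres)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_cm: float, cutoff: float = 25.0) -> int:
    """Dependent variable: 1 if BMI is at or above the cutoff, else 0."""
    return int(bmi(weight_kg, height_cm) >= cutoff)


print(is_obese(70, 160))  # BMI is about 27.3, so the indicator is 1
print(is_obese(55, 160))  # BMI is about 21.5, so the indicator is 0
```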
Next, we measured participants’ education level. We defined a participant’s education level as the highest level of formal education they had completed as of the interview date. We then divided participants’ education level into two categories: a high school education or less (≤ 12 years of education), and attaining or working on a college degree or higher (≥ 13 years of education). Based on these categories, we divided our participants into a less-educated and a highly-educated group.
Independent variables included: age (25–34, 35–44, 45–54, 55–64, or 65 years and older); marital status (married or non-married, where non-married included never married, separated, widowed, or divorced); residential area (urban or rural); occupational status (employed or not employed, where not employed included those who had no paid work); household income (below median income, or median income or higher, where income was adjusted for household size by the square-root equivalence scale and the median was calculated from all participants’ information) [25]; current smoking status (smoking or non-smoking); risk of alcohol intake (no or low, or medium or higher, according to WHO’s sex-specific guidelines for risk of acute problems from drinking) [26]; walking exercise activity (active or inactive, according to whether a woman walked for at least 30 min per day on at least 5 days per week) [27]; and self-perceived stress level (stressed or not stressed).
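The household income variable can be illustrated as follows. This is a sketch under stated assumptions: the household values are invented, the median is unweighted (a survey-weighted median may differ), and the function name is hypothetical.

```python
import math
import statistics


def equivalized_income(household_income: float, household_size: int) -> float:
    """Square-root equivalence scale: income divided by sqrt(household size)."""
    return household_income / math.sqrt(household_size)


# Hypothetical (income, household size) pairs, for illustration only.
households = [(400, 4), (150, 1), (300, 2), (90, 1), (500, 3)]

equivalized = [equivalized_income(inc, size) for inc, size in households]
median_income = statistics.median(equivalized)

# 1 = median income or higher, 0 = below median income.
at_or_above_median = [int(v >= median_income) for v in equivalized]

print(median_income)       # 200.0
print(at_or_above_median)  # [1, 0, 1, 0, 1]
```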
Statistical analyses
In this study, we performed a six-fold analysis. First, we applied χ² tests to determine whether the distribution of participants’ characteristics differed between our two groups.
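For a two-category characteristic, the first step amounts to a Pearson χ² test on a 2 × 2 table. The sketch below uses invented counts and ignores the complex survey design, so it is a simplified version of a design-adjusted test. The function names are ours.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: rows = education group, columns = urban / rural.
table = [[10, 20], [20, 10]]
stat, df = chi_square_statistic(table)
print(stat, df, p_value_df1(stat))
```

The closed-form p-value holds only for one degree of freedom; characteristics with more categories, such as age, would need a general χ² distribution function.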
Second, we examined whether age modified or confounded the relationship between education and obesity in women [28]. Because age confounded the relationship in both the unadjusted and adjusted models, we included age as a confounder in the analysis without stratifying the analysis by age categories.
Third, we continued to re-categorize each of the characteristics and re-define each characteristic’s reference category differently until both strong multicollinearity and a lack of goodness-of-fit disappeared in each model, because the decomposition analyses are based on multivariate logistic regression models for each education group. As a result, we constructed final models whose variance inflation factor values were less than 2.4 and had p-values based on the Hosmer–Lemeshow statistic of 0.565 for the less-educated group and 0.790 for the highly-educated group.
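The goodness-of-fit check can be sketched as a Hosmer–Lemeshow statistic computed from observed outcomes and fitted probabilities. This is a simplified, unweighted version that does not reproduce SAS or Stata output. The function names, the equal-size grouping rule, and the toy data are assumptions made for illustration.

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and df (groups - 2); needs len(y) >= groups."""
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])  # order by fitted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(prob for prob, _ in chunk)
        observed = sum(outcome for _, outcome in chunk)
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid for even df only."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy data: fitted probabilities and outcomes, four risk groups of five women.
p = [0.2] * 10 + [0.8] * 10
y = [0] * 10 + [0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
hl_stat, hl_df = hosmer_lemeshow(y, p, groups=4)
print(hl_stat, hl_df, chi2_sf_even_df(hl_stat, hl_df))
```

A large p-value, like those reported for the final models, indicates no evidence of lack of fit.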
Fourth, we estimated the predicted prevalence (%) of obesity (PPO) (and its 95% confidence intervals, or CIs) of participants for each characteristic, where participants’ PPO denotes the average value of predicted probabilities that each participant would be obese when she belongs to a specific category of a characteristic but her other characteristics remain the same. The PPO estimates helped us to compare 1) the adjusted prevalence of obesity of participants across different categories of each characteristic in the same education group, and 2) the adjusted prevalence of obesity of participants belonging to a specific category of a given characteristic between two education groups.
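The PPO definition can be sketched as a predictive margin: each participant's characteristic of interest is set to one category, her other characteristics are left as observed, and the predicted probabilities are averaged. The coefficients and data below are invented, the average is unweighted, and confidence intervals are omitted. All names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def predicted_prevalence(rows, coef, intercept, variable, value):
    """Average predicted probability (%) with `variable` set to `value` for all."""
    probs = []
    for row in rows:
        counterfactual = dict(row, **{variable: value})  # other covariates kept
        z = intercept + sum(coef[name] * counterfactual[name] for name in coef)
        probs.append(logistic(z))
    return 100.0 * sum(probs) / len(probs)


# Hypothetical logit coefficients and two participants, for illustration only.
coef = {"rural": 0.4, "married": 0.2}
intercept = -1.0
rows = [{"rural": 0, "married": 1}, {"rural": 1, "married": 0}]

ppo_rural = predicted_prevalence(rows, coef, intercept, "rural", 1)
ppo_urban = predicted_prevalence(rows, coef, intercept, "rural", 0)
print(ppo_rural, ppo_urban)
```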
Fifth, in order to decompose the difference in obesity rates between the two groups and discern characteristics’ separate contributions to the relationship between education and obesity, this study used an extended Oaxaca-Blinder decomposition method [29–31]. Following this method, we estimated the separate contribution of each observed characteristic (such as the high proportion of women aged 25–34 years in our study) to the relationship between education and obesity, expressing each contribution as a percentage. We then summed all of the separate contributions to obtain “the contribution of overall composition effects.”
In addition, we noted that the association of a certain observed characteristic with obesity in women was sometimes different between our two groups, as suggested by the difference in the estimated coefficient of the characteristic between the two groups, which may account for the difference in obesity rates between the two groups. Therefore, we estimated the separate contribution of the differences in the association with being obese by education to the difference in obesity prevalence in the two education groups. We then summed over all such separate contributions to obtain “the contribution of pure association effects.”
In addition, we estimated the contribution of the difference in the constant terms between the multivariate logistic regression models for less- and highly-educated women to the difference in obesity rates between the two groups, which we call “the contribution of the group-specific effect.” This contribution represents the part of the difference in obesity rates between the two groups that cannot be accounted for by the independent variables in each model under investigation, either through any observed characteristic or through its estimated coefficient. We then combined “the contribution of pure association effects” with “the contribution of the group-specific effect” and named the sum “the contribution of overall association effects.” To summarize, whereas the “overall composition effects” denote the contributions due to the differences in observed characteristics between the two groups, the “overall association effects” denote the contributions due to the differences in the estimated coefficients and constant terms between the two groups when women’s obesity is regressed on the observed characteristics of each group.
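The aggregate split into overall composition and overall association effects can be sketched for a logit model as follows. This is not the cited extended method, which also assigns a share to each characteristic. It is a simplified two-fold version in which the less-educated group's coefficients are used as the reference, which is our assumption. The data, coefficients, and names are invented.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_predicted(rows, beta):
    """Mean predicted probability; each row starts with 1 for the constant."""
    return sum(
        logistic(sum(b * x for b, x in zip(beta, row))) for row in rows
    ) / len(rows)


def decompose(rows_less, beta_less, rows_high, beta_high):
    """Return (gap, overall composition effect, overall association effect)."""
    gap = mean_predicted(rows_less, beta_less) - mean_predicted(rows_high, beta_high)
    # Differences in characteristics, evaluated at the less-educated coefficients.
    composition = (mean_predicted(rows_less, beta_less)
                   - mean_predicted(rows_high, beta_less))
    # Differences in coefficients and constants, at the highly-educated traits.
    association = (mean_predicted(rows_high, beta_less)
                   - mean_predicted(rows_high, beta_high))
    return gap, composition, association


# Hypothetical rows of [constant, aged 65 or older, rural] and coefficients.
rows_less = [[1, 1, 1], [1, 1, 0], [1, 0, 1]]
rows_high = [[1, 0, 0], [1, 0, 1], [1, 1, 0]]
beta_less = [-0.8, 0.5, 0.3]
beta_high = [-1.4, 0.6, 0.1]

gap, composition, association = decompose(rows_less, beta_less, rows_high, beta_high)
print(gap, composition, association)
```

By construction the two components sum to the gap, so each can be reported as a percentage of the difference in predicted obesity prevalence.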
Finally, in order to explore changes in the contributions among models with different sets of independent variables, we constructed a hierarchy of three models and conducted three analyses. Model 1 uses demographic variables (age, marital status, and residential area) as independent variables. Model 2 uses socioeconomic variables (occupational status and household income) along with the independent variables used in Model 1. Model 3 uses lifestyle variables (smoking, risk from alcohol intake, walking exercise activity, and self-perceived stress) along with the independent variables used in Model 2.
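The nesting of the three models can be written down explicitly. The variable names below are illustrative labels of ours, not KNHANES field names.

```python
# Blocks of independent variables, in the order they enter the models.
DEMOGRAPHIC = ["age", "marital_status", "residential_area"]
SOCIOECONOMIC = ["occupational_status", "household_income"]
LIFESTYLE = ["smoking", "alcohol_risk", "walking_activity", "perceived_stress"]

# Each model adds one block to the variables of the previous model.
MODELS = {
    "Model 1": DEMOGRAPHIC,
    "Model 2": DEMOGRAPHIC + SOCIOECONOMIC,
    "Model 3": DEMOGRAPHIC + SOCIOECONOMIC + LIFESTYLE,
}

for name, variables in MODELS.items():
    print(name, len(variables), variables)
```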
We conducted all analyses with consideration for the complex survey design and set statistical significance at an alpha level of 0.05. We used SAS 9.4 (SAS Institute, Cary, NC, USA) and Stata 15 (StataCorp, College Station, TX, USA).