Introduction
Breast cancer is the most common cancer and one of the leading causes of cancer death among women worldwide [1]. Its incidence varies across regions and is much higher in Western than in Asian countries [1–5]. This regional variation has been postulated to arise from differences in lifestyle and dietary factors. One such dietary component is soy foods, which are staples of Asian diets but rare in the diets of Western populations. Soy foods are major sources of dietary isoflavones, which share structural similarities with 17-β-estradiol and may act as functional estrogen antagonists to protect against breast cancer [6, 7]. Soy isoflavones may also exert effects through estrogen-independent pathways [6].
Epidemiological evidence on the relationship between soy foods and breast cancer remains limited and inconclusive [8], despite plausible mechanisms suggested by experimental studies [6, 7]. Several meta-analyses concluded that higher soy intake was associated with a lower risk of incident breast cancer among Asian women but not among their Western counterparts [9–12]. However, between-study heterogeneity was evident [9–12] and most original studies included in these meta-analyses were case–control studies [10–12]. By contrast, a recent meta-analysis restricted to cohort studies, which are less subject to selection and recall bias, found no association between isoflavone intake and breast cancer, although a high consumption of soy-based foods was weakly associated with a lower risk of breast cancer relative to a low consumption (HR: 0.87, 95% CI 0.76–1.00) [13]. The definition of “high” soy food intake varied across the studies included in previous meta-analyses, making the pooled risk estimates difficult to interpret. A meta-analysis that takes into account the amount of soy foods or the isoflavone dose is therefore desirable.
Cohort studies and nested case–control studies of soy and incident breast cancer have given inconsistent results. Studies conducted in Western countries, where dietary soy intake was low [14–24] or moderate [25, 26], have found no clear association. Evidence from studies in Asia, where dietary soy intake was moderate to high, was also inconsistent: some studies found no statistically significant association [27, 28], while others reported a reduced breast cancer risk for women in the highest soy intake groups [29–32]. Because most previous cohort studies in Asia included limited numbers of breast cancer cases, insufficient statistical power and differing cut-off values for soy intake categories may explain this inconsistency. To our knowledge, only one cohort study has previously been conducted in China, namely the Shanghai Women’s Health Study (SWHS), which enrolled participants from a highly developed city [32]. Evidence from general Chinese adults with diverse economic backgrounds is still lacking.
Using data from the China Kadoorie Biobank (CKB), a large-scale prospective cohort study involving over 300,000 women from 10 geographically and economically diverse regions in China, we evaluated the relationship between soy intake and risk of incident breast cancer. Subgroup analyses were also conducted to assess whether risk estimates varied by baseline characteristics such as menopausal status. Because participants’ soy consumption levels and the cut-off values of soy categories varied across previous studies, we also performed a dose–response meta-analysis to integrate the results of prospective cohort studies and provide a clearer picture of the soy-breast cancer relationship.
Methods
Study population
The CKB study recruited participants from 5 urban areas and 5 rural areas in China. The study design and methods have been described in detail elsewhere [33]. Briefly, between June 2004 and July 2008, the study enrolled 512,715 adults (302,510 women) aged 30–79 years who completed baseline data collection, including a questionnaire and physical measurements, after signing a written informed consent form. Among the women recruited at baseline, we excluded those with a previous cancer diagnosis (n = 1610) and those with missing values for key variables (n = 47 for reproductive characteristics, n = 1 for body mass index [BMI]), leaving 300,852 female participants in the main analyses. Approval of the study was obtained from ethics committees or institutional review boards at the University of Oxford, the Chinese Center for Disease Control and Prevention, the Chinese Academy of Medical Sciences, and all participating centers.
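The participant flow described above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in this section; the variable names are illustrative.

```python
# Participant flow for the main analysis, using the counts reported in the text.
women_enrolled = 302_510
exclusions = {
    "previous cancer diagnosis": 1_610,
    "missing reproductive characteristics": 47,
    "missing BMI": 1,
}

# Women remaining after all exclusions are applied.
analytic_sample = women_enrolled - sum(exclusions.values())
print(analytic_sample)  # 300852
```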
Assessment of soy consumption
In the baseline questionnaire, one item asked participants about their overall frequency of soy food consumption (e.g. soybeans, fresh tofu, fried tofu, pressed tofu, soymilk skin or film, soybean flakes, soymilk and so on) during the past 12 months: never/rarely, monthly, 1–3 days per week, 4–6 days per week, or daily. After the baseline survey, two resurveys were conducted in 2008 and 2013, respectively, each involving about 5% of participants randomly selected from each of the 10 study regions. The first resurvey questionnaire asked exactly the same question on soy consumption as the baseline one, whereas the question was split into 2 items in the second resurvey: one item asked about the frequency and amount of soymilk consumption and the other about soy foods other than soymilk. To evaluate the reproducibility and validity of the food frequency questionnaires (FFQs) used at baseline and in the resurveys, an intensive dietary study of 432 CKB participants (254 women) was conducted from 2015 to 2016. These participants completed two FFQs (median interval: 3.3 months) and twelve 24-h dietary recalls (24-HDRs). The 24-HDRs covered all kinds of soy foods in China and were conducted in three separate seasons, with 3 weekdays and 1 weekend day in each season. For female participants, the weighted Kappa statistic was 0.66 for reproducibility of baseline soy food frequency and 0.77 for reproducibility of soy consumption amount (excluding soymilk) in the second resurvey. Using the 24-HDRs as the gold standard, the weighted Kappa statistic was 0.67 for validity of baseline food frequency and 0.74 for validity of soy consumption amount (excluding soymilk) in the second resurvey.
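For readers unfamiliar with the weighted Kappa statistic, the following is a minimal sketch of how agreement between two ordinal ratings (for example, soy frequency categories reported on two FFQs) can be computed. It assumes linear disagreement weights, which the text does not specify, and the example ratings are hypothetical rather than CKB data.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linearly weighted Cohen's kappa for ordinal ratings coded 0..n_categories-1."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    k = n_categories
    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and chance-expected disagreement.
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear weights (an assumption)
            obs_disagreement += weight * observed[i][j]
            exp_disagreement += weight * row[i] * col[j]
    if exp_disagreement == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical frequency categories (0 = never/rarely ... 4 = daily) on two FFQs.
ffq_first = [0, 1, 2, 2, 3, 4, 1, 2]
ffq_second = [0, 1, 2, 3, 3, 4, 2, 2]
print(round(weighted_kappa(ffq_first, ffq_second, 5), 2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; quadratic weights would penalise large disagreements more heavily.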
We combined the intake frequency from baseline and the first resurvey with the intake amount from the second resurvey to estimate the usual amount of soy intake for each woman. We then converted the usual amount of soy intake into the amount of soy isoflavone by taking into account the proportions that different kinds of soy foods contributed to the total soy food amount among women in the 24-HDRs and the soy isoflavone content of different soy foods (Appendix).
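The conversion from soy amount to isoflavone amount can be pictured as a weighted average. The sketch below is an assumption about the general form of such a calculation, not the Appendix method itself; the food shares and isoflavone contents are illustrative placeholders, not CKB values.

```python
# Illustrative placeholders only: these are NOT values from CKB or its Appendix.
share_of_total_soy = {"tofu": 0.60, "soymilk": 0.25, "other soy foods": 0.15}
isoflavone_mg_per_g = {"tofu": 1.0, "soymilk": 0.8, "other soy foods": 1.5}


def isoflavone_per_gram(shares, contents):
    """Average isoflavone content (mg per g of soy), weighted by each food's share."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("food shares must sum to 1")
    return sum(shares[food] * contents[food] for food in shares)


def usual_isoflavone(soy_g_per_day, shares, contents):
    """Convert a usual soy amount (g/day) into a usual isoflavone amount (mg/day)."""
    return soy_g_per_day * isoflavone_per_gram(shares, contents)


# Hypothetical woman with a usual intake of 10 g/day of soy.
print(usual_isoflavone(10.0, share_of_total_soy, isoflavone_mg_per_g))
```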
Assessment of covariates
Information on socio-demographic characteristics (age, education and household income), lifestyle factors (alcohol consumption, physical activity and dietary habits), reproductive characteristics (i.e. age at menarche, parity, breast-feeding duration, menopausal status, menopausal age, and use of oral contraceptives), and family history of cancer was obtained from the baseline questionnaire. Daily energy intake was calculated from the 12 food groups assessed in the present study. A participant was considered to have a family history of cancer if at least one parent or sibling had been diagnosed with cancer. At baseline, body weight and height were measured by trained staff using calibrated instruments, and BMI was calculated as weight (kg) divided by height squared (m²).
Identification of breast cancer cases
Participants were followed up from the date of completing the baseline questionnaire to the date of breast cancer diagnosis, death, loss to follow-up or 31 December 2016, whichever came first. By 31 December 2016, about 1% of participants had been censored due to loss to follow-up. Incident breast cancer cases were identified periodically through linkage with local disease and death registries [34] and the national health insurance system, or ascertained through active follow-up. All diseases were coded according to the International Classification of Diseases, 10th Revision (ICD-10), by trained staff blinded to participants’ baseline information. Breast cancer was coded as C50. The proportion of death-certificate-only cases was 4.1% for breast cancer, indicating high completeness of cancer registration in the present study.
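The "whichever came first" rule for the end of follow-up can be sketched as follows. The function and argument names are hypothetical, and the dates in the example are invented for illustration.

```python
from datetime import date

STUDY_END = date(2016, 12, 31)  # administrative censoring date stated in the text


def follow_up(baseline, diagnosis=None, death=None, lost=None, study_end=STUDY_END):
    """Return (exit_date, is_case, person_years) for one participant.

    Follow-up ends at the earliest of breast cancer diagnosis, death,
    loss to follow-up, or the administrative end of the study.
    """
    candidates = [d for d in (diagnosis, death, lost, study_end) if d is not None]
    exit_date = min(candidates)
    # A participant counts as a case only if diagnosis is what ended follow-up.
    is_case = diagnosis is not None and diagnosis == exit_date
    person_years = (exit_date - baseline).days / 365.25
    return exit_date, is_case, person_years


# Hypothetical participant diagnosed before the end of the study.
print(follow_up(date(2005, 1, 1), diagnosis=date(2010, 6, 1)))
```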
Statistical analysis
Baseline characteristics of participants were presented as means (standard deviations, SDs) or percentages across 4 categories of baseline soy consumption, standardized for age and study region if appropriate, via logistic regressions for categorical variables or multiple linear regressions for continuous variables.
Hazard ratios (HRs) and 95% confidence intervals (95% CIs) for breast cancer risk were calculated by both soy frequency and usual amount quartiles using Cox proportional hazards regression models. Categorical soy intake variables were treated as continuous variables to evaluate the linear trend. In the Cox models, age was used as the underlying time scale and baseline characteristics as covariates. Each Cox model was stratified by baseline age group (in 5-year intervals) and study region. Potential confounding factors adjusted for in the multivariable models were: educational attainment, household income, smoking status, alcohol consumption, physical activity, baseline BMI, standing height, age at menarche, parity, average breastfeeding duration, menopausal status and age at menopause, use of oral contraceptives, family history of cancer, total energy intake, and consumption frequency of fresh fruit, fresh vegetables, preserved vegetables, red meat, poultry, fish and dairy products at baseline. Information on hormone replacement therapy was not collected at baseline and thus was not adjusted for in the models. However, we believe this is unlikely to have influenced the risk estimates, since < 1% of the women had ever used hormone replacement therapy according to data from the second resurvey. Schoenfeld residuals showed no violation of the proportional hazards assumption for soy intake in the fully adjusted model.
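A Cox model yields a log-hazard coefficient and its standard error; the HR and its 95% CI follow by exponentiation. The sketch below illustrates this relationship and its inverse (recovering a standard error from a reported CI). It is not the analysis code used in this study, which was run in Stata, and the function names are hypothetical. The worked example reuses the pooled estimate quoted in the Introduction (HR 0.87, 95% CI 0.76–1.00).

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def hazard_ratio(beta, se, z=Z_95):
    """Convert a log-hazard coefficient and its SE into (HR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Recover the SE of a log HR from a reported 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Reported interval from the Introduction: 0.76 to 1.00.
se = se_from_ci(0.76, 1.00)
beta_mid = (math.log(0.76) + math.log(1.00)) / 2.0  # CI midpoint on the log scale
hr, lower, upper = hazard_ratio(beta_mid, se)
print(round(hr, 2), round(lower, 2), round(upper, 2))
```

Working on the log scale is what makes the interval symmetric around the coefficient and asymmetric around the HR.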
A stratified analysis was further performed separately among pre-menopausal and post-menopausal women to assess any modifying effect of menopause on the association between soy consumption and breast cancer. In addition, stratified analyses were conducted according to other baseline characteristics such as BMI. Multiplicative interactions were tested using likelihood ratio tests comparing models with and without the cross-product terms between the stratifying variables and soy food consumption categories.
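The likelihood ratio test for interaction compares the log-likelihoods of the models with and without the cross-product terms; twice the difference is referred to a chi-square distribution whose degrees of freedom equal the number of added terms. A minimal standard-library sketch is given below; the log-likelihood values in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then the recurrence
    # Q(k + 2) = Q(k) + half**(k/2) * exp(-half) / Gamma(k/2 + 1).
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(loglik_without, loglik_with, n_interaction_terms):
    """Return (LR statistic, P value) for nested models."""
    statistic = 2.0 * (loglik_with - loglik_without)
    return statistic, chi2_sf(statistic, n_interaction_terms)


# Hypothetical log-likelihoods; 3 cross-product terms added in the larger model.
stat, p_value = likelihood_ratio_test(-10450.2, -10448.9, 3)
print(round(stat, 2), round(p_value, 3))
```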
Two sensitivity analyses were performed separately: (1) excluding women who developed breast cancer during the first 2 years of follow-up; and (2) excluding those with a family history of cancer.
Systematic review and dose–response meta-analysis
We searched PubMed, Embase and the Cochrane Library from their dates of inception to March 2019 for prospective studies examining the association between soy intake and breast cancer. Studies using concentrations of isoflavones or their metabolites in biological samples were not included owing to the difficulty of converting biological concentrations into amounts of soy isoflavone intake. We believe this had little impact on the results because studies using biological concentrations for exposure assessment were mainly conducted among populations with very low levels of soy intake [14, 16, 19]. We excluded studies if the amount of soy intake was unavailable and could not be estimated from relevant data. We also excluded reviews, non-human studies, abstract-only publications and editorials. If the same cohort had published more than one original article on soy intake and breast cancer, the paper reporting the largest sample size, the longest follow-up time, or the widest variation in soy intake levels was kept. In addition, studies in which participants’ soy intake levels were very low (i.e. mean or median intake < 5 mg/day of soy isoflavone for the highest consumption group) were not included in the quantitative synthesis (dose–response meta-analysis) because the weights contributed by these studies were negligible. The detailed literature search strategy, data extraction methods, and quality assessment of individual studies are presented in the Appendix.
In the dose–response meta-analysis, HRs were used as effect sizes and the median or mean soy isoflavone intake of each consumption category was taken as the dose for that category. To test for a potential non-linear relationship between soy isoflavone intake and incident breast cancer, a non-linear dose–response meta-analysis was performed by modelling soy isoflavone intake with a restricted cubic spline (RCS) function. If there was no evidence of a non-linear association, we first estimated study-specific log HRs and 95% CIs for each 10 mg/day increment in soy isoflavone intake using the method introduced by Greenland and Longnecker [35]. We then pooled the study-specific log HRs using a fixed-effect model to obtain a summary effect size. Between-study heterogeneity was assessed with the I² statistic, and the Egger test was used to detect publication bias.
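The pooling step can be illustrated with a short sketch of inverse-variance fixed-effect pooling and the I² statistic. It covers only the second stage (combining study-specific slopes), not the Greenland and Longnecker trend estimation, and the study-specific values below are hypothetical rather than taken from the included cohorts.

```python
import math


def fixed_effect_pool(log_hrs, ses):
    """Inverse-variance fixed-effect pooling of study-specific log HRs.

    Returns (pooled log HR, pooled SE, Cochran's Q, I-squared in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical log HRs per 10 mg/day of isoflavone and their SEs.
log_hrs = [math.log(0.96), math.log(0.98), math.log(0.97), math.log(0.95)]
ses = [0.020, 0.015, 0.030, 0.025]

pooled, pooled_se, q, i_squared = fixed_effect_pool(log_hrs, ses)
print(round(math.exp(pooled), 3), round(i_squared, 1))
```

Each study is weighted by the inverse of its variance, so more precise studies dominate the pooled estimate; I² expresses the share of total variation attributable to heterogeneity rather than chance.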
Several sensitivity analyses were performed to assess the robustness of the dose–response meta-analysis: (1) dropping each study included in the main dose–response meta-analysis in turn; (2) dropping studies that grouped participants according to soy frequency rather than amount of soy intake; (3) including only those studies that assessed soy intake more precisely (i.e. assessing both frequency and amount of soy intake using validated FFQs at baseline); and (4) including all the studies in the systematic review, i.e. additionally including studies with extremely low levels of soy isoflavone intake.
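The leave-one-out analysis in item (1) amounts to re-pooling the estimates with each study removed in turn. A minimal sketch under a fixed-effect model is shown below; the study labels and values are hypothetical.

```python
import math


def pool(log_hrs, ses):
    """Inverse-variance fixed-effect pooled log HR and its SE."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    estimate = sum(w * b for w, b in zip(weights, log_hrs)) / total_weight
    return estimate, math.sqrt(1.0 / total_weight)


def leave_one_out(studies):
    """Re-pool after omitting each study; studies is a list of (label, log_hr, se).

    Returns {omitted label: (HR, lower 95% limit, upper 95% limit)}.
    """
    results = {}
    for i, (label, _, _) in enumerate(studies):
        rest = studies[:i] + studies[i + 1:]
        est, se = pool([s[1] for s in rest], [s[2] for s in rest])
        results[label] = (
            math.exp(est),
            math.exp(est - 1.96 * se),
            math.exp(est + 1.96 * se),
        )
    return results


# Hypothetical studies: (label, log HR per 10 mg/day, SE).
studies = [
    ("Study A", math.log(0.96), 0.020),
    ("Study B", math.log(0.98), 0.015),
    ("Study C", math.log(0.97), 0.030),
]
for omitted, (hr, lower, upper) in leave_one_out(studies).items():
    print(omitted, round(hr, 3), round(lower, 3), round(upper, 3))
```

If no single omission moves the pooled HR materially, the summary estimate is not driven by any one study.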
All statistical analyses were performed with Stata (version 15). All P values were two-sided and statistical significance was defined as P < 0.05.
Discussion
The CKB study, which enrolled over 300,000 women from 10 diverse regions in China, found no overall association between soy intake and incident breast cancer. The dose–response meta-analysis integrating the CKB study and other prospective studies from Asian and Western countries found that each 10 mg/day increment in soy isoflavone intake was associated with a 3% lower breast cancer risk.
Findings from prospective studies on the soy-breast cancer association have been inconsistent, which may be due to differing soy intake levels across studies. Cohort studies conducted among women with low (< 5 mg/day in the highest consumption groups) [15, 18, 20] or moderate (20–30 mg/day in the highest consumption groups) [25–28] soy isoflavone intake found no clear association between soy and breast cancer risk. In the CKB study, the median soy isoflavone intake in the highest consumption group was about 20 mg/day and no association was found between soy intake and breast cancer risk. This result was consistent with the four cohorts with moderate soy intake [25–28]. In four studies in which the highest quartile or quintile intake groups had a soy isoflavone intake > 40 mg/day [31, 32, 36], or in which the upper-half intake group had a median intake of 23.5 mg/day (so that the upper quartile group is likely to have had ~ 40 mg/day) [30], a reduced breast cancer risk was found for women in the highest soy consumption group compared with women in the lowest consumption group.
The soy intake level (mean: 7.5 g/day of soybean equivalents) was much lower among CKB women than in the study conducted in Shanghai, China (median ~ 25 g/day of soybean equivalents) [32], which may be partly explained by the lower soy intake of general Chinese adults compared with women in Shanghai. According to the three Chinese National Nutrition and Health Surveys conducted between 1992 and 2012, the mean daily intake of soy foods among Chinese adults was about 10–15 g of soybean equivalents and remained stable throughout the two decades [37]. The mean daily soy intake among CKB women was thus slightly lower than that reported in these national surveys, whereas the average soy intake of women in the Shanghai study was much higher than that of the general Chinese population.
We found an inverse association between higher soy food frequency and breast cancer among CKB women with lower BMI, while no statistically significant association was observed among women with higher BMI. The Singapore Chinese study also reported different risk estimates among women with different BMI [30]. However, tests for multiplicative interaction between soy intake and BMI were not statistically significant in either study. Therefore, the different risk estimates for soy intake among women with different body sizes are likely to have been due to chance.
The present dose–response meta-analysis observed a 3% (95% CI 1–5%) reduced risk of breast cancer for each 10 mg/day increment in soy isoflavone intake. Most of the cohort studies included in the present dose–response meta-analysis used validated FFQs to assess soy intake. Most studies were of high quality according to their Newcastle-Ottawa Scale (NOS) scores, which evaluate studies in terms of exposure and outcome measurement, confounding adjustment, follow-up duration and so on. A sensitivity analysis excluding the study of lower quality [27] showed no change in the risk estimate, and a sensitivity analysis restricted to the five studies that assessed both the frequency and amount of soy intake at baseline gave a similar risk estimate, indicating the robustness of the present dose–response meta-analysis. In addition, between-study heterogeneity was low in the main dose–response meta-analysis as well as in the sensitivity analyses, and the Egger test found no evidence of publication bias, supporting the reliability of the dose–response meta-analysis. The reduction in breast cancer risk per 10 mg/day of soy isoflavone intake was as small as 3%, which may explain why most individual studies (including CKB) with low to moderate levels of soy isoflavone intake failed to detect a statistically significant soy-breast cancer association.
According to the Dietary Guidelines for Chinese Residents (2016), the recommended daily amount of soy food intake for adults is 15–25 g of soybean equivalents [37]. The isoflavone content of 15–25 g/day of soybean equivalents from different soy foods ranges between 10 and 50 mg/day (Appendix Table 8). Therefore, women consuming soy foods at the amount recommended by the Dietary Guidelines may have a 3–15% lower risk of breast cancer according to the results of the present dose–response meta-analysis.
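The 3–15% range can be reproduced approximately from the pooled estimate. The sketch below assumes a log-linear dose–response, so that the HR for a given dose is the per-10 mg/day HR (0.97) raised to the power dose/10; under that assumption 50 mg/day corresponds to a reduction of about 14%, close to the 15% obtained by multiplying 3% by five.

```python
HR_PER_10MG = 0.97  # pooled HR per 10 mg/day increment reported in the text


def risk_reduction(isoflavone_mg_per_day, hr_per_10mg=HR_PER_10MG):
    """Percent reduction in risk under a log-linear dose-response (an assumption)."""
    hr = hr_per_10mg ** (isoflavone_mg_per_day / 10.0)
    return (1.0 - hr) * 100.0


for dose in (10, 50):
    print(dose, round(risk_reduction(dose), 1))
# 10 3.0
# 50 14.1
```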
Could soy isoflavone supplements help? Two Western studies assessed soy or soy isoflavone supplements, and neither found an association with overall breast cancer risk [38, 39], although one study found that current soy isoflavone supplement use was associated with a reduced risk of estrogen receptor-positive (ER+) breast cancer and an increased risk of ER− breast cancer [39]. However, the dose of isoflavone in the supplements could have varied in that study [39].
As the largest cohort study on the soy-breast cancer association, the CKB study has several strengths, including a large sample size, a population drawn from diverse areas across China, a prospective cohort design, a long duration of follow-up, a unified method of exposure assessment across study regions, and stringent quality control of data. Our meta-analysis included only prospective cohort studies, which minimized recall bias.
There are nevertheless several limitations. First, the baseline questionnaire asked CKB participants about soy intake frequency rather than intake amount, and thus the usual amount of soy isoflavone was estimated by combining information from the baseline survey, the two resurveys and the 24-HDRs. More large-scale prospective studies using more precise exposure measurement methods are warranted to verify the findings from the CKB study. Second, some energy-providing food items such as oil were not assessed in the CKB study, and therefore the calculated total energy was lower than the actual total energy intake. However, the risk estimates in the CKB study remained stable before and after adjustment for the calculated total energy intake. Third, data on the hormone receptor status of breast cancer were unavailable in the CKB study, so we were unable to assess the association of soy intake with breast cancer subtypes. Existing prospective evidence on the association of soy with breast cancer subtypes is scarce and of low statistical power owing to small sample sizes [30, 32, 39]. Lastly, owing to the observational nature of the studies included in the dose–response meta-analysis, we could not rule out residual confounding by unmeasured factors. Large randomized controlled trials (RCTs) would be desirable for causal inference, but the feasibility of a large RCT on this question remains uncertain.
Acknowledgements
The chief acknowledgment is to the participants, the project staff, and the China National Centre for Disease Control and Prevention (CDC) and its regional offices for assisting with the fieldwork. We thank Judith Mackay in Hong Kong; Yu Wang, Gonghuan Yang, Zhengfu Qiang, Lin Feng, Maigeng Zhou, Wenhua Zhao, and Yan Zhang in China CDC; Lingzhi Kong, Xiucheng Yu, and Kun Li in the Chinese Ministry of Health; and Sarah Clark, Martin Radley, Mike Hill, Hongchao Pan, and Jill Boreham in the CTSU, Oxford, for assisting with the design, planning, organization, and conduct of the study. We sincerely thank Chenxi Qin for her help and support with dietary assessment. We thank Gertraud Maskarinec and AH Wu for providing data for the present dose–response meta-analysis.
Contributors
Canqing Yu and Dezheng Huo conceived and designed the study. Liming Li, Zhengming Chen, and Junshi Chen, as members of the CKB steering committee, designed and supervised the conduct of the whole study, obtained funding, and together with Jun Lv, Yu Guo, Zheng Bian, Huaidong Du, Ling Yang, Yiping Chen, Xi Zhang and Tao Wang acquired the data. Yuxia Wei and Meng Gao analyzed the data. Yuxia Wei wrote the first draft of the manuscript. Canqing Yu and Dezheng Huo contributed to the interpretation of the results and critical revision of the manuscript for important intellectual content. All authors reviewed and approved the final manuscript. Canqing Yu and Dezheng Huo are the guarantors.
Members of the China Kadoorie Biobank collaborative group
International Steering Committee: Junshi Chen, Zhengming Chen (PI), Robert Clarke, Rory Collins, Yu Guo, Liming Li (PI), Jun Lv, Richard Peto, Robin Walters. International Co-ordinating Centre, Oxford: Daniel Avery, Ruth Boxall, Derrick Bennett, Yumei Chang, Yiping Chen, Zhengming Chen, Robert Clarke, Huaidong Du, Simon Gilbert, Alex Hacker, Mike Hill, Michael Holmes, Andri Iona, Christiana Kartsonaki, Rene Kerosi, Ling Kong, Om Kurmi, Garry Lancaster, Sarah Lewington, Kuang Lin, John McDonnell, Iona Millwood, Qunhua Nie, Jayakrishnan Radhakrishnan, Paul Ryder, Sam Sansome, Dan Schmidt, Paul Sherliker, Rajani Sohoni, Becky Stevens, Iain Turnbull, Robin Walters, Jenny Wang, Lin Wang, Neil Wright, Ling Yang, Xiaoming Yang. National Co-ordinating Centre, Beijing: Zheng Bian, Yu Guo, Xiao Han, Can Hou, Jun Lv, Pei Pei, Chao Liu, Yunlong Tan, Canqing Yu. 10 Regional Co-ordinating Centres: Qingdao CDC: Zengchang Pang, Ruqin Gao, Shanpeng Li, Shaojie Wang, Yongmei Liu, Ranran Du, Yajing Zang, Liang Cheng, Xiaocao Tian, Hua Zhang, Yaoming Zhai, Feng Ning, Xiaohui Sun, Feifei Li. Licang CDC: Silu Lv, Junzheng Wang, Wei Hou. Heilongjiang Provincial CDC: Mingyuan Zeng, Ge Jiang, Xue Zhou. Nangang CDC: Liqiu Yang, Hui He, Bo Yu, Yanjie Li, Qinai Xu, Quan Kang, Ziyan Guo. Hainan Provincial CDC: Dan Wang, Ximin Hu, Jinyan Chen, Yan Fu, Zhenwang Fu, Xiaohuan Wang. Meilan CDC: Min Weng, Zhendong Guo, Shukuan Wu, Yilei Li, Huimei Li, Zhifang Fu. Jiangsu Provincial CDC: Ming Wu, Yonglin Zhou, Jinyi Zhou, Ran Tao, Jie Yang, Jian Su. Suzhou CDC: Fang Liu, Jun Zhang, Yihe Hu, Yan Lu, Liangcai Ma, Aiyu Tang, Shuo Zhang, Jianrong Jin, Jingchao Liu. Guangxi Provincial CDC: Zhenzhu Tang, Naying Chen, Ying Huang. Liuzhou CDC: Mingqiang Li, Jinhuai Meng, Rong Pan, Qilian Jiang, Jian Lan, Yun Liu, Liuping Wei, Liyuan Zhou, Ningyu Chen, Ping Wang, Fanwen Meng, Yulu Qin, Sisi Wang. Sichuan Provincial CDC: Xianping Wu, Ningmei Zhang, Xiaofang Chen, Weiwei Zhou. Pengzhou CDC: Guojin Luo, Jianguo Li, Xiaofang Chen, Xunfu Zhong, Jiaqiu Liu, Qiang Sun. Gansu Provincial CDC: Pengfei Ge, Xiaolan Ren, Caixia Dong. Maiji CDC: Hui Zhang, Enke Mao, Xiaoping Wang, Tao Wang, Xi Zhang. Henan Provincial CDC: Ding Zhang, Gang Zhou, Shixian Feng, Liang Chang, Lei Fan. Huixian CDC: Yulian Gao, Tianyou He, Huarong Sun, Pan He, Chen Hu, Xukui Zhang, Huifang Wu, Pan He. Zhejiang Provincial CDC: Min Yu, Ruying Hu, Hao Wang. Tongxiang CDC: Yijian Qian, Chunmei Wang, Kaixu Xie, Lingli Chen, Yidan Zhang, Dongxia Pan, Qijun Gu. Hunan Provincial CDC: Yuelong Huang, Biyun Chen, Li Yin, Huilin Liu, Zhongxi Fu, Qiaohua Xu. Liuyang CDC: Xin Xu, Hao Zhang, Huajun Long, Xianzhi Li, Libo Zhang, Zhe Qiu.