Background
A woman’s reproductive life course includes her age at menarche and menopause, the age at which she starts and stops having children and the number of children she has, as well as the age she first has sexual intercourse and the number of sexual partners she has in her lifetime. Some of these reproductive factors have been identified as risk factors for chronic diseases, including breast cancer, respiratory disease and cardiometabolic diseases [1]. A younger age at menarche and older age at menopause were associated with an increased risk of breast cancer in one large meta-analysis [2], while having fewer children and a higher age at first birth (AFB) were positively associated with breast cancer risk in another [3]. Other studies have implicated age at menarche, AFB, number of still births and miscarriages, age at menopause and parity in relation to respiratory and cardiovascular disease [4–6]. One study found that later age at menarche was associated with a reduced risk of coronary artery disease [7]. Having any children and later AFB have been associated with a lower risk of lung cancer [8]. Older age at menarche and a shorter reproductive period have also been linked with a higher risk of chronic kidney disease [9, 10].
However, on the whole, studies have not considered a woman’s entire reproductive history and the potential interplay between reproductive factors. Understanding the inter-relationships between reproductive factors is important to correctly identify potential confounders (common causes of the exposure and outcome of interest) and mediators (factors that lie on the causal pathway between exposure and outcome). Information on multiple reproductive factors will provide useful additions to algorithms for predicting disease risk in women [1].
Evidence of association between age at menarche and menopause is inconsistent, with some studies reporting earlier age at menarche associated with earlier menopause [11–16], others showing the inverse association [17, 18] and some showing no evidence of this association [19–24]. While there is some evidence of an association between an earlier age at menarche and earlier AFB [25, 26], there is little evidence of the association between age at menarche and parity [26]. Another study has also investigated reproductive factors in relation to sexual history, suggesting a younger age at menarche is not a risk factor for younger age at first having sexual intercourse (AFS) [27]. Associations between reproductive factors could be reflective of causal relationships, or common genetic or non-genetic environmental causes, i.e. confounding.
Observational studies are prone to confounding bias as it is difficult to capture all confounders accurately. Mendelian randomisation (MR) is a method that assesses the causal relationship between an exposure and outcome by using genetic variants robustly associated with the exposure. MR is advantageous as it is less likely to be affected by confounding and reverse causation than standard multivariable regression analysis [28–30]. There have been an increasing number of genome-wide association studies (GWAS) of reproductive factors [31–33], which can be used to investigate genetic correlation (i.e. shared genetic causes) between these factors as well as whether relationships between reproductive factors may be causal using MR.
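To make the MR idea concrete, the following is a minimal sketch of one standard estimator, the inverse-variance weighted (IVW) combination of per-variant Wald ratios. It is illustrative only: the function name, argument names and toy numbers are assumptions, it uses a first-order approximation for the ratio standard error, and it is not claimed to be the exact pipeline used in this study.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    beta_exposure: SNP-exposure associations (instrument strength)
    beta_outcome:  SNP-outcome associations
    se_outcome:    standard errors of the SNP-outcome associations
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    sy = np.asarray(se_outcome, dtype=float)

    # Wald ratio per SNP: effect of the exposure on the outcome implied by that SNP
    ratios = by / bx
    # First-order standard error of each ratio (ignores uncertainty in bx)
    ratio_se = sy / np.abs(bx)

    # Inverse-variance weights, pooled estimate and its standard error
    weights = 1.0 / ratio_se**2
    estimate = np.sum(weights * ratios) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return float(estimate), float(se)

# Toy (made-up) summary statistics for three variants
est, se = ivw_estimate([0.1, 0.2, 0.4], [0.05, 0.10, 0.20], [0.01, 0.01, 0.01])
print(f"IVW estimate = {est:.3f} (SE {se:.3f})")
```

Variants with stronger exposure associations receive more weight, which is why weak or pleiotropic instruments are a central concern in MR.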
The present study aims to identify and clarify the nature of any relationships between women’s reproductive factors, by investigating their genetic overlap and the causal relationships between eight reproductive factors, including potential bidirectional effects where the temporal order between the traits is not clear.
Discussion
This study provides evidence supporting causal effects of several female reproductive factors on other reproductive traits. We show evidence that earlier reproductive factors including age at menarche, AFS and AFB have effects on subsequent events and factors, while ever parous status, age at menopause, number of births, ALB and lifetime number of sexual partners appear to have limited effects on other reproductive factors.
We substantiate the genetic correlation between reproductive factors shown in previous studies, while showing additional correlations that have not been previously investigated [64, 65]. Our study supports evidence for a positive causal link between age at menarche and age at menopause [11–16, 67, 68] and opposes previous studies that have shown the inverse association [17, 18] or no association [19–24]. Furthermore, our findings support one study that found little evidence for an association between age at menarche and parity [26]. Additionally, we corroborated the findings of previous MR studies that identified a positive causal relationship between age at menarche and AFB, ALB and age at menopause, and between AFS and ALB [67–69].
Many estimates identified in the primary analysis appear consistent across sensitivity analyses that aim to account for biases. However, some results did not persist in sensitivity analyses checking for robustness to sample overlap and winner’s curse.
The split-sample meta-analysed MR shows a weaker magnitude of effect compared to our primary analysis, which may be due to sample size reduction in this sensitivity analysis or bias introduced by sample overlap in the primary analysis.
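As an illustration of how two split-sample MR estimates can be combined, the sketch below pools them with a fixed-effect inverse-variance meta-analysis. This pooling rule is an assumption made for the example (this section does not state the exact method), and the function name and input values are hypothetical.

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Pool independent estimates using inverse-variance (fixed-effect) weights."""
    weights = [1.0 / se**2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical MR estimates from the two directions of a split-sample analysis:
# (exposure GWAS in half A, outcome GWAS in half B) and the reverse.
pooled, pooled_se = fixed_effect_meta([0.20, 0.40], [0.10, 0.10])
print(f"pooled estimate = {pooled:.3f} (SE {pooled_se:.3f})")
```

Because each half contains fewer participants, the instruments are estimated less precisely, which is one reason a split-sample estimate can be weaker than the full-sample estimate.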
Overall, using replication GWAS studies as the exposure or outcome showed weaker strength of evidence and/or magnitude of effects, although evidence for a causal effect for many relationships assessed was maintained. This may be due to bias introduced by winner’s curse in the primary analysis or smaller sample sizes available for the replication studies. In particular, age at menopause from the ReproGen consortium has a sample size of 69,360, compared to 143,791 in our primary analysis, and where this is used as the outcome, we found little evidence of an effect of reproductive factors on age at menopause. A more recent GWAS of age at menopause conducted by the ReproGen consortium has a much larger sample size (n = 201,323) [70, 71], although more than half of the sample comprise UK Biobank women, meaning a large sample overlap in the MR analysis. Nonetheless, MR estimates using this more recent GWAS revealed similar results compared to the previous smaller GWAS [32]. While there are more recent, larger GWAS available for age at menarche [72] and AFB [73], UK Biobank has formed a large contribution to these GWAS. We decided to prioritise studies which had a smaller number of participants from UK Biobank for the replication GWAS, in order to reduce the likelihood of bias due to sample overlap.
Differences in how the age at menopause phenotype was derived in UK Biobank and ReproGen may also contribute to differences in estimated effects. While both GWAS excluded women who had a hysterectomy, ReproGen additionally excluded women who had a bilateral ovariectomy, those whose menopause was induced by radiation or chemotherapy and those using hormone replacement therapy [70].
MRlap gave almost identical results to our primary analysis, suggesting that sample overlap may not substantially bias our estimates.
Pleiotropy may occur when genetic variants have an effect on multiple phenotypes, which can be an issue in MR as the genetic instruments used as a proxy for the exposure can affect the outcome independently of the exposure of interest [29, 60]. Therefore, resulting effect estimates may not correctly capture the exposure-outcome relationship of interest. This could be a problem as many of the reproductive factors are genetically correlated, and consequently, multiple sensitivity analyses were used to assess whether there was an exclusion restriction assumption violation. We implemented additional MR methods and numerous relationships did not appear to be affected by pleiotropy. Where outlier correction was possible, results were consistent with the primary analysis, with the exception of the effect of lifetime number of sexual partners on ever having children, where there was a complete attenuation of the effect after outlier correction.
However, it is worth considering that a recent study found that using MR-Egger on overlapping exposure and outcome samples may induce bias in the direction and magnitude of the confounding. This bias attenuates when the MR-Egger method is performing optimally, i.e. when it is employed with maximum variability in instrument strength. This is expressed as heterogeneity in gene-exposure estimates across SNPs, also referred to as $I^2_{GX}$, which can be calculated using the $I^2$ statistic. It is estimated that the bias in MR-Egger when used in a one-sample setting is substantially reduced when $I^2_{GX}$ is higher than the recommended 90% [43]. Conversely, other two-sample methods appear to perform similarly in a one-sample MR compared to a two-sample approach in similarly large samples [43]. Where there was evidence of non-null effects in the primary analysis, the $I^2_{GX}$ was >97%, suggesting MR-Egger is performing optimally. Nonetheless, the MR-Egger test can be underpowered, especially when few instruments are available.
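For readers unfamiliar with $I^2_{GX}$, the sketch below shows how it can be computed from SNP-exposure summary statistics as a standard $I^2$ statistic based on Cochran's Q. It uses the unweighted form and assumes the SNP-exposure effects have already been oriented to the same sign; the function name and toy inputs are hypothetical, and the study's own calculation may differ in detail.

```python
def i_squared_gx(beta_exposure, se_exposure):
    """Unweighted I^2_GX: share of variation in SNP-exposure estimates
    that reflects true differences in instrument strength rather than
    sampling error. Returns a value in [0, 1]."""
    k = len(beta_exposure)
    weights = [1.0 / se**2 for se in se_exposure]
    mean = sum(w * b for w, b in zip(weights, beta_exposure)) / sum(weights)

    # Cochran's Q for the SNP-exposure associations
    q = sum(w * (b - mean) ** 2 for w, b in zip(weights, beta_exposure))
    if q == 0:
        return 0.0  # no variability in instrument strength at all
    return max(0.0, (q - (k - 1)) / q)

# Toy (made-up) SNP-exposure effects with equal standard errors
i2 = i_squared_gx([0.1, 0.2, 0.3], [0.01, 0.01, 0.01])
print(f"I^2_GX = {i2:.2%}")
```

A value close to 100% means the instruments differ substantially in strength relative to their measurement error, which is the setting in which MR-Egger is least affected by dilution of its slope estimate.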
Mechanisms underlying causal links
We show that an earlier age at menarche may lead to an earlier AFS and AFB, and that an earlier AFS may lead to an earlier AFB. It is likely that earlier maturation may lead to earlier sexual activity, logically increasing the chance of an earlier pregnancy. In UK Biobank, a proportion of women may have first had sexual intercourse prior to the introduction of the NHS (Family Planning) Act 1967, which made contraception readily available through the NHS. This may have strengthened the effect of AFS on AFB in this cohort and findings may not be generalisable to more contemporary studies. We also show that an earlier AFS may lead to a higher number of sexual partners, which may occur because there is more time to acquire partners if sexual activity commences earlier. Furthermore, we identify that having a higher lifetime number of sexual partners may lead to a lower chance of having children. This may be due to the increased prevalence of short-term relationships and regularly changing sexual partners [74], which, as a result, might lead to less chance of starting a family. However, it is worth noting that after excluding outlying variants, the effect of lifetime number of sexual partners on ever parous status attenuated. We present strong evidence for a positive relationship between AFB and ALB. One explanation for this link could be that parents tend to have their children within a relatively short period of time, as shown in UK Biobank, where the average AFB is 26 years and the average ALB is 30 years for women.
The life history theory is another explanation as to why earlier age at menarche leads to earlier subsequent reproductive events and a likelihood of an increased number of children. This theory distinguishes the allocation of resources into growth and reproductive efforts and categorises “fast” or “slow” life history strategies [75, 76]. A “fast” life history strategy exerts more effort towards reproduction: earlier puberty and sexual activity leading to an early AFB, and an increased number of births [75, 76]. This is corroborated by our finding that women who experience an earlier AFS have children earlier and have more children. If a woman starts having children earlier, she has more opportunity to conceive again before menopause, which may explain the effect we identify between an earlier AFB and a higher number of children. A “fast” life history may also lead to an earlier age at menopause, as allocating resources towards reproductive efforts earlier in life and towards a higher number of children may result in completing reproduction at a younger age.
There were a number of relationships where we did not find evidence for an effect in our primary analysis. Of note, we did not find a causal effect of age at menarche on the number of births or ever parous status. Considering the life history theory, we might have expected to find an inverse effect, with an earlier age at menarche leading to a higher number of births.
Furthermore, we did not find evidence of an effect of ever parous status on lifetime number of sexual partners, or of number of births on ALB. We investigated bidirectional effects between reproductive factors where there was not a clear temporal order and identified no bidirectional effects. Specifically, there were no effects between age at menopause and each of ALB, lifetime number of sexual partners, number of births and ever parous status; between ALB and lifetime number of sexual partners; or between number of births and lifetime number of sexual partners.
Several relationships between reproductive factors separated by many years could be mediated by other intervening reproductive events. For example, we identify effects between age at menarche and AFS, AFS and AFB, and age at menarche and AFB; therefore, the effect we find between age at menarche and AFB may be mediated by AFS. Similarly, we found effects between AFS and AFB, AFB and ALB, and AFS and ALB, which could suggest that an earlier AFS leading to an earlier ALB may be mediated through an earlier AFB. In addition, there are likely to be mediating mechanisms for the relationships we have identified other than through reproductive factors such as body mass index [63]. Future investigations could use mediation analyses to further elucidate these relationships [77].
Implication of findings
When investigating one reproductive factor in relation to a health outcome, our findings might aid in identifying reproductive factors that could confound this relationship. For example, becoming a parent at an earlier age has been identified as a risk factor for depressive symptoms in young adulthood [78, 79]. We have presented evidence that age at menarche has a causal effect on AFB, and previous studies have identified earlier age at menarche as a risk factor for poor mental health outcomes [80, 81]. The evidence presented in this study suggests it would be important to adjust for age at menarche in an investigation of the effects of AFB on mental health outcomes.
Our work also suggests that reproductive factors might lie on the causal pathway between an earlier reproductive factor and a later outcome. We present evidence for a causal effect of AFB on number of births, and both reproductive factors have been identified as risk factors for cardiovascular disease [82]. An investigation of the effect of AFB on the risk of cardiovascular disease might therefore want to consider mediation via the number of births.
Finally, a number of reproductive factors have been identified as risk factors for breast cancer, including age at menarche, age at menopause [2], number of births and AFB [3]. We have presented a number of causal inter-relationships between reproductive factors; therefore, researchers should carefully consider the total impact of reproductive factor variability on chronic diseases such as breast cancer rather than the impact of single reproductive indicators, and a multivariable approach could be particularly useful [83].
Strengths and limitations
The strengths of the study include the range of reproductive factors investigated using the MR approach, the use of the large UK Biobank resource and data from other genetic consortia, and the extent of MR sensitivity analyses to evaluate MR assumptions and address sample overlap. However, this study has a number of limitations.
Firstly, negative control analysis revealed strong evidence of an effect of AFB on AFS, suggesting possible evidence of pleiotropy which has been previously identified for the AFS genetic instrument [84]. As this may reduce the reliability of our results, future work could further assess whether the associations identified for AFS reflect true causal effects.
For some exposures such as ALB, the number of births and ever being parous, the number of SNPs used as genetic instruments was limited, meaning we cannot reliably evaluate pleiotropy and heterogeneity in these instances. Increasing the number of SNPs in the genetic instruments for each of these reproductive factors through larger GWAS would be valuable.
Another limitation is the issue of selection bias in UK Biobank. While 9 million individuals were invited to participate in the study, the response rate was 5%. Additionally, the participants in the UK Biobank and replication studies we used were largely restricted to women of European ancestry. These samples are therefore not representative of the entire UK female population and estimates may not be generalisable to women in other ancestry groups. In addition, these findings may not be representative of younger generations of women considering the average age of UK Biobank participants, and the evidence of secular trends in some reproductive factors. For example, there is evidence that there is a long-term downward trend in age at menarche [85] and increase in AFB [86]. Future work is required to replicate our findings in contemporary independent studies and translate the results in women in other ancestry groups.
While the majority of the reproductive factors are likely to be accurately captured through a questionnaire (such as AFB, number of births and ALB), other factors such as age at menarche may not be as reliably recalled [87]. Self-report of lifetime number of sexual partners is also known to be overestimated by some, which could explain the positively skewed distribution we identified [88]. To account for this, we performed a rank-based inverse normal transformation of this variable.
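The rank-based inverse normal transformation mentioned above can be sketched as follows. This is an illustrative implementation only: the Blom offset (c = 3/8) and the averaging of tied ranks are assumptions, since the exact variant used in the study is not stated here, and the function name and example values are hypothetical.

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal quantiles based on their ranks.

    Ties receive the average of the ranks they span; c is the rank
    offset (3/8 gives the Blom transformation).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

# A positively skewed toy variable: the extreme value is pulled in
transformed = rank_inverse_normal([1, 2, 3, 50])
print([round(z, 3) for z in transformed])
```

Because only the ranks are used, a few very large self-reported values no longer dominate the distribution, while the ordering of participants is preserved.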
It is also worth noting that some reproductive events may not have been fully captured in the analysis, as certain reproductive milestones may not have been reached by some women. For example, younger women who reported not having had children may subsequently have children. In addition, the ALB and number of births may not reflect final reproductive milestones if some women go on to have more children. However, considering the mean age of UK Biobank women is 56.4 years (SD = 8.0), there are likely to be few women who go on to have more children.
The split-sample GWAS revealed little overlap between genome-wide significant SNPs identified in each sample. While some of these SNPs were identified slightly below the significance threshold between samples, others appeared not to be associated. This suggests that some SNPs may have been identified through spurious associations and may suggest evidence of winner’s curse.