Background
Puberty is an important stage of development that affects future breast cancer risk [1]. Recent increases in the incidence of early-onset invasive breast cancer [2] may be related to recent declines in the age at initiation of breast development [3]. Earlier age at breast onset, independent of age at menarche (which, unlike age at breast development, has been relatively stable in recent decades), has been associated with an increased risk of breast cancer [4]. Large-scale epidemiologic studies of factors that may explain the risk of early breast onset are needed across diverse populations. It is therefore important to use methods of assessing the timing of breast onset that are sensitive, specific, and feasible in large population studies. Because the timing of breast onset has been shown to vary by ethnicity and obesity [5–7], it is also important to evaluate whether the performance of different methods of assessing breast onset varies by these factors.
Breast development can be assessed by clinician physical examination or through guardian report or self-report on questionnaires, with or without picture prompts. Clinicians perform a visual and physical assessment of breast development, sometimes with palpation, according to the Tanner stages (TSs) established in 1969 [8]. Although clinician assessment via physical examination is considered the gold standard [9, 10], self-reports and guardian reports are often used in lieu of physical examination, especially in large-scale epidemiologic studies in which physical examinations are often not feasible. One such method is the Sexual Maturation Scale (SMS), a questionnaire-based tool that asks respondents to rate breast development using pictures that correspond to the five TSs [11]. Another commonly used questionnaire-based instrument is the Pubertal Development Scale (PDS) [12]. A key advantage of the PDS is that it consists of questions without pictures and can therefore be administered by phone and more easily included in questionnaires.
Given the need for scalable methods that accurately reflect pubertal stage, the purpose of the present study was to compare the SMS and PDS with clinical TS to assess the sensitivity and specificity of reported measures of breast onset. Because estrogens increase epithelial proliferation in the terminal end buds of the mammary gland, resulting in the onset of breast development [13], a secondary aim was to evaluate whether measuring hormones in premenarcheal girls increases the validity of guardian-reported breast onset. Estrone 1-glucuronide (E1G), an estrogen metabolite in urine, is an indicator of total circulating estrogens, which rise before puberty [14]. Previous studies have found both the PDS and SMS to be accurate measures of breast development [15–18], but none has simultaneously compared the SMS, PDS, and hormonal biomarkers with clinical TS.
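To make the validation metrics concrete, the sketch below dichotomizes a reported stage at TS2+ and cross-tabulates it against clinical TS as the reference, then computes sensitivity and specificity. It assumes stages are coded 1 to 5 and that onset means stage 2 or higher; a PDS response would first need its own mapping to an onset indicator, which is not shown. The class, function names, and toy stages are hypothetical illustrations, not the study's analysis code or data.

```python
from dataclasses import dataclass


@dataclass
class OnsetTable:
    """2x2 counts of reported vs clinical breast onset (clinical TS is the reference)."""
    tp: int = 0  # reported onset, clinical TS2+
    fp: int = 0  # reported onset, clinical TS1
    fn: int = 0  # no reported onset, clinical TS2+
    tn: int = 0  # no reported onset, clinical TS1

    @property
    def sensitivity(self) -> float:
        # Share of clinically TS2+ girls whose report also indicates onset
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of clinically TS1 girls whose report also indicates no onset
        return self.tn / (self.tn + self.fp)


def tabulate(reported, clinical, onset_stage=2):
    """Dichotomize stages (coded 1-5) at onset_stage and cross-tabulate them."""
    if len(reported) != len(clinical):
        raise ValueError("reported and clinical must have the same length")
    table = OnsetTable()
    for rep, ref in zip(reported, clinical):
        rep_onset, ref_onset = rep >= onset_stage, ref >= onset_stage
        if rep_onset and ref_onset:
            table.tp += 1
        elif rep_onset:
            table.fp += 1
        elif ref_onset:
            table.fn += 1
        else:
            table.tn += 1
    return table


# Hypothetical toy stages for seven girls (not study data)
reported = [1, 1, 2, 3, 1, 1, 2]
clinical = [1, 2, 2, 3, 1, 1, 1]
table = tabulate(reported, clinical)
print(table, table.sensitivity, table.specificity)
```

Changing `onset_stage` shows how the choice of cut point shifts the balance between sensitivity and specificity.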
Discussion
Our results demonstrate that breast onset determined by mother’s report using the PDS corresponds well with both clinical TS (the existing gold standard) and the physiological changes in gonadal steroid hormone concentrations that drive pubertal maturation. Furthermore, mother’s report using the PDS discriminated TS2+ better than mother’s report using the SMS. Our findings also suggest that adding E1G to mother’s report further improves discrimination of breast onset. Although most puberty studies use clinical TS to assess breast onset [28–33], large-scale studies of pubertal development that rely on mother’s report without a clinical visit can produce accurate and valid measures of breast onset, particularly when hormone measures are added.
Mother’s report using the PDS had higher discrimination, agreement, and accuracy than mother’s report using the SMS. We previously published kappa values and percent agreement between mother’s SMS report and clinical TS in the same cohort and found 73% agreement with clinical TS, compared with the 88% agreement reported here for the PDS [21]. Mothers may have been less accurate at identifying TS2+ with the SMS because the TS2 pictures depict breast size in addition to areolar development, which could lead mothers to downgrade their daughter’s breast development. However, the PDS questions preceded the SMS pictures, and this order of administration may also have affected the comparison. The kappa values and percent accuracy of mother’s reports of breast onset using either the PDS or the SMS, compared with clinical TS, were consistently higher in our study than those previously reported in other studies of pubertal development [15–17]. For example, in a study of 78 girls aged 9–14 years, the kappa value between mother’s report (PDS or SMS) and clinical TS was 0.36, with 47–49% agreement [15]. The age range of girls in our study (6–16 years) was wider than in other studies, allowing us to assess differences in accuracy by age. Mothers of young daughters (< 10 years old) were accurate in reporting the absence of breast onset (specificity), and mothers of older daughters (≥ 10 years old) were accurate in reporting its presence (sensitivity). Thus, the wider age range and the larger number of mothers of young (6–9 years old) and older (14–16 years old) daughters in our cohort may explain the higher overall agreement in our study. Although we were unable to compare the PDS and SMS with clinical TS in our full cohort, the subset with all measures available was larger than the sample in any previous study of mother’s reports of breast development [15–18]. Our present results and other recent results [15] differ from those of a 2002 review of pubertal assessment methods, whose authors concluded that the PDS was the least valid method compared with the SMS and other methods. However, the studies showing the utility of the PDS [15, 16] were not included in that review, and none of the reviewed studies included hormone measures [7].
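Percent agreement can look high by chance alone when most girls fall into one category, as in a sample dominated by young prepubertal girls; Cohen's kappa subtracts the agreement expected from the two assessments' marginal distributions. The sketch below computes both for a binary onset indicator. It is a minimal illustration with hypothetical function names and made-up indicators, not the study's analysis code.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of girls for whom the two assessments give the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty ratings of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: agreement beyond what the marginals imply by chance."""
    p_obs = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: sum over categories of the product of marginal proportions
    p_exp = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_exp == 1:
        raise ValueError("kappa is undefined when both ratings use one shared category")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical onset indicators (1 = TS2+) for seven girls, not study data
mother_report = [0, 0, 1, 1, 0, 0, 1]
clinical_ts2plus = [0, 1, 1, 1, 0, 0, 0]
print(percent_agreement(mother_report, clinical_ts2plus))
print(cohens_kappa(mother_report, clinical_ts2plus))
```

To examine the age pattern described above, the same functions can be applied separately to girls younger than 10 years and to those 10 years or older.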
We found that adding a urinary estrogen metabolite to mother’s report further improved discrimination of breast onset (TS2+). Investigators planning to assess breast onset may want to consider incorporating estrogen biomarkers into their pubertal assessment, because adding E1G improved the AUC by up to 0.11 in our study. However, estrogen alone does not fully capture clinically assessed breast onset, because hormone-level distributions overlap substantially between stages of breast development, as we and others have shown [15, 34, 35]. Prior studies proposed that the relatively weak correlations of clinician and mother reports of breast development with hormone levels may reflect inadequate accounting for menstrual cycle phase at biospecimen collection [15, 34]. Menstrual cycle phase is not a source of variation in our study, however, because all estrogen measures were taken from premenarcheal girls. Rather, some of the wide variation in E1G within TS2 across all assessment methods may be explained by the inclusion of girls with transient thelarche as well as girls with permanent thelarche in this group (see Fig. 1). Girls with transient thelarche (i.e., breast development that appears, regresses, and then reappears) have lower hormone profiles than girls with permanent thelarche [36]. Whether to include estrogen measurement in a study of pubertal development depends on the overall intent of the study. For example, in studies that aim to identify breast onset as a period of breast cancer susceptibility, transient thelarche may be sufficiently captured by the PDS, making estrogen measurement unnecessary, because the appearance of breast tissue marks a period of cell proliferation and rapid breast tissue development.
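One way to see why a continuous biomarker can raise the AUC is that a binary yes/no report yields a single ROC operating point, so its AUC reduces to (sensitivity + specificity) / 2, whereas a continuous score offers many thresholds. The sketch below computes AUC from its pairwise (Mann-Whitney) definition. The "combined" scores are made-up stand-ins for, e.g., predicted probabilities from a model combining the report with urinary E1G; that model is an assumption and is not shown, and none of the values are study data.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    Equals the probability that a randomly chosen case (label 1, e.g. clinical
    TS2+) has a higher score than a randomly chosen non-case; ties count one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    cases = [s for y, s in zip(labels, scores) if y]
    noncases = [s for y, s in zip(labels, scores) if not y]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c, n in product(cases, noncases):
        if c > n:
            wins += 1.0
        elif c == n:
            wins += 0.5
    return wins / (len(cases) * len(noncases))


# Hypothetical data for seven girls (not study data)
clinical_ts2plus = [0, 1, 1, 1, 0, 0, 0]
mother_report = [0, 0, 1, 1, 0, 0, 1]  # binary yes/no onset report
# Made-up continuous scores standing in for a report-plus-E1G model's output
combined_score = [0.10, 0.45, 0.80, 0.90, 0.20, 0.15, 0.60]

print(auc(clinical_ts2plus, mother_report))
print(auc(clinical_ts2plus, combined_score))
```

With these toy values the binary report's sensitivity is 2/3 and its specificity is 3/4, and its AUC equals their average, 17/24.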
Because we have shown that estimates of age at breast onset, typically the first sign of pubertal development, differ by assessment method, the degree of misclassification by each method has implications when pubertal onset is used either as an outcome or as a parameter defining the pubertal window of susceptibility. Since the 1990s, a focus of pediatric research has been determining whether there is a secular decline in the age at breast onset [9]. Although the age at breast onset appears to have declined [5, 6, 37], one of the main critiques of early studies of pubertal timing in the United States was that even the gold standard, clinical breast Tanner staging, was limited if palpation was not performed [38]. There was concern that physical examination without palpation could not accurately distinguish true TS2 from lipomastia caused by obesity. In our study, two of the LEGACY sites used clinically trained providers who measured TS with palpation when necessary to rule out misclassification due to lipomastia. Clinical TS may also remain especially useful in young girls, because the sensitivity of the PDS is still low among mothers of young girls. Until the assessment of breast onset is standardized, one way to enable comparisons across future studies is to assess breast onset using all three methods in a subset of the study population, particularly in young girls, so that final estimates can be adjusted for measurement error [20]. Studies that cannot implement all three methods can use the measurement error estimates from our study, provided that its limitations are considered. Although we did not observe major differences by breast cancer family history, our study was enriched so that half of the participants had a family history of breast cancer, and our results may therefore not be generalizable to other populations. Our study also does not address whether mother’s report of breast onset is a reliable measure in other countries; however, mothers in some cultures may prefer the PDS because it does not use pictures of breasts.
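As one illustration of adjusting for measurement error with validation-subset estimates, the sketch below applies the Rogan-Gladen estimator to the prevalence of breast onset. It inverts observed = Se × p + (1 − Sp) × (1 − p) for the true prevalence p. This is a standard textbook correction chosen here for illustration; it is not necessarily the approach of [20] or of this study, and the inputs are hypothetical.

```python
def rogan_gladen(observed_prevalence, sensitivity, specificity):
    """Misclassification-corrected prevalence (Rogan-Gladen estimator).

    Solves observed = Se * p + (1 - Sp) * (1 - p) for the true prevalence p,
    then clips to [0, 1] because sampling noise can push the raw value outside.
    """
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("the method must beat chance: sensitivity + specificity > 1")
    corrected = (observed_prevalence + specificity - 1.0) / youden
    return min(1.0, max(0.0, corrected))


# Hypothetical inputs: 30% of girls in an age group reported as TS2+ by a method
# whose validation-subset sensitivity is 0.80 and specificity is 0.90
print(rogan_gladen(0.30, sensitivity=0.80, specificity=0.90))
```

Because the correction depends on sensitivity and specificity, which differed by daughter's age in our data, age-specific validation estimates would be the natural inputs.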
Ultimately, assessing breast onset accurately in easily scalable ways is essential to advancing the understanding of how early life influences breast cancer risk, and of pubertal trends and their health impacts more broadly. Early breast onset (< 10 years old compared with 11–12 years) is associated with a 23% increased risk of breast cancer [4]. We found a 2- to 8-month difference in the estimated age at onset depending on whether mother’s report of breast onset was assessed by the PDS or the SMS. Given that a 1-month delay in age at breast onset is associated with a 1.6% decrease in breast cancer risk [4], the expected effect size of an association should be weighed against the magnitude of measurement error. For example, in a recent longitudinal study of breast onset assessed by annual clinical TS, obesity (BMI > 95th percentile) was associated with an 8.4-month earlier median age of breast onset compared with nonobese U.S. girls (50th to < 85th percentile) [5]. The median age of breast onset was also 6.7 months earlier among girls born between 1996 and 2002 than among girls born between 1980 and 1990 [5, 6]. Both of these studies assessed breast onset by annual clinical examination for most participants. However, in the study by Biro et al., only a subset of girls was assessed semiannually, and the authors noted that semiannual vs annual assessment could account for a 3- to 4-month difference in the age of breast onset between the studies [5]. Clear advantages of using the PDS rather than clinical TS to assess breast onset are that (1) it can be administered more frequently and in a more cost-effective and scalable manner, and (2) it may yield tighter estimates of median age of breast onset, especially for exposures whose associations may be smaller in magnitude than those of body size or secular time.
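The comparison between effect size and measurement error can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes, as a hypothetical simplification not stated in the text, that the 1.6% per-month decrease reported in [4] applies multiplicatively and constantly across the shifts considered; the per-month figure and the shift sizes are the only inputs taken from the text.

```python
PER_MONTH_DECREASE = 0.016  # 1.6% lower risk per 1-month delay in onset, from [4]


def relative_risk_for_delay(months, per_month_decrease=PER_MONTH_DECREASE):
    """Relative risk for onset `months` later, ASSUMING the per-month effect is
    constant and multiplicative (log-linear); a hypothetical simplification."""
    return (1.0 - per_month_decrease) ** months


# Shifts mentioned in the text: the 2- to 8-month method difference and the
# 8.4-month obesity association (used here only as illustrative magnitudes)
for shift in (2, 8, 8.4):
    rr = relative_risk_for_delay(shift)
    print(f"{shift:>4} months -> relative risk {rr:.3f} ({(1 - rr) * 100:.1f}% lower)")
```

Under this assumption, a 2- to 8-month discrepancy in estimated onset age corresponds to an apparent risk difference of roughly 3% to 12%, close to the difference implied by the 8.4-month obesity association itself.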