Populations
San Francisco Bay Area breast cancer study (SFBCS): the SFBCS is a population-based multiethnic case–control study of breast cancer. Patients (cases) aged 35–79 years diagnosed with invasive breast cancer from 1995 to 2002 were identified through the Greater Bay Area Cancer Registry. Controls were identified by random-digit dialing and matched on 5-year age groups. Blood collection was initiated in 1999. For this study, we focused only on patients and matched controls who self-identified as Latina or Hispanic and included 351 cases and 579 controls. Samples from this study were used as part of the initial discovery set.
Breast Cancer Family Registry (BCFR): the BCFR is an international, National Cancer Institute (NCI)-funded family study that has recruited and followed over 13,000 breast cancer families and individuals with breast cancer who have a strong likelihood of a genetic contribution to disease. The present study includes samples from the population-based Northern California site of the BCFR. Cases aged 18–64 years diagnosed from 1995 to 2007 were ascertained through the Greater Bay Area Cancer Registry. Cases with indicators of increased genetic susceptibility (diagnosis at the age of < 35 years, bilateral breast cancer with the first diagnosis at the age of < 50 years, a personal history of ovarian or childhood cancer, or a family history of breast or ovarian cancer in first-degree relatives) were oversampled. Cases not meeting these criteria were randomly sampled.
Population controls were identified through random-digit dialing and frequency-matched on 5-year age groups to cases diagnosed from 1995 to 1998. We included 641 cases and 61 controls who self-identified as Latina or Hispanic from this study. Samples from this study were used as part of the initial discovery set.
Since the SFBCS and BCFR were recruited from the same region and during an overlapping time frame, we combined these datasets to search for relatives. After removing relatives (preferentially keeping cases) and samples that overlapped with the Kaiser Research Project on Genes, Environment and Health, we included 942 cases and 589 controls from these studies.
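The relative-removal rule described above (dropping one member of each related pair while preferentially keeping cases) could be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `prune_relatives`, the sample IDs, and the premise that related pairs have already been identified (for example, from kinship estimates) are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def prune_relatives(
    is_case: Dict[str, bool],
    related_pairs: List[Tuple[str, str]],
) -> Set[str]:
    """Return the sample IDs to keep so that no related pair remains.

    Hypothetical helper: for each related pair, the control is dropped
    when exactly one member is a case; otherwise the second member is
    dropped (an arbitrary tie-break chosen for this sketch).
    """
    removed: Set[str] = set()
    for a, b in related_pairs:
        if a in removed or b in removed:
            continue  # this pair was already broken by an earlier removal
        if is_case[b] and not is_case[a]:
            removed.add(a)  # keep the case, drop the control
        else:
            removed.add(b)  # a is the case, or both share the same status
    return set(is_case) - removed


# Toy data (hypothetical IDs): s1 and s4 are cases, s2 and s3 are controls.
is_case = {"s1": True, "s2": False, "s3": False, "s4": True}
related_pairs = [("s2", "s1"), ("s3", "s4")]
kept = prune_relatives(is_case, related_pairs)
```

In this toy example both controls are removed and both cases are retained. A greedy pass of this kind does not guarantee the largest possible unrelated set; it only illustrates the case-preference rule.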
Multiethnic cohort (MEC): the MEC is a large prospective cohort study in California (mainly Los Angeles County) and Hawaii. The breast cancer study is a nested case–control study including women with invasive breast cancer diagnosed at the age of > 45 years and controls matched on age (within 5 years) and self-identified ethnicity. After removing relatives (preferentially keeping cases), we used phenotypic and genetic data from 520 Latina breast cancer cases and 1544 matched Latina controls. Samples from this study were used as part of the initial discovery set.
Research project on genes, environment and health (RPGEH): the RPGEH is a large cohort study of over 100,000 men and women of all racial/ethnic groups who are members of the Kaiser Permanente Health Plan. This analysis focuses only on women of self-reported Latina/Hispanic ethnicity (N = 3801). We included both incident and prevalent cases (total N = 225) in our analyses. We identified 44 women who were also included in the SFBCS. The genetic data from these participants were included as part of the RPGEH, since we considered the Affymetrix Lat array to be more comprehensive than the Affymetrix 6.0 array. After removing relatives, we included a total of 225 cases and 3574 controls. Samples from this study were used as part of the initial discovery set.
Cancer de mama (CAMA) study: this study is a population-based case–control study of breast cancer conducted in Mexico City, Monterrey, and Veracruz. Patients (cases) aged 35–69 years diagnosed between 2005 and 2007 were recruited from 11 hospitals (3–5 in each region). Controls were recruited based on membership in the same health plan as the cases and were frequency-matched on 5-year age groups. For the current study, we used phenotypic data and DNA samples from 1008 women with breast cancer and 1063 controls. Of these, 698 cases and 599 controls were genotyped with the Illumina Oncoarray and included in the discovery set. The remaining 310 cases and 464 controls were included as part of the replication dataset.
Colombian Study of Environmental and Heritable Causes of Breast Cancer (COLUMBUS): COLUMBUS is a population-based case–control study of breast cancer conducted in four cities: Bogota, Ibague and Neiva from the Central Colombian Andes region, and Pasto, from the Colombian South. Patients aged 18–75 years, with incident cases of invasive breast cancer, have been recruited in two population registries and two large cancer hospitals. Recruitment started in 2011. Cancer-free controls were recruited through the same institutions and were matched on education, socioeconomic status and local origin using a genealogical interview. In the current study, we used data from 954 cases and 769 controls for the replication study.
Hereditary Cancer Registry of City of Hope (HCRCOH) (Southern California; PI Jeffrey Weitzel): Latina breast cancer cases are part of the HCRCOH through the Clinical Cancer Genetics Community Research Network (CCGCRN). The CCGCRN includes cancer center and community-based clinics that provide genetic counseling to individuals with a personal or family history of cancer [38]. All patients are invited to participate in the HCRCOH at the time of consultation (> 90% participation). Starting in May 1998 and continuing to the present, women of self-reported Latina origin with breast cancer were seen for genetic counseling, were enrolled in the Registry and underwent BRCA1/2 testing after providing informed consent. In the current study we genotyped 1148 cases. The 347 unaffected female Latina controls were from Southern California and were invited to participate at community health fairs, via flyers, and at City of Hope. These samples were used as part of the replication study.
African American breast cancer GWAS (AABC): the GWAS includes African American participants from nine epidemiological studies of breast cancer, comprising a total of 3153 cases and 2831 controls (cases/controls: the MEC, 734/1003; the Los Angeles component of the Women’s contraceptive and reproductive experiences (CARE) study, 380/224; the Women’s circle of health study (WCHS), 272/240; the SFBCS, 172/231; the Northern California Breast Cancer Family Registry (NC-BCFR), 440/53; the Carolina breast cancer study (CBCS), 656/608; the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) Cohort, 64/133; the Nashville breast health study (NBHS), 310/186; and the Wake Forest University breast cancer study (WFBC), 125/153). Additional details have previously been reported [21, 39]. These samples were used as part of the replication study.
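As a simple consistency check, the per-study counts listed above can be summed and compared with the stated AABC totals. The sketch below only restates numbers from the text; the dictionary layout and variable names are illustrative choices.

```python
# Per-study (cases, controls) counts for the AABC, as listed in the text.
aabc_counts = {
    "MEC": (734, 1003),
    "CARE": (380, 224),
    "WCHS": (272, 240),
    "SFBCS": (172, 231),
    "NC-BCFR": (440, 53),
    "CBCS": (656, 608),
    "PLCO": (64, 133),
    "NBHS": (310, 186),
    "WFBC": (125, 153),
}

# Sum the case and control columns across the nine studies.
total_cases = sum(cases for cases, _ in aabc_counts.values())
total_controls = sum(controls for _, controls in aabc_counts.values())

print(total_cases, total_controls)  # 3153 2831
```

The sums agree with the reported totals of 3153 cases and 2831 controls.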
The ROOT consortium included six studies and a total of 1657 cases and 2029 controls of African ancestry: the Nigerian Breast Cancer Study (NBCS), 711/624; the Barbados national cancer study (BNCS), 92/229; the Racial variability in genotypic determinants of breast cancer risk study (RVGBC), 145/257; the Baltimore Breast cancer study (BBCS), 95/102; the Chicago cancer prone study (CCPS), 394/387; and the Southern community cohort (SCCS), 220/430. Additional details can be found elsewhere [21]. These samples were used as part of the replication study.
Shanghai breast cancer genetics study: study participants were drawn from four population-based studies conducted in Shanghai: the Shanghai Breast Cancer Study (SBCS), the Shanghai Women’s Health Study (SWHS), the Shanghai Breast Cancer Survival Study (SBCSS), and the Shanghai Endometrial Cancer Study (SECS), which contributed control data only. The SBCS is a population-based case–control study conducted in urban Shanghai. Subject recruitment in the initial phase of the SBCS (SBCS-I) was conducted between August 1996 and March 1998. The second phase of recruitment (SBCS-II) occurred between April 2002 and February 2005. Breast cancer cases were identified through the population-based Shanghai Cancer Registry and supplemented by a rapid case-ascertainment system. Controls were randomly selected using the Shanghai Resident Registry. The SBCSS included newly diagnosed breast cancer cases ascertained via the Shanghai Cancer Registry between April 2002 and December 2006. The SECS is a population-based case–control study of endometrial cancer conducted between January 1997 and December 2003 using a protocol similar to that of the SBCS; only community controls from the SECS were included in the present study. The SWHS is a population-based prospective cohort study of women recruited between 1996 and 2000. The cohort has been followed by a combination of record linkage and active follow-up to identify cause-specific mortality and site-specific cancer incidence. All of these studies were conducted among Chinese women in Shanghai, using very similar protocols for data and sample collection. There were 2731 cases and 2135 controls genotyped with an Affymetrix 6.0 array and 1794 cases and 2059 controls genotyped with an Illumina MEGA array. These subsets were analyzed separately and included in a meta-analysis as part of the replication study.
European ancestry GWAS data: we also evaluated the top SNPs using summary statistics from a recent large GWAS of European-ancestry breast cancer cases and controls [40]. We downloaded the summary statistics from the Breast Cancer Association Consortium (BCAC) website (http://bcac.ccge.medschl.cam.ac.uk/bcacdata/oncoarray/) and used those from the combined analysis of individuals of European ancestry from the Oncoarray and iCOGS consortia.