Introduction

Genome-wide association studies have identified an increasing number of single nucleotide polymorphisms (SNPs) associated with breast cancer risk, the majority of which have been discovered by the COGS consortium in studies of Caucasian women of European descent [1]. Whilst each SNP is associated with only a small increment in risk, the indications are that a polygenic approach to genetic testing could improve estimates of individual risk, raising the possibility of individualized screening strategies for women [2]. Several studies have investigated the value of combining the genomic risk estimates obtained from SNP genotyping with conventional breast cancer risk prediction algorithms such as the breast cancer risk assessment tool (BCRAT, also known as the Gail Model) and IBIS (also known as the Tyrer-Cuzick Model). The combination of SNP panels with these algorithms has been shown to improve risk prediction, reclassifying some women across risk categories and potentially changing clinical management [3–7].

Although originally developed using population data for white women, the BCRAT has been modified for Hispanic women using SEER data [8] and for African American women as the modified CARE Model [9, 10]. Whilst the IBIS Model has only been validated for European populations, it is widely used across ethnicities in breast cancer centers throughout the USA [11]. This new study investigates whether a panel of SNPs can improve breast cancer risk estimates obtained from BCRAT or IBIS for African American and Hispanic women, in terms of calibration and discriminatory accuracy. These women comprise increasing proportions of the US population and represent a large proportion of the world’s population.

Methods

Subjects

We studied 7539 self-reported African American women and 3363 self-reported Hispanic women identified from within the Women’s Health Initiative (WHI) SNP Health Association Resource (SHARe). Written informed consent was obtained from each participant and the study was approved by the Fred Hutchinson Cancer Research Center Institutional Review Board. Participants in the WHI had an opportunity to opt in or out of any collaborations involving commercial entities because some women may prefer not to participate in research involving commercial (as opposed to non-profit) entities. We restricted our analyses to the subset of these individuals who had consented to collaborations involving commercial entities. The interventions used in the WHI clinical trial are independent of baseline genetic and clinical risk factors by study design [12], so analyses presented here were not stratified by trial intervention.

Selection of SNPs

The SNP panels used were derived from SNPs identified as being associated with breast cancer risk from studies of Caucasian women [13] and for which imputed genotypes were available in WHI SHARe. This resulted in a panel of 75 SNPs for African Americans and 71 for Hispanics.

Risk prediction models

We used the BCRAT (incorporating the modified CARE model) [9, 10] and IBIS [14] to estimate the 5-year absolute risk of breast cancer. For BCRAT, we did not have information on biopsy histopathology (i.e., presence of atypical hyperplasia) so this was coded as “unknown”. Similarly for IBIS, missing family history variables were coded as “unknown”.

SNP risk score and combined model risk scores

Using the approach of Mealiffe et al. [3], we calculated a SNP risk score using previously published estimates of the odds ratio (OR) per allele and risk allele frequencies (p) [13, 15–18], assuming independence of additive risks on the log OR scale. For each SNP, we calculated the unscaled population average risk as $\mu = (1 - p)^2 + 2p(1 - p)\,\mathrm{OR} + p^2\,\mathrm{OR}^2$. Adjusted risk values (with a population average risk equal to 1) were calculated as $1/\mu$, $\mathrm{OR}/\mu$, and $\mathrm{OR}^2/\mu$ for the three genotypes defined by the number of risk alleles. The overall SNP risk score was then calculated by multiplying the adjusted risk values for each of the SNPs [5].
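The SNP risk score calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual code (the analyses were run in Stata): the function names are hypothetical, and the two-SNP panel with its ORs and allele frequencies is invented for demonstration only.

```python
from math import prod

def adjusted_risks(odds_ratio, risk_allele_freq):
    """Return adjusted risk values for 0, 1 and 2 risk alleles at one SNP."""
    p = risk_allele_freq
    # Unscaled population average risk, weighting each genotype's risk
    # (1, OR, OR^2) by its expected frequency.
    mu = (1 - p) ** 2 + 2 * p * (1 - p) * odds_ratio + p ** 2 * odds_ratio ** 2
    # Dividing by mu rescales so that the population average risk is 1.
    return [1 / mu, odds_ratio / mu, odds_ratio ** 2 / mu]

def snp_risk_score(genotypes, odds_ratios, risk_allele_freqs):
    """Multiply the adjusted risk values across SNPs for one woman.

    genotypes holds the number of risk alleles (0, 1 or 2) at each SNP.
    """
    return prod(
        adjusted_risks(or_, p)[g]
        for g, or_, p in zip(genotypes, odds_ratios, risk_allele_freqs)
    )

# Hypothetical two-SNP panel (illustrative values, not taken from the study).
odds_ratios = [1.10, 1.25]
risk_allele_freqs = [0.30, 0.15]
score = snp_risk_score([1, 0], odds_ratios, risk_allele_freqs)
```

Because each SNP's adjusted risks average to 1 over the population, a woman carrying a typical number of risk alleles ends up with a score near 1, and the product over SNPs reflects the stated assumption of independent effects on the log OR scale.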

For both BCRAT and IBIS, we calculated a combined risk score by multiplying the SNP-based score by the model’s predicted 5-year risk of breast cancer.
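As a small illustration of the combined score, the sketch below multiplies a model's 5-year risk by a SNP-based score. The function name and the example values are hypothetical and chosen only to show the arithmetic.

```python
def combined_risk(model_risk_pct, snp_score):
    """Combined 5-year risk (%): model-predicted risk scaled by the SNP score."""
    return model_risk_pct * snp_score

# Hypothetical woman: model 5-year risk of 1.6 % and a SNP-based score of 1.29.
example = combined_risk(1.6, 1.29)

# A SNP-based score of exactly 1 leaves the model risk unchanged.
unchanged = combined_risk(1.6, 1.0)
```

Since the SNP-based score is scaled to a population average of 1, scores above 1 raise the model's estimate and scores below 1 lower it.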

Statistical analysis

The model risk scores, SNP-based score and combined risk scores were log transformed for all analyses, and then adjusted for age using multiple linear regression. We used Pearson correlation to test for associations between the model risk scores, the SNP-based score and the combined risk scores. We then used logistic regression to estimate risk associations, in terms of OR per age-adjusted log 5-year predicted risk, while adjusting for age group. Model calibration was assessed using the Hosmer–Lemeshow goodness-of-fit test, which compares the expected and observed numbers of cases and controls within groups that were defined by deciles of risk for controls. Discrimination between cases and controls was measured using the AUCs of the risk scores.
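To make the discrimination measure concrete, the following sketch computes an AUC as the probability that a randomly chosen case has a higher risk score than a randomly chosen control, counting ties as one half. This is the standard rank-based definition of the AUC; whether the Stata routine used in the study handles ties identically is an assumption, and the scores below are invented.

```python
def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties contribute one half (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical age-adjusted log risk scores (illustrative only).
cases = [0.4, 0.1, -0.2]
controls = [0.0, -0.3, 0.1, -0.5]
example_auc = auc(cases, controls)
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and 1.0 to perfect separation.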

As in Mealiffe et al. [3], we categorized 5-year absolute risks as low risk (<1.5 %), intermediate risk (≥1.5 and <2.0 %) and high risk (≥2.0 %) and constructed reclassification tables for each of the risk prediction models as a cross-tabulation of the classification of the risk score from the original model with the risk score from the combined model. The net reclassification improvement statistic was calculated as P(up|case) − P(down|case) + P(down|control) − P(up|control), where up refers to moving to a higher risk category and down refers to moving to a lower risk category. We tested the null hypothesis that the net reclassification improvement is equal to 0 using an asymptotic Z-test.
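The reclassification analysis can be sketched as follows. The risk thresholds and the net reclassification improvement formula are those given above. The standard error used for the Z statistic is a commonly used asymptotic form under the null hypothesis and is an assumption here, since the text does not state which variance formula was applied. Function names and the example risks are hypothetical.

```python
from math import sqrt

def risk_category(five_year_risk_pct):
    """Map a 5-year absolute risk (%) to 0 = low, 1 = intermediate, 2 = high."""
    if five_year_risk_pct < 1.5:
        return 0
    if five_year_risk_pct < 2.0:
        return 1
    return 2

def movement(pairs):
    """Return (proportion moving up, proportion moving down).

    pairs is a list of (model risk %, combined risk %) tuples.
    """
    up = sum(risk_category(new) > risk_category(old) for old, new in pairs)
    down = sum(risk_category(new) < risk_category(old) for old, new in pairs)
    return up / len(pairs), down / len(pairs)

def net_reclassification_improvement(case_pairs, control_pairs):
    """Return (NRI, Z) for cases and controls."""
    up_case, down_case = movement(case_pairs)
    up_ctrl, down_ctrl = movement(control_pairs)
    # NRI = P(up|case) - P(down|case) + P(down|control) - P(up|control)
    nri = up_case - down_case + down_ctrl - up_ctrl
    # Assumed null standard error; the paper does not give its formula.
    se = sqrt((up_case + down_case) / len(case_pairs)
              + (up_ctrl + down_ctrl) / len(control_pairs))
    z = nri / se if se > 0 else float("nan")
    return nri, z

# Hypothetical (model risk %, combined risk %) pairs, illustrative only.
case_pairs = [(1.4, 1.6), (1.7, 2.3), (1.2, 1.1), (2.1, 1.8)]
control_pairs = [(1.6, 1.3), (1.4, 1.2), (1.9, 2.1), (1.0, 0.9)]
nri, z = net_reclassification_improvement(case_pairs, control_pairs)
```

A positive NRI means that, on balance, cases tend to move up and controls tend to move down when the SNP-based score is added.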

Stata Release 13 [19] was used for all statistical analyses; all statistical tests were two sided, and P values less than 0.05 were considered nominally statistically significant.

Results

African American women

The characteristics of the study participants are provided in Supplementary Table 1. For cases, the mean 5-year risk of breast cancer was 1.7 % (SD 0.06 %) from BCRAT and 1.3 % (SD 0.04 %) from IBIS. For controls, the mean 5-year risk of breast cancer was 1.6 % (SD 0.05 %) from BCRAT and 1.3 % (SD 0.04 %) from IBIS. The mean SNP-based score was 1.29 (SD 0.51) for cases and 1.19 (SD 0.43) for controls. Supplementary Table 2 shows the genotype distributions and the minor allele frequencies for cases and controls for each of the 75 SNPs as well as their OR per allele and the corresponding published ORs.

Table 1 Age-adjusted association between log-transformed risk scores and breast cancer represented as the OR per SD of the age-adjusted log-transformed risk score for African Americans

Table 1 shows the age group-adjusted association between the age-adjusted log-transformed risk scores and breast cancer. For each of the models, the OR per SD of the age-adjusted risk scores was higher for the combined score than for both the SNP-based score and the corresponding model risk score. The increase in OR by the addition of SNPs was 9.6 % for BCRAT and 17.5 % for IBIS.

Receiver operating characteristic curve analysis confirmed that, for each model, the combined risk score gave greater discrimination than the SNP-based score and the corresponding model risk score (Table 2). The increase in AUC compared with 0.5 by the addition of SNPs was 5.4 % for BCRAT and 7.8 % for IBIS.

Table 2 AUC for the age-adjusted log-transformed risk scores for African Americans

For each of the models, the risk scores and the combined risk scores were classified as low risk (<1.5 %), intermediate risk (≥1.5 and <2.0 %), and high risk (≥2.0 %), as shown in Tables 3 and 4. The proportion of cases moving into a higher risk category was 42.5 % for BCRAT and 37.7 % for IBIS, while the proportion of cases moving into a lower risk category was 10.1 % for BCRAT and 6.5 % for IBIS. The proportion of controls moving into a lower risk category was 11.2 % for BCRAT and 8.2 % for IBIS, while the proportion of controls moving into a higher risk category was 40.3 % for BCRAT and 33.5 % for IBIS. The net reclassification improvement was 0.033 for BCRAT (95 % CI −0.025, 0.089) and 0.060 for IBIS (95 % CI 0.005, 0.113).
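As a check on the arithmetic, substituting the reported proportions into the net reclassification improvement formula from the Methods reproduces the BCRAT value. The IBIS proportions give 0.059 rather than the reported 0.060; a gap of this size is within what rounding of the four published percentages could produce, although that explanation is an assumption rather than something stated in the text.

```latex
\begin{align*}
\mathrm{NRI}_{\mathrm{BCRAT}} &= (0.425 - 0.101) + (0.112 - 0.403) = 0.324 - 0.291 = 0.033 \\
\mathrm{NRI}_{\mathrm{IBIS}}  &= (0.377 - 0.065) + (0.082 - 0.335) = 0.312 - 0.253 = 0.059
\end{align*}
```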

Table 3 Reclassification table for SNP × Gail risk versus Gail risk in African American women
Table 4 Reclassification table for SNP × IBIS risk versus IBIS risk in African American women

Hispanic women

The characteristics of the study participants are provided in Supplementary Table 3. For cases, the mean 5-year risk of breast cancer was 1.2 % (SD 0.07 %) from BCRAT and 1.4 % (SD 0.04 %) from IBIS. For controls, the mean 5-year risk of breast cancer was 1.1 % (SD 0.06 %) from BCRAT and 1.4 % (SD 0.04 %) from IBIS. The mean SNP-based score was 1.19 (SD 0.65) for cases and 1.00 (SD 0.57) for controls. Supplementary Table 4 shows the genotype distributions and the minor allele frequencies for cases and controls for each of the 71 SNPs as well as their OR per allele and the corresponding published ORs.

Table 5 shows the age group-adjusted association between the age-adjusted log-transformed risk scores and breast cancer. For each of the models, the OR per SD of the age-adjusted risk scores was higher for the combined risk score than that for the SNP-based score and the corresponding model risk score. The increase in OR by the addition of SNPs was 19.0 % for BCRAT and 26.1 % for IBIS.

Table 5 Age-adjusted association between log-transformed risk scores and breast cancer represented as the OR per SD of the age-adjusted log-transformed risk score for Hispanics

Receiver operating characteristic curve analysis confirmed that, for each model, the combined risk score gave greater discrimination than the SNP-based score and the corresponding model risk score (Table 6). The increase in AUC compared with 0.5 by the addition of SNPs was 10.9 % for BCRAT and 11.3 % for IBIS.

Table 6 AUC for the age-adjusted log-transformed risk scores

For each of the models, the risk scores and the combined risk scores were classified as low risk (<1.5 %), intermediate risk (≥1.5 and <2.0 %), and high risk (≥2.0 %), as shown in Tables 7 and 8. The proportion of cases moving into a higher risk category was 20.4 % for BCRAT and 35.4 % for IBIS, while the proportion of cases moving into a lower risk category was 6.8 % for BCRAT and 10.8 % for IBIS. The proportion of controls moving into a lower risk category was 6.2 % for BCRAT and 16.8 % for IBIS, while the proportion of controls moving into a higher risk category was 11.7 % for BCRAT and 23.1 % for IBIS. The net reclassification improvement was 0.082 for BCRAT (95 % CI 0.003, 0.162) and 0.181 for IBIS (95 % CI 0.085, 0.273).

Table 7 Reclassification table for SNP × Gail risk versus Gail risk in Hispanic women
Table 8 Reclassification table for SNP × IBIS risk versus IBIS risk in Hispanic women

Discussion

The ability of a 77-SNP panel to improve the risk estimates provided by the major breast cancer risk assessment algorithms for Caucasians (BOADICEA, BRCAPRO, BCRAT) has been previously quantified [20] and the combined SNP and model risk scores are now among the strongest known measures for differentiating women with and without breast cancer, at least for Caucasian women [20, 21]. For example, when the SNP score was added to the model, the OR per SD of age-adjusted risk scores increased from 1.67 to 1.80 for BCRAT and from 1.30 to 1.52 for IBIS.

The present study quantifies how much the addition of a SNP risk component can also improve the discrimination of the BCRAT and IBIS models for both African American women and Hispanic women. Specifically, for African American women the OR per SD increased from 1.25 to 1.37 when using BCRAT and from 1.04 to 1.22 when using IBIS. For Hispanic women, the corresponding changes were from 1.25 to 1.48 and from 1.15 to 1.42.

For each of the risk prediction models, the combined risk score resulted in approximately 40 % of African American cases moving into a higher risk category and approximately 10 % of controls moving into a lower risk category. For Hispanics, over 20 % of cases moved into a higher risk category, and 6 % of controls moved into a lower risk category when using BCRAT and 17 % when using IBIS. These proportions are higher than those in the two previous studies of Caucasian cohorts, which identified between 3 and 10 % of cases moving to a higher risk category [6, 20].

The AUC value for BCRAT obtained for Hispanic women is lower than that previously reported [8]. Whilst the IBIS model is widely used across ethnicities in the US, it has only been validated for Caucasian populations [22], and both the ORs and AUC derived here for the model alone are low for both African American and Hispanic women. For the present analysis, information was not available for second-degree relatives or for family history of ovarian cancer, and it is not immediately clear whether this has affected the IBIS model performance. In addition to ethnicity differences and reduced pedigree inputs, the low values may reflect that IBIS was developed using data from studies of predominantly postmenopausal women and is intended for use with high-risk populations [11].

Similarly, the SNPs used in this study were predominantly identified by discovery GWAS of Caucasian women [13]. The estimated OR per SD for the log SNP-based score alone was 1.24 for African American and 1.39 for Hispanic women, which are both lower than the estimate of 1.55 reported by Mavaddat et al. for Caucasian women [13]. Whilst susceptibility loci are likely to be similar across ethnicities, the informative SNPs for those loci could vary, and remain to be confirmed, across ethnicities. Thus, the SNP risk scores used here are likely to improve once GWAS datasets use Phase I datasets of the relevant ethnic populations, and fine mapping studies have been conducted across populations.

Overall, breast cancer prevention strategies rely upon accurate risk assessment, the models for which have typically only been validated for Caucasian women. Although most national screening programs rely solely upon age as the factor to determine eligibility (e.g., inviting only women above a certain age threshold for screening), more targeted screening based upon a calibrated risk assessment is being considered [23]. We hope that the information presented in studies such as this can eventually be used to help make screening more effective across all populations of the world, particularly those with fewer resources.