Main

Accurate prognostication is essential when selecting the most appropriate adjuvant therapy following surgery for early breast cancer. A number of predictive models are now available to help estimate the survival for individual patients, including the Nottingham Prognostic Index (NPI), Adjuvant! and Predict, and to date, these have all been based on known pathological prognostic factors including tumour size, tumour grade and lymph node status. The NPI, first described in 1982 (Haybittle et al, 1982), has been prospectively validated (Todd et al, 1987; D'Eredita et al, 2001) and more recently updated (Blamey et al, 2007a) to provide accurate survival estimates following breast cancer surgery, including individual survival estimates (Blamey et al, 2007b). The introduction of Adjuvant!, a web-based (www.adjuvantonline.com) prognostication tool, in 2001 (Ravdin et al, 2001) went a step further by also providing absolute treatment benefits for hormone therapy and chemotherapy by applying risk reductions from the Early Breast Cancer Trialists Collaborative Group (1998a, 1998b) to breast cancer-specific mortality estimates based on the Surveillance Epidemiology and End Results Program (www.seer.cancer.gov). This model has been validated in case cohorts from British Columbia (Olivotto et al, 2005), the Netherlands (Mook et al, 2009) and the United Kingdom (Campbell et al, 2009).

Prognostication is becoming more sophisticated, and additional prognostic and predictive factors need to be considered in any current prognostic and treatment benefit model. There is a growing body of evidence to show that screen detection confers an additional survival benefit beyond stage shift and also reduces the risk of systemic recurrence when compared with symptomatic cancers of a similar stage (Joensuu et al, 2004; Shen et al, 2005). Recent studies have shown that the majority of the survival advantage associated with breast screening can be explained by this shift to an earlier stage at diagnosis and by more favourable prognostic factors (Dawson et al, 2009), but approximately 25% of the survival advantage is still unexplained (Wishart et al, 2008). These findings were recently confirmed in a data set from the Netherlands in which screen detection was associated with a 26 and 38% reduction in all-cause and breast cancer-specific mortality, respectively (Mook et al, 2011). The authors of that paper concluded that method of detection should be taken into account when estimating individual prognosis.

Predict is an online prognostication and treatment benefit tool developed in the United Kingdom, which is based on 5694 women diagnosed in East Anglia from 1999 to 2003 (Wishart et al, 2010). The model includes mode of detection and provides 5- and 10-year survival estimates as well as treatment benefit predictions at both time points. It has been validated in an independent case cohort from the United Kingdom and, more recently, in a British Columbia data set in which it was also compared with Adjuvant! (Wishart et al, 2011). This comparison showed that Predict and Adjuvant! provide accurate and comparable overall and breast cancer-specific survival (BCSS) estimates. However, the Predict estimates did not utilise the mode of detection component of the model, as the mode of detection was not available for the British Columbia cohort. Moreover, Adjuvant! does not take account of the mode of detection. Predict was slightly better calibrated for breast cancer-specific mortality, with predicted deaths being within 3.4% of observed deaths compared with 6.7% for Adjuvant!. Both models showed good discrimination with similar area under the receiver-operator characteristic curve (AUC; 0.723 vs 0.727 for Predict and Adjuvant!, respectively). Predict is now available online at www.predict.nhs.uk.

The aim of this study was to incorporate the prognostic effect of HER2 status into Predict (hereafter referred to as Predict+), and to compare the 10-year survival estimates from Predict+ with Predict, Adjuvant! and the observed 10-year outcome from the British Columbia data set.

Methods

Prognostic effect of tumour HER2 status

Estimates for the prognostic effect of HER2 status were based on an analysis of data from the Breast Cancer Association Consortium (BCAC). Pathology data from 12 of these studies have been previously published (Blows et al, 2010), and the analysis was updated to include data from another 3 studies that were excluded from the previous publication because of missing data on basal markers, which are not relevant for this analysis (Table 1). However, as the validation cohort for this study overlaps substantially with the British Columbia Cancer Agency (BCCA) case series published in Blows et al (2010), this case series was excluded from the BCAC data re-analysis. In addition, all patients diagnosed since 2004 were excluded, to exclude patients likely to have been treated with trastuzumab. In total, HER2 data were available for 10 179 cases (7278 ER-positive and 2931 ER-negative, Table 2) for whom data were also available for age at diagnosis, tumour size (<2, 2–4.9 and 5+ cm), tumour grade and nodal status. We estimated the hazard ratio for HER2-positive disease compared with HER2-negative disease using a Cox proportional hazards model stratified by study and adjusted for size, grade and nodal status. Separate regression models were used for ER-positive and ER-negative cases. As we have shown previously (Blows et al, 2010), the hazard ratio for HER2-positive disease decreases over time in women with ER-positive breast cancer, so the log hazard ratio was modelled to vary linearly with time. The effect of HER2 in women with ER-negative breast cancer is not time-dependent.
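
The model structure described above can be written out as a sketch. The notation is ours and is introduced only for illustration: s indexes study (the stratum), x is the vector of size, grade and nodal status covariates, z is 1 for HER2-positive and 0 for HER2-negative disease, and the γ coefficients are placeholders whose fitted values are not given in the text.

```latex
% ER-positive cases: log hazard ratio for HER2 assumed linear in time t
\[
h_s(t \mid \mathbf{x}, z) = h_{0s}(t)\,
  \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x} + (\gamma_0 + \gamma_1 t)\,z\right),
\qquad
\mathrm{HR}_{\mathrm{HER2}}(t) = \exp(\gamma_0 + \gamma_1 t)
\]

% ER-negative cases: no time dependence, i.e. the special case \gamma_1 = 0
\[
h_s(t \mid \mathbf{x}, z) = h_{0s}(t)\,
  \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x} + \gamma\,z\right),
\qquad
\mathrm{HR}_{\mathrm{HER2}} = \exp(\gamma)
\]
```

Under this form, a hazard ratio that decreases over time in ER-positive disease corresponds to a negative γ₁.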

Table 1 Description of participating studies
Table 2 Number of cases by study, ER status and HER2 status

The hazard ratios estimated from the BCAC data set were then used to modify Predict, which is also based on a Cox proportional hazards model. However, Predict was developed using a case cohort of breast cancer cases of unknown HER2 status, and so the underlying baseline hazard is representative of cases of average HER2 status. The HER2 hazard ratio estimates based on the BCAC data are for HER2-positive cases compared with HER2-negative cases, and so these were rescaled to give an average hazard ratio of unity using an estimated prevalence of HER2 of 9% in ER-positive cases and 25% in ER-negative cases. The applied hazard ratios for ER-positive cases are shown in Table 3. A fixed hazard ratio of 0.93 for HER2-negative cases and 1.27 for HER2-positive cases compared with HER2 unknown cases was applied to the baseline hazard of the ER-negative model.
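
The rescaling step can be illustrated with a minimal sketch. The text does not state whether the average of unity was taken on the hazard ratio scale or the log hazard ratio scale; the sketch below assumes the log scale (a prevalence-weighted mean log hazard ratio of zero), and the function name `rescale_to_unknown` is hypothetical. The input ratio is simply the ratio implied by the two reported ER-negative values, not a figure taken from the BCAC analysis.

```python
import math


def rescale_to_unknown(hr_pos_vs_neg: float, prevalence: float) -> tuple[float, float]:
    """Convert a HER2-positive vs HER2-negative hazard ratio into two hazard
    ratios relative to a case of average (unknown) HER2 status.

    Assumption (not stated in the text): the prevalence-weighted mean of the
    log hazard ratios is set to zero, so that their weighted geometric mean is 1.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    log_hr = math.log(hr_pos_vs_neg)
    shift = prevalence * log_hr          # mean log HR before rescaling
    hr_neg = math.exp(-shift)            # HER2-negative vs unknown
    hr_pos = math.exp(log_hr - shift)    # HER2-positive vs unknown
    return hr_neg, hr_pos


# Ratio implied by the reported ER-negative values (1.27 and 0.93),
# with the stated HER2 prevalence of 25% in ER-negative cases.
implied_ratio = 1.27 / 0.93
hr_neg, hr_pos = rescale_to_unknown(implied_ratio, prevalence=0.25)
print(f"HER2-negative vs unknown: {hr_neg:.2f}")
print(f"HER2-positive vs unknown: {hr_pos:.2f}")
```

Under this assumption the sketch lands within about 0.01 of the reported 0.93 and 1.27. Averaging on the hazard ratio scale instead gives similar values, so the reported figures alone do not identify which convention was used.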

Table 3 Hazard ratio by HER2 status by time since diagnosis in ER-positive breast cancer

Validation study population

We used the same case cohort from British Columbia, Canada, which we used to validate the original version of Predict. The data set has previously been described (Olivotto et al, 2005), but, in brief, includes data from 1653 patients with information on HER2 status out of a total of 3140 patients with stage I or II invasive breast cancer diagnosed in British Columbia, Canada, from 1989 to 1993, who were identified from the Breast Cancer Outcomes Unit (BCOU) of the BCCA. The BCOU prospectively records demographical information, pathological information, staging, initial treatment and outcome information including first loco-regional and distant relapse, as well as date and cause of death. Outcome data were reported annually by the treating oncologist, family physician or by monthly death certificate flagging through the British Columbia Cancer Registry and Department of Vital Statistics for British Columbia.

Information obtained from the BCOU database included age at diagnosis, sex, menopausal status, year of diagnosis, histology (ductal, lobular, other), histological grade, tumour size, number of lymph nodes sampled and number of lymph nodes positive, lymphovascular invasion status, ER status, HER2 status, type of local therapy (wide local excision, mastectomy, radiotherapy) and type of adjuvant systemic therapy (none, chemotherapy, endocrine therapy, both). The HER2 status was evaluated using TMAs as previously described (Chia et al, 2008). Chemotherapy regimens were categorised as four cycles of doxorubicin plus cyclophosphamide; 6 months of cyclophosphamide, methotrexate and fluorouracil, or other chemotherapy during this time period. None of these patients received trastuzumab.

Study endpoints were overall survival (OS) and BCSS. Of 1653 cases used in this analysis, cause of death was unknown in 5, and so 1648 cases were used for the analysis of BCSS. Ten-year-predicted OS and BCSS were calculated for each patient using Predict+, Predict and Adjuvant! (standard version 8) by investigators blinded to the actual outcome data for each patient after entry of patient age, tumour size, number of positive nodes, tumour grade, ER status, HER2 status (Predict+ only) and adjuvant systemic therapy. In Adjuvant!, the default comorbidity setting of ‘minor symptoms’ and the chemotherapy option ‘anthracycline-containing; 4 cycles’ were used. The ‘second-generation’ chemotherapy option (anthracycline-containing regimens) was used in Predict+ and Predict. The mode of detection input was not used in Predict or Predict+, as this information was not available in the British Columbia data set. Ten-year-predicted OS and BCSS from Predict+, Predict and Adjuvant! were compared with observed 10-year OS and BCSS.

Model calibration is a comparison of the predicted mortality estimates from each model with the observed mortality. In addition to comparing calibration in the complete data set, we evaluated calibration within strata of other prognostic variables. We also evaluated calibration within quartiles of predicted mortality. A goodness-of-fit test was carried out by using a χ2-test based on the observed and predicted number of events (4 d.f.). Model discrimination was evaluated by calculating the AUC for 10-year breast cancer-specific and overall mortality. This is a measure of how well the models identify those patients with worse survival. The AUC is the probability that the predicted mortality from a randomly selected patient who died will be higher than the predicted mortality from a randomly selected survivor.
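
The two measures can be sketched in a few lines. This is an illustration of the definitions given above, not the code used in the study: the function names are hypothetical, the data are toy values, ties in the AUC are counted as one half by convention (the text does not say how ties were handled), and the goodness-of-fit statistic is assumed to take the usual Pearson form of squared differences divided by predicted events.

```python
from itertools import product


def auc(pred_died: list[float], pred_survived: list[float]) -> float:
    """Probability that a randomly selected patient who died has a higher
    predicted mortality than a randomly selected survivor.

    Ties contribute one half (a convention assumed here).
    """
    n_pairs = len(pred_died) * len(pred_survived)
    if n_pairs == 0:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for died, survived in product(pred_died, pred_survived):
        if died > survived:
            score += 1.0
        elif died == survived:
            score += 0.5
    return score / n_pairs


def goodness_of_fit(observed: list[float], predicted: list[float]) -> float:
    """Pearson-type statistic: sum over groups of (observed - predicted)^2 / predicted."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, predicted))


# Toy predicted 10-year mortality for patients who died and who survived.
toy_died = [0.6, 0.4, 0.3]
toy_survived = [0.5, 0.2, 0.1, 0.3]
print(f"AUC: {auc(toy_died, toy_survived):.3f}")

# Toy observed and predicted deaths in four groups of predicted risk.
toy_observed = [12, 30, 55, 90]
toy_predicted = [10.0, 33.0, 50.0, 95.0]
print(f"Goodness-of-fit statistic: {goodness_of_fit(toy_observed, toy_predicted):.2f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of this size; a rank-based formulation gives the same value more efficiently.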

Results

The calibration of Predict for both all-cause mortality and for breast cancer-specific mortality was improved by the incorporation of HER2 status. Adjuvant! performed slightly better than Predict and Predict+ for all-cause mortality, but both Predict and Predict+ outperformed Adjuvant! for breast cancer-specific mortality. All three models slightly underestimated the observed number of deaths. The total number of deaths predicted by Adjuvant! was within 6.1% of that observed (492 vs 524, P=0.16) compared with 8.8% for Predict (478 vs 524, P=0.04) and 8.4% for Predict+ (480 vs 524, P=0.05). The total number of breast cancer-specific deaths predicted by Adjuvant! was within 14% of that observed (311 vs 360, P=0.01) compared with 3.6% for Predict (347 vs 360, P=0.49) and 2.5% for Predict+ (351 vs 360, P=0.60). Table 4 shows the observed and predicted all-cause 10-year OS by various clinico-pathological sub-groups. All models performed well in most subgroups. Notable exceptions were the performance in women aged 20–35 years, in which all three models underpredicted the actual number of deaths by 32%, and in HER2-positive cases, in which Adjuvant! and Predict underestimated the number of deaths by close to 20% compared with just 9% for Predict+. None of these differences were statistically significant (P>0.05). We also compared the predicted mortality with that observed for women in each quartile of predicted risk. Calibration was good for Adjuvant! across all risk categories (goodness-of-fit, P=0.51), and reasonable for Predict+ (goodness-of-fit, P=0.042) and Predict (goodness-of-fit, P=0.032) (Figures 1A–C). Table 5 shows the observed 10-year breast cancer-specific mortality compared with that predicted by Predict+, Predict and Adjuvant!. Again, each model performed well across most subgroups, except in women aged 20–35 years wherein they all underpredicted the actual number of deaths by 32% (P>0.05).
In HER2-positive patients, the total number of breast cancer-specific deaths predicted by Predict+ was within 5% of observed (71 vs 75, P=0.64) compared with 20% for Predict (60 vs 75, P=0.08) and 29% for Adjuvant! (53 vs 75, P=0.01). Calibration across the quartiles of predicted breast cancer risk was good for Predict+ (goodness-of-fit, P=0.11) and Predict (goodness-of-fit, P=0.068), and reasonable for Adjuvant! (goodness-of-fit, P=0.001) (Figures 1D–F). Model discrimination was similar for all three models with AUCs for breast cancer-specific mortality of 0.665, 0.661 and 0.649 for Predict+, Predict and Adjuvant!, respectively.
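
The "within x%" figures quoted above appear to be the shortfall of predicted deaths expressed as a percentage of observed deaths. The following sketch, with a hypothetical helper name, recomputes them from the death counts given in the text; it does not attempt to reproduce the P-values, since the test used for them is not described.

```python
def percent_shortfall(predicted: float, observed: float) -> float:
    """Shortfall of predicted deaths relative to observed deaths, in percent."""
    return 100.0 * (observed - predicted) / observed


# (predicted, observed) death counts as quoted in the Results text.
reported = {
    "All-cause, Adjuvant!": (492, 524),
    "All-cause, Predict": (478, 524),
    "All-cause, Predict+": (480, 524),
    "Breast cancer-specific, Adjuvant!": (311, 360),
    "Breast cancer-specific, Predict": (347, 360),
    "Breast cancer-specific, Predict+": (351, 360),
    "HER2-positive, breast cancer-specific, Predict+": (71, 75),
    "HER2-positive, breast cancer-specific, Predict": (60, 75),
    "HER2-positive, breast cancer-specific, Adjuvant!": (53, 75),
}

for label, (predicted, observed) in reported.items():
    print(f"{label}: {percent_shortfall(predicted, observed):.1f}%")
```

Computed this way, the counts give 6.1%, 8.8% and 8.4% for all-cause mortality, 13.6%, 3.6% and 2.5% for breast cancer-specific mortality, and 5.3%, 20.0% and 29.3% in HER2-positive patients, which match the rounded figures in the text.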

Table 4 Observed and predicted 10-year all-cause mortality by demographical, pathological and treatment characteristics
Figure 1

Calibration plots of observed outcomes with 95% confidence intervals against predicted outcomes by quartiles of the predicted value. Overall survival predicted by (A) Predict+ (B) Predict and (C) Adjuvant!, and breast cancer-specific survival predicted by (D) Predict+ (E) Predict and (F) Adjuvant!.

Table 5 Observed and predicted 10-year breast cancer-specific mortality by demographical, pathological and treatment characteristics

Discussion

Predict was developed using a flexible, Cox proportional hazards model that enables the easy incorporation of additional prognostic factors using external estimates of the prognostic effect of such a factor. We have used this flexibility to incorporate the prognostic effect of HER2 derived from a large multi-centre study. A key feature of Predict, and one that differentiates it from Adjuvant!, is that it uses different underlying models for ER-positive and ER-negative disease. This is particularly important as the effect of other prognostic variables is often different according to ER status. This difference is particularly marked for HER2 status, where the effect of HER2 is strongly time-dependent in ER-positive cases but not for ER-negative cases. Our results confirm that the inclusion of HER2 status in the clinical prognostication tool Predict improves both model discrimination and calibration. The improvement in model fit was most pronounced in the HER2-positive subset of patients and was particularly marked for breast cancer-specific mortality. Given that it is breast cancer-specific mortality that is reduced by adjuvant therapy (Early Breast Cancer Trialists Collaborative Group, 1998a, 1998b), it is likely that improved model performance will lead to more accurate predictions of the absolute benefit of treatment. Adjuvant! performed better than either Predict or Predict+ for overall mortality. This might be expected as the background mortality data on which Adjuvant! is based are from North America, whereas Predict is based on background mortality data from the United Kingdom.

The weaknesses of our study also need to be considered in interpreting our results. There were some missing data for the BCAC cohorts used to derive the HER2-specific hazard ratios and for the BCOU cohort. Data were missing for multiple reasons, but the major reason was because of unavailability of archival pathology material. Although it is possible that there is some selection bias in the missing data – for example, pathology material is more likely to be unavailable for small tumours – any bias is unlikely to be large. This notion is supported by the results of the validation in the independent data from the BCOU cohort. As we have described previously, none of the models performed very well in the youngest age group. The reasons for this are unclear, but probably reflect the fact that the data on which the models were based included relatively small numbers of patients in this age group.

The online version of Predict includes survival estimates and treatment benefit at 5 and 10 years post diagnosis. Four clinical trials have now reported on the benefit of trastuzumab therapy: FinHER (Joensuu et al, 2009), HERA (Smith et al, 2007), B31/N9831 (Romond et al, 2005; Perez et al, 2010) and BCIRG006 (Slamon et al, 2011). The relative reduction in all-cause mortality has ranged from 0.33 to 0.45. Two studies reported results stratified by tumour hormone-receptor status, neither of which found evidence for heterogeneity (Romond et al, 2005; Smith et al, 2007; Perez et al, 2010). The new version of Predict+, including trastuzumab benefit at 5 years, is available at www.predict.nhs.uk.

One of the key advantages of the Predict models is that they include mode of detection as one of the input parameters. As previously discussed, screen detection appears to confer an additional survival advantage over and above known prognostic factors. Unfortunately, as screening data were not available in the British Columbia data set, it was not possible to use this feature when running the Predict models, so the default setting was ‘unknown’ for all patients. Thus, we cannot estimate the performance of the Predict models if the mode of detection were known. However, it is very likely that model performance would improve with the addition of mode-of-detection data. In addition, a recent publication has called for the mode of detection to be taken into account when deciding optimal adjuvant therapy for an individual patient (Mook et al, 2011). Predict+ can now provide such a model with the added benefit of the inclusion of HER2 status as well as the absolute treatment benefit of trastuzumab at 5 years. Further validations of Predict+ in data sets where the mode of detection, HER2 status and trastuzumab treatment are recorded are planned in the near future.

This study reports the first successful inclusion of HER2 status in a prognostic model (Predict+) based on known clinical and pathological factors. The study has demonstrated a marked improvement in 10-year BCSS estimates using Predict+ compared with the original Predict model for HER2-positive patients. Both Predict models provide better BCSS estimates than Adjuvant! in HER2-positive patients in this data set. This improvement in BCSS prognostication for HER2-positive patients was not achieved at the expense of the HER2-negative cohort, wherein both Predict and Predict+ performed better than Adjuvant!. Predict+ also provided better breast cancer-specific mortality estimates than Predict and Adjuvant!. This is extremely important for optimal model function, as it is the breast cancer-specific mortality that is reduced by the relative risk reductions of adjuvant therapy, and this improvement in model performance by Predict+ should lead to more accurate absolute treatment benefit predictions for individual patients.