Introduction

The number of elderly and frail patients scheduled to undergo major cancer surgery is increasing. Impaired functional capacity from frailty, the natural effects of aging, sedentary lifestyle, and cancer biology is associated with increased postoperative complications, prolonged length of hospital stay, and increased risk of death after major cancer surgery [1]. Studies have also shown that neoadjuvant cancer therapy can cause a reduction in oxygen consumption (VO2) at anaerobic threshold (AT) and at peak exercise (pVO2) of up to 30% (2–3 ml kg−1 min−1) when measured using serial cardiopulmonary exercise testing (CPET) [2,3,4].

CPET, currently considered the gold standard in assessing functional capacity, is a dynamic, symptom-limited, non-invasive test that provides objective analysis of the functional capacity of the integrated cardiovascular, pulmonary, haematinic, and cellular metabolic systems [5,6,7]. CPET is increasingly used in the perioperative setting for preoperative risk stratification and, more recently, to guide preoperative optimisation strategies [8, 9]. The predominant CPET-derived parameters, which have been shown to be strong predictors of adverse outcomes following major surgery, include AT, pVO2, and minute equivalents for CO2 (ventilation/carbon dioxide production; Ve/VCO2). A recent systematic review demonstrated that AT values of < 10.1 and < 10.9 ml kg−1 min−1 were strong predictors of increased morbidity and mortality, respectively, in patients undergoing intraabdominal surgery [10]. Similarly, pVO2 < 15 ml kg−1 min−1 is also an independent predictor of postoperative morbidity and mortality [11, 12].

The promising role of CPET as a perioperative risk assessment tool, however, is offset by it being resource-intensive, requiring skilled personnel and specialized equipment [13]. The Duke Activity Status Index (DASI; range 0 to 58), a static questionnaire developed almost 3 decades ago as a simple, quick, and cost-free surrogate measure of pVO2, is therefore appealing as a means of identifying at-risk patients for triage to a CPET laboratory. Based on simple yes or no answers to 12 questions related to the patient’s activities of daily living (Appendix 1 in supplementary material), pVO2 can be estimated using the following equation: pVO2 = DASI score × 0.43 + 9.6 [14, 15]. Although it was originally developed to monitor clinical progress of cardiovascular patients, the DASI has also been shown to have a modest correlation with pVO2 measured by CPET in patients undergoing general intraabdominal surgery [16]. Specifically, question 4 (the ability to climb a flight of stairs) is frequently used in anaesthetic practice to estimate metabolic equivalents.
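As an illustration only, the DASI equation quoted above can be written as a short Python function. The function names are hypothetical, and the conversion to metabolic equivalents assumes the conventional value of 1 MET = 3.5 ml kg−1 min−1, which is not stated in this article.

```python
def dasi_predicted_pvo2(dasi_score: float) -> float:
    """Estimate peak VO2 (ml/kg/min) from a total DASI score.

    Uses the published equation quoted in the text:
    pVO2 = DASI score x 0.43 + 9.6.
    """
    if dasi_score < 0:
        raise ValueError("DASI score cannot be negative")
    return 0.43 * dasi_score + 9.6


def pvo2_to_mets(pvo2: float) -> float:
    """Convert VO2 (ml/kg/min) to METs, assuming 1 MET = 3.5 ml/kg/min."""
    return pvo2 / 3.5


if __name__ == "__main__":
    # Illustrative score only; not a patient from this study.
    score = 30.0
    predicted = dasi_predicted_pvo2(score)
    print(f"DASI {score}: predicted pVO2 = {predicted:.1f} ml/kg/min "
          f"({pvo2_to_mets(predicted):.1f} METs)")
```

A score of 0 returns the intercept (9.6 ml kg−1 min−1), so the equation cannot predict a pVO2 below that value.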

Nevertheless, the validity of using DASI to predict the true functional capacity of major cancer surgical patients has not been specifically examined. Cancer surgical patients often have preoperative neoadjuvant chemoradiotherapy that may induce deconditioning and confound the ability of the DASI to predict pVO2. We hypothesised that the DASI-predicted pVO2 may not be a good predictor of the functional capacity of major cancer surgical patients. In this study, we (1) compared the bias and limits of agreement between the measured and DASI-predicted pVO2 in a cohort of major cancer surgical patients; and (2) if this agreement proved unsatisfactory, assessed whether recalibration of the predictive coefficient of each domain (or question) of the DASI questionnaire would improve its predictive ability.

Methods

Following Peter MacCallum Cancer Centre Human Research Ethics Committee approval (16/136R—November 2016), we undertook a retrospective cohort study of 43 consecutive patients who were scheduled for major cancer surgery (defined as intracavity tumour surgery lasting for more than 2 h), who, during their preoperative workup, were referred to our CPET service (January–June 2014), and had a concurrent DASI questionnaire administered. Referral of patients for CPET assessment was initiated by surgeons based on the standardised hospital-based CPET referral guidelines (Appendix 2 in supplementary material), with a predominant casemix of colorectal and upper gastrointestinal cancers. All patients who underwent CPET during this period were included. Patients who were unable to perform the test due to pain upon cycling and those who had neurological or severe cognitive deficits were excluded from CPET in the study centre. Cases with missing data were excluded from analysis. Data were collected in November 2016 through review of hospital electronic data and medical records. All cases were de-identified prior to statistical analysis.

CPET was performed as per the American Thoracic Society/American College of Chest Physicians (ATS/ACCP) practice guidelines [17]. A static respiratory function test was performed prior to exercise. Baseline data, collected with the patient at rest for 3 min, included blood pressure, continuous 12-lead electrocardiogram (ECG), oxygen saturation, and breath-by-breath gas exchange through a tight-fitting mask (CardiO2/CP System, Medical Graphics Corporation, USA). During the exercise phase, patients cycled at 60–70 RPM on a cycle ergometer: 3 min of unloaded cycling, then a ramp protocol of 20 W min−1 through increasing pedal resistance until peak exercise, and then 5 min of unloaded cycling during the recovery period. A clinician was present throughout. Ramping was stopped after achieving peak exercise at the patient’s or clinician’s discretion, because of patient fatigue, dyspnoea, chest pain, leg pain, or signs of myocardial ischemia, hypotension, or arrhythmia. AT was determined using both the V-slope and ventilatory equivalents methods [18] by two experienced anaesthetists trained in CPET. Two independent reviewers analysed and crosschecked the CPET data to ensure accuracy.

Data collected included patient demographics (age, gender, height, weight, etc.), history of prior administration of chemotherapy and/or radiotherapy within the last 6 months, objective measures of CPET-derived AT and pVO2 using gas-exchange analysis, DASI score, and the corresponding predicted pVO2. DASI was measured using a self-administered questionnaire [14] as part of routine preoperative assessment immediately before performance of CPET. The DASI was not available to the clinician interpreting the CPET results. Peak VO2, AT, age, height, and weight were evaluated as continuous variables, whereas gender and prior chemoradiotherapy (yes or no) were described as categorical.

The primary outcome of this study evaluated the limits of agreement between DASI-predicted pVO2 and actual measured pVO2. Secondary outcomes investigated (a) whether the agreement between DASI-predicted pVO2 and actual measured pVO2 was affected by prior chemoradiotherapy; (b) the ability of the raw DASI score to predict (1) measured pVO2 and (2) anaerobic threshold; and (c) whether recalibrating the predictive coefficient associated with each domain (or question) of the DASI questionnaire would improve its predictive ability [19].

Statistical analyses

We assessed the bias and 95% limits of agreement between the predicted and measured pVO2 using a Bland–Altman plot, stratified by whether the patients had received chemoradiotherapy within 6 months of CPET. A scatter plot was used to assess how the measured pVO2 and AT were related to the DASI scores and the DASI-derived predicted pVO2. Acceptable limits of agreement were considered to be a pVO2 value of 3 ml kg−1 min−1 [20]. Normality of the response variables (pVO2 and AT) was assessed by Q-Q plots and formally tested using the Shapiro–Wilk test, the most powerful statistical test for normality [21], before they were modelled by linear regression on the DASI score.
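The Bland–Altman quantities described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study (which was run in commercial statistical software): the function names and the example values are hypothetical, and the acceptability check assumes that the 3 ml kg−1 min−1 criterion means both limits should lie within ±3 ml kg−1 min−1 of zero.

```python
from statistics import mean, stdev


def bland_altman(predicted, measured, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of (predicted - measured); the 95% limits of
    agreement are bias +/- 1.96 x the SD of the paired differences.
    """
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


def within_acceptable_limits(lower, upper, tolerance=3.0):
    """Assumed reading of the criterion: both limits within +/- tolerance."""
    return lower >= -tolerance and upper <= tolerance


if __name__ == "__main__":
    # Made-up values for illustration only (ml/kg/min); not study data.
    predicted = [20.0, 22.0, 25.0, 30.0]
    measured = [15.0, 14.0, 20.0, 18.0]
    bias, lower, upper = bland_altman(predicted, measured)
    print(f"bias = {bias:.1f}, limits = {lower:.1f} to {upper:.1f}")
    print("acceptable:", within_acceptable_limits(lower, upper))
```

A positive bias in this convention means that the DASI-predicted value exceeds the measured value on average.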

We then assessed whether recalibrating the intercept and predictive coefficient of the total DASI score, or combining age, gender, and recent chemoradiotherapy with each individual DASI question’s score, would improve the ability of the DASI to predict the measured pVO2. In the latter analysis, a parsimonious linear regression model was obtained by eliminating covariates so as to optimise the adjusted R-square and the area under the receiver-operating characteristic (AUROC) curve of the model for discriminating between patients with pVO2 ≥ 15 and < 15 ml kg−1 min−1. Finally, we also assessed the AUROC of the DASI score for discriminating between patients with AT ≥ 11 ml kg−1 min−1 and those with AT < 11 ml kg−1 min−1 (a cut-point considered to be associated with increased perioperative surgical risk) [10, 11].
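To make the discrimination analysis concrete, the sketch below computes an AUROC directly from its rank-based definition: the probability that a randomly chosen patient above the cut-point has a higher DASI score than a randomly chosen patient below it, with ties counted as one half. It is a hedged illustration, not the study's software procedure; the function name and the example data are hypothetical.

```python
def auroc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predictor values (e.g. total DASI scores).
    is_positive: booleans, True where the outcome is present
                 (e.g. measured pVO2 >= 15 ml/kg/min).
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("both outcome groups must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up values for illustration only; not study data.
    dasi = [10.0, 20.0, 30.0, 40.0, 50.0]
    measured_pvo2 = [12.0, 16.0, 14.0, 18.0, 20.0]
    high_pvo2 = [v >= 15.0 for v in measured_pvo2]
    print(f"AUROC = {auroc(dasi, high_pvo2):.3f}")
```

An AUROC of 0.5 corresponds to discrimination no better than chance, and 1.0 to perfect separation of the two groups.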

All statistical tests were performed using Predictive Analytics Software Statistics 22 (IBM Corporation, New York, 2013) and MedCalc (version 17.4 MedCalc Software bvba, Ostend, Belgium; 2017), and a p value < 0.05 was taken as statistically significant.

Results

A total of 45 cases were eligible for this study. Two cases were missing at least one data entry point in the DASI score and were excluded from further analyses. Most patients were elderly (median age 63 years, interquartile range 18 years) and male (58%), and were scheduled for intraabdominal cancer surgery (with the exception of two cases having thoracic surgery) (Table 1).

Table 1 Summary of patient characteristics and DASI-predicted and CPET measured functional capacity measures

The bias (or mean difference) between the predicted and measured pVO2 was large (on average, the predicted pVO2 overestimated the measured pVO2 by 8 ml kg−1 min−1, 95% CI 6.2–9.8; p = 0.0001), and the 95% limits of agreement of the differences between the two measurements were wide (19.5 to −3.4 ml kg−1 min−1). The predicted pVO2 tended to further overestimate the measured pVO2 as the measured pVO2 increased (Fig. 1). The bias and 95% limits of agreement between the predicted and measured pVO2 appeared to be similar between patients who had recent chemoradiotherapy (mean bias 8.5, 95% limits of agreement 20.5 to −3.5 ml kg−1 min−1) and those who did not (mean bias 7.8, 95% limits of agreement 19.1 to −3.6 ml kg−1 min−1; Figs. 2, 3, respectively).

Fig. 1

Bland–Altman analysis of DASI-predicted and CPET measured oxygen consumption at peak exercise (pVO2). This figure shows that the bias (or mean difference) between the predicted and measured pVO2 was large (on average the predicted > measured pVO2 by 8 ml kg−1 min−1, 95% CI 6.2–9.8; p = 0.0001), and the 95% limits of agreement of the differences between the two measurements were wide (19.5 to −3.4 ml kg−1 min−1)

Fig. 2

Bland–Altman plot of measured and predicted peak oxygen uptake for those who had recent chemoradiotherapy

Fig. 3

Bland–Altman plot of measured and predicted peak oxygen uptake for those without recent chemoradiotherapy

Normality of the response variables (pVO2 and AT) was confirmed by Q-Q plots and the Shapiro–Wilk test (p = 0.190 and p = 0.206, respectively). DASI-predicted pVO2 was only weakly linearly associated with the actual pVO2 (N = 43, adjusted R2 = 0.20; p = 0.002; Fig. 4), and the DASI scores were not statistically significantly related to the AT (p = 0.111; Fig. 5). The raw DASI scores had a modest ability to discriminate between those who had a pVO2 > 15 ml kg−1 min−1 and those who did not (AUROC = 0.77, 95% CI 0.61–0.88), but they did not accurately discriminate between patients with a high (> 11 ml kg−1 min−1) and low AT (≤ 11 ml kg−1 min−1) (AUROC = 0.53, 95% CI 0.33–0.73, p = 0.791; Figs. 6, 7, respectively).

Fig. 4

Association between Duke Activity Status Index and maximal oxygen uptake. The straight line indicates the linear regression line

Fig. 5

Association between Duke Activity Status Index and anaerobic threshold. The straight line indicates the linear regression line

Fig. 6

Area under the receiver-operating characteristic curve for Duke Activity Status Index to predict high peak oxygen uptake > 15 ml kg−1 min−1 = 0.768, 95% CI 0.614–0.883

Fig. 7

Area under the receiver-operating characteristic curve for Duke Activity Status Index to differentiate an anaerobic threshold < 11 against ≥ 11 ml kg−1 min−1 = 0.526 (95% CI 0.326–0.725, p = 0.791)

Recalibrating the intercept and regression coefficient of the total DASI score (our revised predicted pVO2 with DASI score alone = 0.115 × DASI score + 13.3; adjusted R2 = 0.20 and AUROC to predict > 15 ml kg−1 min−1 = 0.72, 95% CI 0.54–0.90) also did not substantially improve its ability to predict the measured pVO2. Further permutations, including gender but not age or recent chemoradiotherapy [revised predicted pVO2 with DASI and gender = 0.11 × DASI score + male × 1.93 + 12.4; adjusted R2 = 0.25 and AUROC to predict > 15 ml kg−1 min−1 = 0.74, 95% CI 0.58–0.91], only slightly improved the ability of the revised DASI model to predict the measured pVO2.

Using individual scores for each of the DASI questions statistically improved the ability to predict the measured pVO2 (with a revised raw data model derived to predict pVO2 = 10.294 + male × 2.045 + Q5 × 0.203 + Q6 × 1.417 + Q10 × 0.574; adjusted R2 = 0.37 and AUROC to predict > 15 ml kg−1 min−1 = 0.74, 95% CI 0.59–0.89; Fig. 8), but this could by no means be considered a reliable replacement for measuring pVO2 directly.
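For readers who wish to compare the equations side by side, the following sketch encodes the original DASI equation and the three recalibrated models reported in this section. The function names are hypothetical. The text does not state how the Q5, Q6, and Q10 terms are coded (for example, as 0/1 indicators or as weighted item scores), so the item-level function simply takes them as supplied by the caller; that coding is an open assumption.

```python
def original_pvo2(dasi: float) -> float:
    """Original published equation: pVO2 = DASI x 0.43 + 9.6."""
    return 0.43 * dasi + 9.6


def recalibrated_pvo2(dasi: float) -> float:
    """Recalibrated model using the total DASI score alone."""
    return 0.115 * dasi + 13.3


def recalibrated_with_gender_pvo2(dasi: float, male: bool) -> float:
    """Recalibrated model using total DASI score and gender."""
    return 0.11 * dasi + 1.93 * int(male) + 12.4


def item_level_pvo2(male: bool, q5: float, q6: float, q10: float) -> float:
    """Item-level model; coding of q5, q6, q10 is assumed, not stated."""
    return 10.294 + 2.045 * int(male) + 0.203 * q5 + 1.417 * q6 + 0.574 * q10


if __name__ == "__main__":
    # Illustrative score only; not a patient from this study.
    dasi = 40.0
    print(f"original:            {original_pvo2(dasi):.1f}")
    print(f"recalibrated:        {recalibrated_pvo2(dasi):.1f}")
    print(f"recalibrated (male): {recalibrated_with_gender_pvo2(dasi, True):.1f}")
```

The much smaller slope of the recalibrated models (about 0.11 versus 0.43 per DASI point) means that their predictions rise far more slowly with the DASI score than those of the original equation.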

Fig. 8

Relationships between the peak oxygen uptake predicted by the different models and the measured peak oxygen uptake

Discussion

Comparison with the existing literature

Our findings in the cancer surgery population demonstrated that DASI-predicted pVO2 had a substantial bias and wide 95% limits of agreement compared to the measured pVO2. This result differed from previous reports in the general population (R2 = 0.64) and in the non-cancer-specific patient population undergoing intraabdominal surgery (R2 = 0.45) [14, 16]. The intraabdominal cohort investigated by Struthers et al. recorded only types of surgery and did not specify oncological status: vascular surgery comprised 58.0% of the cohort, colectomies and anterior resections comprised 20.0%, and nephrectomies accounted for 12.0%; it is unclear whether any of these patients underwent surgery for cancer-related reasons. Unlike in the study by Struthers et al. (AUC 0.77, 95% CI 0.63–0.99, p = 0.003), the DASI score was unable to predict AT > 11 ml kg−1 min−1 (AUC 0.53, 95% CI 0.33–0.73, p = 0.791) in our cancer surgical cohort. Our study confirms that the DASI reported in the development cohort [14] at best offers only a modest correlation with CPET results and is unsuitable for use in the perioperative setting, particularly in oncological patients. Similarly, neither the DASI nor our modified model of the DASI is sufficiently reliable as a triage tool to identify those with a high likelihood of low functional capacity (AUROC < 0.55 for AT < 11 ml kg−1 min−1 and AUROC < 0.80 for pVO2 < 15 ml kg−1 min−1) who should be referred to a CPET laboratory for objective confirmation of their functional capacity.

Potential causes for a further reduction in capacity for prediction

Differences between our results and those of previous studies may be due to differing methods of data collection for health quality outcomes, compounded by different settings and patient cohorts [22, 23]. In the initial DASI study population, the questionnaire was interviewer-administered and had a higher predictive ability (R2 = 0.64), but this was not confirmed in a validation cohort (R2 = 0.31) when a redesigned self-administered questionnaire was used. Our results were more consistent with the latter, possibly due to wide interindividual variations in interpreting the meaning of the questions or recall bias. In addition, the medical and social context of the patients could also have an influence on the survey results. In large observational studies of healthy populations, participants have a tendency to overpredict their physical activity [24], particularly at higher levels of exercise [25]. Of note, our results also showed that DASI-predicted pVO2 tended to overpredict the actual pVO2 values, suggesting that this questionnaire is less reliable when predicting functional capacity at higher pVO2 ranges. Similar subjective measures of functional capacity can be found in clinician-elicited stair-climbing ability, which has been incorporated into common practice to estimate perioperative risk [26, 27]. While history taking provides a quick bedside assessment of functional capacity, the subjective nature of such measures makes them unreliable when administered by different parties. Furthermore, a substantial discrepancy has been reported between clinician assessment and self-reported questionnaires in assessing functional capacity [15].

The tendency for the DASI, originally developed for use in the general population, to overpredict peak VO2 in major cancer surgical patients may also be related to the underlying pathophysiology of cancer and associated therapy: patients are likely to overestimate their functional capacity on the DASI questionnaire because they tend to recall what they could do prior to their cancer diagnosis. Over the course of 1–2 months, neoadjuvant chemotherapy rapidly decreases pVO2 by as much as 3 ml kg−1 min−1 [2, 4]. Such rapid changes may make it difficult for patients to accurately estimate their current level of physical activity compared to that prior to commencement of neoadjuvant chemoradiotherapy. Recent studies have shown that exercise prescription during neoadjuvant therapy offsets this decline in functional capacity [28]. This recall bias and acute decline in functional capacity may, indeed, be an inherent limitation for any form of subjective assessment of functional capacity in major cancer surgical patients, especially the elderly who have become deconditioned due to lifestyle choice or illness. It is worth noting that, at lower pVO2, the difference between the predicted and measured pVO2 values was smaller, and the DASI may therefore predict pVO2 accurately in patients with extremely poor functional capacity.

In the original study by Hlatky et al. [14], subjects in the development phase had high pVO2 in excess of 30 ml kg−1 min−1 with correspondingly high DASI scores, whereas this was less so in the second validation cohort. In the population studied by Struthers et al. [16], 78% of patients had a pVO2 > 15 ml kg−1 min−1, compared with 67% of patients in our study. Therefore, it is likely that the DASI is better suited to predicting a high pVO2 in the fitter, general population than in the oncological population, who are more limited in functional capacity.

This study has some limitations. First, this was a single-centre study: our results may not be applicable to patients whose age and co-existing diseases differ substantially from those of our patients. Second, due to the retrospective nature of this study, we did not have the opportunity to confirm whether our patients’ answers to the DASI questionnaires were accurate by cross-checking them with their next of kin. Third, this study may have been subject to selection bias: although perioperative CPET referral guidelines are in place, surgical teams are responsible for CPET referrals and may apply their own selection preferences. Finally, the small sample size of this study limited the statistical power and the precision of the estimates, and did not allow us to assess smaller sub-groups (such as separating chemotherapy and radiotherapy) or whether DASI scores were non-inferior to CPET measured pVO2 in predicting perioperative morbidity and mortality.

Given the inverse relationship between functional capacity and postoperative complications [29, 30], developing a reliable tool to assess functional capacity (or to triage at-risk patients to resource-intensive CPET facilities for objective quantification of functional capacity) in surgical patients is essential to improve surgical outcomes, and especially to advance surgical oncology care. The DASI-predicted pVO2 had a weak relationship to objectively measured pVO2, with a large bias and wide limits of agreement, in patients pending major cancer surgery. Neither the total DASI scores nor our permutations of the DASI-derived predicted pVO2 could be considered sufficiently accurate to replace actual CPET measured anaerobic threshold and pVO2. A high specificity (90%) for a screening test (e.g., DASI) is needed to avoid missing too many patients with a low functional capacity before major cancer surgery. When the DASI’s specificity was 90%, the sensitivity would be 20% or less (Fig. 6). This would mean that using DASI as a screening test would, at best, reduce the CPET workload by 20% while still missing 10% of those with a low functional status (measured pVO2 < 15 ml kg−1 min−1), compared to a strategy of not using DASI at all and performing CPET for every major surgical cancer patient. Although the DASI-predicted and measured pVO2 were not interchangeable, with large observed limits of agreement in general, the difference between the two parameters did appear to depend on the observed value of pVO2, especially at low values of pVO2. As such, further studies evaluating DASI for those with a genuinely low pVO2 due to limited cardiopulmonary exercise capacity are warranted.