Take-home message

In this Bayesian analysis of the COVID STEROID 2 trial, we found relatively high posterior probabilities of benefit with dexamethasone 12 mg versus 6 mg in patients with COVID-19 and severe hypoxaemia on all outcomes, including the days alive without life support and mortality at day 28 and 90. We found relatively low probabilities of clinically important harm with 12 mg dexamethasone for all outcomes.

Introduction

Coronavirus disease 2019 (COVID-19) may lead to critical illness and severe hypoxaemia [1]. As of September 2021, the ongoing COVID-19 pandemic has caused > 4.5 million deaths worldwide [2]. Systemic corticosteroids decrease mortality in critically ill patients with COVID-19 [3], and dexamethasone 6 mg daily for up to 10 days is therefore recommended by the World Health Organization for patients with severe or critical COVID-19 [4]. Higher doses of systemic corticosteroids have been used in patients with COVID-19 and non-COVID-19 acute respiratory distress syndrome [3, 5, 6], and higher doses have been hypothesised to benefit patients with severe or critical COVID-19, although the balance between benefit and harm remains uncertain.

We conducted the COVID STEROID 2 trial to compare a higher (12 mg) versus the recommended dose (6 mg) of dexamethasone daily for up to 10 days in patients with COVID-19 and severe hypoxaemia [5, 7]. In the primary frequentist statistical analysis, the adjusted mean difference for days alive without life support up to 28 days after randomisation (primary outcome) was 1.3 days (95% confidence interval [CI]: 0–2.6, p = 0.07) higher with 12 mg, and the adjusted relative risks (RR) for 28-day mortality and serious adverse reactions were 0.86 (99% CI: 0.68–1.08, p = 0.1) and 0.83 (99% CI: 0.54–1.29, p = 0.27) with 12 mg, respectively [7]. While the pre-defined thresholds for statistical significance were not reached in these analyses or in the analyses of outcomes registered at day 90, the results were more compatible with benefit with 12 mg, and consequently, a nuanced interpretation avoiding arbitrary dichotomisations is warranted [8, 9].

Here, we report the secondary, pre-planned Bayesian analyses of all outcomes registered up to day 90 in the COVID STEROID 2 trial to facilitate a more nuanced and probabilistic interpretation of the trial results [10].

Methods

We report the pre-planned secondary Bayesian analysis of the COVID STEROID 2 trial of all outcomes registered within 90 days. This study was conducted according to a published protocol and statistical analysis plan [10], with this manuscript prepared according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist (see Electronic Supplementary Material, ESM) and adhering to the Reporting of Bayes Used in clinical STudies (ROBUST) guideline [11, 12].

The COVID STEROID 2 trial

The COVID STEROID 2 trial was an investigator-initiated, international, centrally randomised, stratified (for site, use of invasive mechanical ventilation, and age below 70 years), parallel-group, blinded clinical trial. Adult patients hospitalised with COVID-19 and severe hypoxaemia (≥ 10 L oxygen supplementation/minute independent of delivery system, use of non-invasive ventilation or continuous positive airway pressure for hypoxaemia, or invasive mechanical ventilation) were included; exclusion criteria were primarily related to use of systemic corticosteroids for other indications than COVID-19 in doses > 6 mg dexamethasone equivalents or previous use of systemic corticosteroids for COVID-19 for ≥ 5 days, invasive fungal infection or active tuberculosis, pregnancy or unobtainable consent (detailed in the ESM and elsewhere [5, 7, 10]).

Patients were randomised to dexamethasone 12 mg or 6 mg intravenously daily for a maximum of 10 days (depending on the number of consecutive days with corticosteroid treatment before randomisation; ESM). Randomisation took place between August 27, 2020 and May 20, 2021 at 31 sites in 26 hospitals in Denmark, India, Sweden, and Switzerland [7].

The trial was approved by the Ethics Committee of the Capital Region of Denmark with additional national/local approvals as required; additional details on approvals and consent procedures and the primary trial results are available elsewhere [5, 7, 10].

Outcomes

Primary outcome

Days alive without life support at day 28 (days alive without invasive mechanical ventilation, circulatory support, and kidney replacement therapy).

Secondary outcomes

  1. One or more serious adverse reactions (new episodes of septic shock, invasive fungal infection, clinically important gastrointestinal bleeding, or anaphylactic reaction to intravenous dexamethasone) within 28 days of randomisation.

  2. All-cause mortality at day 28.

  3. All-cause mortality at day 90.

  4. Days alive without life support at day 90.

  5. Days alive and out of hospital at day 90.

Detailed definitions are provided in the ESM and elsewhere [5, 7, 10].

Statistical analyses

Analyses were conducted using R version 4.1.0 with the Tidyverse packages [13] and Stan (CmdStan version 2.26.1) [14] through the brms R package [15], with all analyses adjusted for the stratification variables (with sites with few patients in each country merged for these analyses). Technical details (including sampler settings and model diagnostics) are presented in the ESM and elsewhere [10].

Bayesian analyses and priors

Bayesian analyses start with prior probability distributions, representing prior beliefs, which are updated once data has been collected to posterior probability distributions that allow straightforward interpretation and calculation of direct probabilities [10, 16]. We estimated adjusted outcome data in each group from the joint posterior distributions and used these estimates to derive adjusted relative and absolute treatment effects. We planned to present conditional adjusted estimates using a reference patient in each group with all adjustment variables set to their most common value [10], but due to substantial differences between outcomes across trial sites, results would be difficult to interpret using this approach. Consequently, we primarily present average estimates and average treatment effects, secondarily supplemented with estimates calculated for three different representative reference patients (ESM). Posteriors were summarised by calculating direct probabilities of a number of different pre-defined effect sizes, and summarised with point estimates (posterior medians) and percentile-based 95% credible intervals (CrIs) that represent the 95% most probable effects given the prior, the model, and the data [10, 16]. In addition, we graphically present full and cumulated posterior distributions (displaying the probabilities of all possible effect sizes) for the treatment effects.
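As a minimal sketch of how posterior draws can be condensed into the summaries described above (posterior median, percentile-based 95% CrI, and direct probabilities of pre-defined effect sizes), consider the following Python example. The function name `summarise_posterior`, the simulated draws, and the 1-day threshold used as a default are illustrative assumptions; the draws are not trial data and the code is not the analysis code used in the trial.

```python
import random
import statistics


def summarise_posterior(draws, threshold=1.0):
    """Summarise posterior draws of a mean difference in days (> 0 favours 12 mg)."""
    ordered = sorted(draws)
    n = len(ordered)

    def percentile(q):
        # Simple nearest-rank percentile on the sorted draws
        index = min(n - 1, max(0, round(q * (n - 1))))
        return ordered[index]

    return {
        "median": statistics.median(ordered),
        "cri_95": (percentile(0.025), percentile(0.975)),
        "p_any_benefit": sum(d > 0 for d in draws) / n,
        "p_important_benefit": sum(d >= threshold for d in draws) / n,
        "p_important_harm": sum(d <= -threshold for d in draws) / n,
    }


# Hypothetical draws (assumption): normal with mean 1.3 and SD 0.8, NOT the trial posterior
rng = random.Random(2021)
draws = [rng.gauss(1.3, 0.8) for _ in range(20000)]
summary = summarise_posterior(draws)
```

Each probability is simply the proportion of draws beyond the relevant threshold, which is why such quantities can be read directly from the posterior without further assumptions.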

Weakly informative priors including all plausible effect sizes and centred on no difference were used for all parameters in the primary Bayesian analyses; these priors primarily served to stabilise computations with minimal influence on the results. Sensitivity analyses were conducted using sceptic priors for the intervention effects; these priors are sceptic of large differences and shrink effect estimates towards no difference, as many interventions assessed in critically ill patients have shown small or uncertain effects [17]. We did not conduct sensitivity analyses using evidence-based priors due to the lack of available external data. Exact priors are specified in the ESM and elsewhere [10]; briefly, the primary priors corresponded to probability distributions for the adjusted incidence rate ratios (IRRs)/odds ratios for the intervention effect centred on 1.00 with 95% central probability mass between 0.14 and 7.10 in each (sub)model [10].
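The stated prior interval can be checked with a short calculation. The sketch below assumes that the prior is a normal distribution on the log scale of the ratio, which is an assumption made here for illustration; the text only reports the centre (1.00) and the central 95% mass (0.14 to 7.10), and the exact priors are specified in the ESM.

```python
import math
from statistics import NormalDist

# Interval reported in the text for the ratio-scale prior
lower, upper = 0.14, 7.10

# Assumption: normal prior on the log scale, so the 95% limits are +/- z * SD
z = NormalDist().inv_cdf(0.975)
sd_log = (math.log(upper) - math.log(lower)) / (2 * z)
centre = math.exp((math.log(upper) + math.log(lower)) / 2)

# Reverse check: central 95% interval implied by a normal(0, 1) prior on the log scale
prior = NormalDist(mu=0.0, sigma=1.0)
implied = (math.exp(prior.inv_cdf(0.025)), math.exp(prior.inv_cdf(0.975)))
```

Under this assumption, the reported interval is consistent with a standard deviation of approximately 1 on the log scale, centred on a ratio of 1.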

Analysis of the primary outcome

We expected substantial zero-inflation of the primary outcome, which we analysed using an adjusted hurdle-negative binomial model [10]. This two-part model consists of a logistic regression estimating the probability of exactly zero days and a zero-truncated negative binomial model [18] estimating the mean number of days in patients with > 0 days. This model has conceptual similarities with the Kryger Jensen and Lange test used in the primary frequentist analyses of the trial [7, 19]. We estimated the adjusted mean number of days alive without life support in each group, the adjusted mean difference (MD) and adjusted IRR, and pre-defined a clinically important difference as an absolute MD ≥ 1 day [10].
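To illustrate how the two parts of a hurdle-negative binomial model combine into a single mean, the following Python sketch computes the overall mean as the probability of a non-zero count multiplied by the mean of the zero-truncated negative binomial. The function name `hurdle_nb_mean` and all parameter values are hypothetical and chosen only for illustration; adjustment variables are omitted, and the negative binomial is parameterised by its mean and a shape parameter.

```python
def hurdle_nb_mean(p_zero, mu, shape):
    """Overall mean of a hurdle negative binomial outcome.

    p_zero: probability of exactly zero days (logistic part).
    mu, shape: mean and shape of the untruncated negative binomial.
    """
    # Probability of zero under the untruncated negative binomial
    p_nb_zero = (shape / (shape + mu)) ** shape
    # Mean of the zero-truncated distribution, i.e. the mean given > 0 days
    truncated_mean = mu / (1.0 - p_nb_zero)
    return (1.0 - p_zero) * truncated_mean


# Hypothetical parameter values for a single posterior draw (not trial estimates)
mean_12 = hurdle_nb_mean(p_zero=0.25, mu=20.0, shape=2.0)
mean_6 = hurdle_nb_mean(p_zero=0.30, mu=20.0, shape=2.0)
mean_difference = mean_12 - mean_6
ratio_of_means = mean_12 / mean_6
```

Repeating this calculation for every posterior draw would give posterior distributions for the group means, their difference, and their ratio.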

Analysis of the secondary outcomes

The binary secondary outcomes were analysed using adjusted logistic regression models, with results presented as adjusted probabilities in each group, adjusted relative risks (RRs) and adjusted risk differences (RDs). We pre-defined an absolute RD of ≥ 2 percentage points as clinically important for all binary outcomes [10]. The secondary count outcomes were analysed similarly to the primary outcome, with an absolute MD ≥ 1 day pre-defined as clinically important [10].
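The derivation of average RRs and RDs from a logistic regression model can be sketched as follows. For each posterior draw, the predicted risk is calculated for every patient as if assigned to 12 mg and as if assigned to 6 mg, and the two sets of predictions are averaged before the RR and RD are formed. All names, coefficient values, and patient data below are hypothetical, the site adjustment is omitted for brevity, and the code is an illustration of the general approach rather than the trial's analysis code.

```python
import math
import random
import statistics


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def average_risks(draw, patients):
    """Average predicted risks with all patients set to 12 mg and to 6 mg (one draw)."""
    intercept, b_treat, b_imv, b_age = draw
    risk_12 = risk_6 = 0.0
    for imv, age_below_70 in patients:
        linear = intercept + b_imv * imv + b_age * age_below_70
        risk_6 += inv_logit(linear)
        risk_12 += inv_logit(linear + b_treat)
    return risk_12 / len(patients), risk_6 / len(patients)


def effect_probabilities(draws, patients, threshold=0.02):
    """Posterior summaries of RR and RD; RD < 0 favours 12 mg."""
    rrs, rds = [], []
    for draw in draws:
        risk_12, risk_6 = average_risks(draw, patients)
        rrs.append(risk_12 / risk_6)
        rds.append(risk_12 - risk_6)
    n = len(draws)
    return {
        "median_rr": statistics.median(rrs),
        "median_rd": statistics.median(rds),
        "p_any_benefit": sum(rd < 0 for rd in rds) / n,
        "p_important_benefit": sum(rd <= -threshold for rd in rds) / n,
        "p_important_harm": sum(rd >= threshold for rd in rds) / n,
    }


# Hypothetical patients (ventilated yes/no, age below 70 yes/no) and posterior draws
rng = random.Random(7)
patients = [(int(rng.random() < 0.2), int(rng.random() < 0.7)) for _ in range(200)]
draws = [
    (rng.gauss(-0.6, 0.1), rng.gauss(-0.2, 0.12), rng.gauss(0.9, 0.15), rng.gauss(-0.7, 0.15))
    for _ in range(2000)
]
result = effect_probabilities(draws, patients)
```

Averaging over the observed covariate distribution yields a single average treatment effect, whereas fixing the covariates at chosen values would give the conditional effect for one reference patient.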

Missing data handling

We planned to use multiple imputation if ≥ 5% had missing data for variables included in an analysis [10]. As missingness was lower than this threshold for all analyses, we conducted complete case analyses, with additional best/worst-worst/best-case sensitivity analyses of the primary outcome (assuming all patients in the 12 mg group were alive without life support and that all patients in the 6 mg group were not alive without life support on all days without available data, and vice versa) [7].
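The best/worst- and worst/best-case scenarios can be illustrated with a small Python sketch. The record layout (group, observed days alive without life support, number of days without available data), the function name `sensitivity_dataset`, and the example values are all hypothetical.

```python
def sensitivity_dataset(records, favoured_group):
    """Fill days without data: best case for `favoured_group`, worst case for the other group."""
    completed = []
    for group, observed_days, days_without_data in records:
        # Best case: alive without life support on every day without data.
        # Worst case: not alive without life support on any of those days.
        imputed = days_without_data if group == favoured_group else 0
        completed.append((group, observed_days + imputed))
    return completed


# Hypothetical records, not trial data
records = [("12 mg", 20, 3), ("12 mg", 28, 0), ("6 mg", 10, 5), ("6 mg", 0, 0)]
best_worst = sensitivity_dataset(records, favoured_group="12 mg")
worst_best = sensitivity_dataset(records, favoured_group="6 mg")
```

Analysing both completed datasets brackets the range of results that the missing days could plausibly produce.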

Post hoc analyses

Our count outcome definitions used the actual number of days; however, non-survivors have commonly been assigned zero days in other trials [20,21,22], and thus we conducted additional post hoc analyses assigning non-survivors zero days to ease comparison. Moreover, we conducted best/worst-worst/best-case sensitivity analyses of the primary outcome to match the primary report; these were not detailed in the Bayesian analysis protocol [7, 10]. Last, while we expected substantial zero-inflation for all count outcomes, we expected limited inflation at the maximum values and expected the planned model to adequately fit the data [10]. Unexpectedly, 41.4% and 40.6% of patients had the maximum values of days alive without life support at day 28 and 90, respectively [7]. To account for the unexpected distributions, we conducted additional post hoc sensitivity analyses of both these outcomes using adjusted Bayesian linear models with weakly informative priors and a Bayesian bootstrap procedure using adjusted, conventional linear regressions, and derived all effect measures from the predicted, adjusted means in each group (ESM). This was not an issue for days alive and out of hospital at day 90 (only 1 patient had the maximum possible value of 89 days).
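The idea behind a Bayesian bootstrap can be sketched in Python as follows. Instead of resampling patients, each patient receives a random weight drawn from a flat Dirichlet distribution, and the quantity of interest is recalculated with those weights in each iteration. This simplified example uses an unadjusted difference in weighted means rather than the adjusted linear regressions described above, and the data are hypothetical values with clustering at 0 and 28 days; the function name `bayesian_bootstrap_md` is likewise an illustrative assumption.

```python
import random


def bayesian_bootstrap_md(days_12, days_6, n_draws=4000, seed=1):
    """Bayesian bootstrap draws of the (unadjusted) mean difference in days."""
    rng = random.Random(seed)

    def weighted_mean(values):
        # Independent Exp(1) weights, once normalised, follow a flat Dirichlet distribution
        weights = [rng.expovariate(1.0) for _ in values]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)

    return [weighted_mean(days_12) - weighted_mean(days_6) for _ in range(n_draws)]


# Hypothetical outcome data (not trial data): many patients at 0 or 28 days
days_12 = [0] * 30 + [28] * 45 + list(range(1, 26))
days_6 = [0] * 35 + [28] * 40 + list(range(1, 26))
md_draws = bayesian_bootstrap_md(days_12, days_6)
p_any_benefit = sum(d > 0 for d in md_draws) / len(md_draws)
```

Because the approach makes no assumption about the shape of the outcome distribution, it is unaffected by the clustering at the minimum and maximum values.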

Results

In total, 982 patients were included in the intention-to-treat population, with 497 assigned to 12 mg dexamethasone and 485 assigned to 6 mg dexamethasone. Detailed data on baseline characteristics, protocol adherence, treatment durations, and individual components of the outcomes have been presented elsewhere [7]; data for the adjustment variables and descriptive data for the outcomes analysed here (including missingness) are presented in Tables 1 and S1 (ESM).

Table 1 Data on baseline variables, stratification variables, and outcomes

Primary outcome

For days alive without life support at day 28, the adjusted MD was 1.3 days (95% CrI −0.3 to 2.9), corresponding to an adjusted IRR of 1.08 (95% CrI 0.98–1.18), and probabilities of any benefit and clinically important benefit with 12 mg dexamethasone of 94.2 and 63.9%, respectively (Table 2). Full posterior distributions and probabilities of all possible effect sizes are presented in Fig. 1. In the sensitivity analysis using a sceptic prior (Table 2), the probabilities of any benefit and clinically important benefit with 12 mg dexamethasone were 94.9 and 61.8%, respectively. Similar results were found in the post hoc sensitivity analyses (Table S2, ESM), with probabilities of benefit with 12 mg dexamethasone of > 90% in all analyses for the average treatment effects. The direction of conditional effects was similar with largely consistent relative effects across all reference patients and somewhat larger variation in absolute effects; probabilities of benefit with 12 mg dexamethasone varied more due to increased uncertainty for one reference patient (Table S3, ESM). The probabilities of clinically important harm were 0.3 and 0.2% in the primary analyses and the analysis using a sceptic prior, respectively, and ≤ 4.5% in all analyses for the average effect and for all reference patients (Table 2, Tables S2–S3, ESM).

Table 2 Average treatment effect estimates and probabilities of effects
Fig. 1

Days alive without life support at day 28. Full posterior probability distributions for the effect of the treatment on the primary outcome (days alive without life support at day 28; primary analysis using weakly informative priors). Left plot displays the relative difference (incidence rate ratio, IRR), while the right plot displays the absolute difference (mean difference, MD) in days. These results are adjusted for all stratification variables and calculated as average treatment effects, as outlined in the methods section. An IRR > 1 or MD > 0 favours 12 mg dexamethasone; an IRR < 1 or MD < 0 favours 6 mg dexamethasone. The upper subplots display the cumulative posterior distributions, corresponding to the probabilities of effect sizes (X-axis) ≤ the values on the left Y-axis and > the values on the right Y-axis. The lower subplots display the entire posterior distributions, with the bold, vertical line indicating the median value (used as the point estimate) and the area highlighted in red indicating the percentile-based 95% credible interval. The vertical black lines represent exactly no difference, and the area highlighted in blue in the absolute effect plot represents effect sizes smaller than the pre-defined minimally clinically important difference of 1 day in either direction [10].

Secondary outcomes

Results for the effects of 12 mg versus 6 mg dexamethasone on all secondary outcomes are presented in Table 2, Figs. 2 and 3, Table S4 and Figs. S1–S3 (ESM). For serious adverse reactions at day 28, the adjusted RR was 0.85 (95% CrI 0.63–1.16), with probabilities of any benefit, clinically important harm, and no clinically important difference of 84.1, 2.1, and 49.5%, respectively. For mortality at day 28, the adjusted RR was 0.87 (95% CrI 0.73–1.03), with probabilities of any benefit, clinically important benefit, and clinically important harm of 94.8, 80.7, and 0.9%, respectively. For mortality at day 90, the adjusted RR was 0.88 (95% CrI 0.75–1.02), with similar probabilities of different effect sizes. For days alive without life support at day 90, the uncertainty was somewhat larger with an adjusted MD of 3.6 days (95% CrI −3.1 to 10.2) and probabilities of any benefit and clinically important harm of 85.0 and 9.2%, respectively. For days alive and out of hospital at day 90, the adjusted MD was 3.9 days (95% CrI −0.6 to 8.4), with probabilities of any benefit, clinically important benefit, and clinically important harm of 95.7, 89.7, and 1.5%, respectively.

Fig. 2

Serious adverse reactions at day 28. Full posterior probability distributions for the effect of the treatment on serious adverse reactions at day 28 (primary analysis using weakly informative priors). Left plot displays the relative difference (relative risk, RR), while the right plot displays the absolute difference (risk difference, RD) in percentage points. These results are adjusted for all stratification variables and calculated as average treatment effects, as outlined in the methods section. An RR < 1 or RD < 0 favours 12 mg dexamethasone; an RR > 1 or RD > 0 favours 6 mg dexamethasone. The upper subplots display the cumulative posterior distributions, corresponding to the probabilities of effect sizes (X-axis) ≤ the corresponding values on the left Y-axis and > the corresponding values on the right Y-axis. The lower subplots display the entire posterior distributions, with the bold, vertical line indicating the median value (used as the point estimate) and the area highlighted in red indicating the percentile-based 95% credible interval. The vertical black lines represent exactly no difference, and the area highlighted in blue in the absolute effect plot represents effect sizes smaller than the pre-defined minimally clinically important difference of 2 percentage points in either direction [10].

Fig. 3

Mortality at day 28. Full posterior probability distributions for the effect of the treatment on 28-day all-cause mortality (primary analysis using weakly informative priors). Left plot displays the relative difference (relative risk, RR), while the right plot displays the absolute difference (risk difference, RD) in percentage points. These results are adjusted for all stratification variables and calculated as average treatment effects, as outlined in the methods section. An RR < 1 or RD < 0 favours 12 mg dexamethasone; an RR > 1 or RD > 0 favours 6 mg dexamethasone. The upper subplots display the cumulative posterior distributions, corresponding to the probabilities of effect sizes (X-axis) ≤ the values on the left Y-axis and > the values on the right Y-axis. The lower subplots display the entire posterior distributions, with the bold, vertical line indicating the median value (used as the point estimate) and the area highlighted in red indicating the percentile-based 95% credible interval. The vertical black lines represent exactly no difference, and the area highlighted in blue in the absolute effect plot represents effect sizes smaller than the pre-defined minimally clinically important difference of 2 percentage points in either direction [10].

The relative effect estimates were largely consistent with sceptic priors, in the post hoc analyses of days alive without life support at day 90, and for conditional effects in different reference patients, with variations in absolute effects and larger uncertainty for some outcome/reference patient combinations. Probabilities of clinically important harm were very or relatively low for all analyses of all secondary outcomes on average and for the different reference patients, with somewhat larger uncertainty for the day 90 count outcomes.

Discussion

In this pre-planned secondary Bayesian analysis of the COVID STEROID 2 trial, we found high posterior probabilities of benefit with a higher daily dose (12 mg) of dexamethasone than the currently recommended dose (6 mg) on days alive without life support at day 28 (94.2%) and all secondary outcomes (94–96% for mortality at day 28 and day 90 and days alive and out of hospital at day 90; 84–85% for serious adverse reactions at day 28 and days alive without life support at day 90). Similarly, we found relatively low probabilities of clinically important harm for all outcomes, with probabilities ≤ 2.1% for the average effects on serious adverse reactions at day 28 (a composite of septic shock, clinically important gastrointestinal bleeding, anaphylaxis, and invasive fungal infections), mortality at day 28 and 90, and days alive and out of hospital at day 90. Results were largely consistent across different planned and post hoc sensitivity analyses conducted and largely similar for different reference patients, although uncertainty was somewhat larger in some reference patients and absolute effects varied due to different control group counts and event rates.

While the primary frequentist analyses of the COVID STEROID 2 trial did not reach the pre-defined thresholds for statistical significance [7], absence of statistical significance is not evidence of absence of an effect [23]. The results from the primary analyses were mostly compatible with benefits with 12 mg dexamethasone, and this is further supported by these pre-planned Bayesian analyses. Arguments have been made to entirely abandon the concept of statistical significance [9], and to embrace uncertainty and interpret results in a more nuanced manner, regardless of the analytical approach used [24]. Similarly, it has been argued that while strong evidence should be provided before implementing new, costly, and potentially burdensome interventions, practice changes for well-known treatments already in widespread use may require less firm evidence [8]. This is arguably the case for two different doses of a well-known and inexpensive drug like dexamethasone when the evidence for all outcomes seems to favour one dose with relatively high probabilities. Importantly, longer-term outcomes from the COVID STEROID 2 trial will be reported after 180 days follow-up [5]. Furthermore, several other randomised clinical trials are currently comparing different doses of dexamethasone; results from these trials and the COVID STEROID 2 trial will be pooled in a prospective meta-analysis [25], and thus, the overall evidence base is expected to improve in the coming months. Until then, considering 12 mg instead of 6 mg dexamethasone daily in patients with COVID-19 and severe hypoxaemia seems reasonable given the current evidence, although potential interaction with other treatments (e.g., interleukin-6 receptor antagonists) remains unresolved.

Strengths and limitations

This study has several strengths. First, the general strengths of the COVID STEROID 2 trial, including the sample size, blinding, international recruitment, and high inclusion rate, also apply to this secondary analysis [7]. Second, this study was pre-planned, and the statistical analysis plan was published prior to enrolment of the last patient and before any data were analysed [10]. Third, results were consistent across the different sensitivity analyses conducted.

This study comes with limitations, too. First, the distributions of days alive without life support after 28 and 90 days were different than expected, and thus the planned model did not fit the data as well as expected. However, this did not appear to influence the conclusions, as similar results were found in the post hoc analyses using different approaches. Second, we deviated from the pre-specified analysis plan by primarily presenting average treatment effects instead of conditional treatment effects. This was considered necessary due to the low event rates (primarily for serious adverse reactions, with overall few events/centre) and possible differences in distributions across sites, which made the results difficult to interpret for the planned reference patient approach and posed a risk of underestimating serious adverse reactions in either group for the full trial. Of note, the underlying statistical models were unchanged, and conditional effects for the planned reference patients and additional other reference patients were also presented. Overall, the results were largely consistent, especially for the relative effect measures. The absolute effect measures (and thus the interpretations of clinical importance in different reference patients) may have varied, as is always the case when relative effects are consistent and baseline risk varies [26]. Third, general limitations of the COVID STEROID 2 trial, including changes in the standard of care during the trial (e.g., possibly increased use of interleukin-6-receptor antagonists [27], which may affect the effects of different doses of systemic corticosteroids) and no data on baseline inflammatory status, also apply to this secondary analysis [7]. Finally, although pre-defined [10], our definitions of clinically important effect sizes may be challenged. For all count outcomes, we considered an absolute difference of at least 1 day to be clinically important, as shorter periods will have small implications for patients and for health care capacities. For mortality and serious adverse reactions, we pragmatically considered absolute risk differences of at least 2 percentage points as important, as even relatively small absolute differences may have important implications due to the large number of patients affected by the pandemic. However, we recognise that other reasonable thresholds could have been chosen instead.

Conclusion

In conclusion, we found high probabilities of benefit and relatively low probabilities of clinically important harm with dexamethasone 12 versus 6 mg daily for up to 10 days in patients with COVID-19 and severe hypoxaemia on all outcomes assessed within 90 days of randomisation.