Key Summary Points

Why carry out this study?

The interleukin (IL)-6 receptor inhibitor, sarilumab, and the Janus kinase (JAK)-1 inhibitor, upadacitinib, are approved for treatment of patients with moderately to severely active rheumatoid arthritis (RA), but head-to-head comparisons in clinical trials have not been performed to date.

In the absence of head-to-head comparisons, and where only one or two trials are available for each treatment, approaches for comparing treatments indirectly include the matching-adjusted indirect comparison (MAIC) and the simulated treatment comparison (STC).

What did the study ask? What was the hypothesis of the study?

We used the MAIC and STC analyses to estimate the relative efficacy of sarilumab and upadacitinib in patients with RA who had an inadequate response to previous biologic disease-modifying antirheumatic drugs (bDMARDs) using data from the TARGET (sarilumab) and SELECT-BEYOND (upadacitinib) trials.

What was learned from the study? What were the study outcomes/conclusions?

Our results, obtained using the two population-adjusted indirect comparisons (MAIC and STC), suggest a similar effect of sarilumab and upadacitinib when given in combination with conventional synthetic DMARDs.

What has been learned from the study?

To the best of our knowledge, this analysis was the first indirect comparison of sarilumab and upadacitinib, and one of the first studies in RA to utilize the STC methodology.

Indirect comparisons have become increasingly common in assessing comparative efficacy; their use should be encouraged but critically evaluated.

Introduction

Targeted disease-modifying antirheumatic drugs (DMARDs), biologic or synthetic, are commonly prescribed for patients with rheumatoid arthritis (RA) who have a suboptimal response to conventional synthetic DMARDs alone [1, 2]. Among the available options, the interleukin (IL)-6 receptor inhibitor sarilumab, administered by subcutaneous injection, and the Janus kinase (JAK)-1 inhibitor upadacitinib, administered orally, are both approved for the treatment of patients with moderately to severely active RA [3,4,5]. However, no head-to-head clinical trials of the two drugs have been performed to date, including in the population of particular interest here: patients in whom biologic DMARDs (bDMARDs) have previously failed.

In the absence of head-to-head trials, network meta-analyses can be used to compare treatments indirectly. However, in networks with only one or two trials per treatment, indirect comparisons are highly vulnerable to systematic error (bias) arising from imbalances in the distribution of effect modifiers between trials [6, 7]. Alternative approaches include the matching-adjusted indirect comparison (MAIC) and the simulated treatment comparison (STC). In the MAIC, individual patient-level data from the index trial are reweighted with propensity score-based weights so that their baseline characteristics match the published aggregate characteristics of the comparator study; patients who more closely resemble the comparator population receive larger weights [8]. In the STC, patient-level data from one trial are used to model the outcome(s) of interest as a function of relevant covariates. The regression model is then used to predict the outcomes that the index treatment would have produced in the population of a comparator study for which only aggregate study-level data are available [9].

Here, we report MAIC and STC analyses comparing outcomes with the approved doses of sarilumab and upadacitinib, using data from the two phase 3 trials conducted to date in patients with RA who were refractory to previous treatment with bDMARDs.

Methods

Trials and Patients

The TARGET (NCT01709578) [10] trial of sarilumab and the SELECT-BEYOND (NCT02706847) [11] trial of upadacitinib were randomized, placebo-controlled studies that enrolled adults with active RA who had received conventional synthetic DMARDs and had prior bDMARD exposure. TARGET randomized 546 patients with a tender joint count (68 joints assessed; TJC68) of ≥ 8, a swollen joint count (66 joints assessed; SJC66) of ≥ 6, a C-reactive protein (CRP) concentration of ≥ 8 mg/l, disease duration of ≥ 6 months, and either inadequate response or intolerance to tumor necrosis factor inhibitors. Its two co-primary endpoints were assessed at 12 weeks (change from baseline in the Health Assessment Questionnaire-Disability Index [HAQ-DI]) and 24 weeks (proportion of patients achieving American College of Rheumatology criteria for 20% improvement [ACR20]) [10]. SELECT-BEYOND randomized 499 adults under similar inclusion criteria, with minor differences: TJC68 and SJC66 ≥ 6, CRP ≥ 3 mg/l, and inadequate response or intolerance to ≥ 1 bDMARD. Both of its co-primary endpoints (the proportion of patients achieving ACR20 and the proportion achieving a 28-joint Disease Activity Score using CRP [DAS28-CRP] of ≤ 3.2) were assessed at 12 weeks [11].

The MAIC and STC analyses used individual patient-level data from the sarilumab recommended-dose (200 mg every 2 weeks [q2w]) and placebo arms of TARGET (index trial), and published aggregate data from the upadacitinib (15 mg/day) and placebo arms of SELECT-BEYOND (comparator trial).

Outcomes

Endpoints evaluated included the proportions of patients achieving ACR20, ACR50, ACR70, DAS28-CRP < 3.2, DAS28-CRP < 2.6, Simple Disease Activity Index (SDAI) < 3.3, and Clinical Disease Activity Index (CDAI) < 2.8. Additional endpoints were changes from baseline to week 12 in DAS28-CRP, HAQ-DI, pain Visual Analog Scale (VAS), and Patient Global Assessment of Disease Activity (PtGA).
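For readers less familiar with the threshold-based endpoints, the sketch below shows how they could be derived for a single patient using the commonly published formulas: DAS28-CRP = 0.56√TJC28 + 0.28√SJC28 + 0.36 ln(CRP + 1) + 0.014 PtGA + 0.96 (CRP in mg/l, PtGA in mm); CDAI = TJC28 + SJC28 + PtGA + evaluator global assessment (both on 0-10 cm scales); SDAI = CDAI + CRP in mg/dl. These formulas and the evaluator global assessment input are not given in this article, and the function names are hypothetical. ACR20/50/70 are omitted because they require both baseline and follow-up values.

```python
import math


def das28_crp(tjc28, sjc28, crp_mg_l, ptga_mm):
    """DAS28-CRP (four-variable form); CRP in mg/l, patient global on a 0-100 mm VAS."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.36 * math.log(crp_mg_l + 1) + 0.014 * ptga_mm + 0.96)


def cdai(tjc28, sjc28, ptga_cm, evga_cm):
    """CDAI: simple sum; both global assessments on a 0-10 cm scale."""
    return tjc28 + sjc28 + ptga_cm + evga_cm


def sdai(tjc28, sjc28, ptga_cm, evga_cm, crp_mg_dl):
    """SDAI: CDAI plus CRP expressed in mg/dl (not mg/l)."""
    return cdai(tjc28, sjc28, ptga_cm, evga_cm) + crp_mg_dl


def binary_endpoints(tjc28, sjc28, crp_mg_l, ptga_mm, evga_mm):
    """Hypothetical helper: apply the thresholds listed in the text to one patient."""
    ptga_cm, evga_cm = ptga_mm / 10, evga_mm / 10
    d = das28_crp(tjc28, sjc28, crp_mg_l, ptga_mm)
    return {
        "DAS28-CRP < 3.2": d < 3.2,
        "DAS28-CRP < 2.6": d < 2.6,
        "SDAI < 3.3": sdai(tjc28, sjc28, ptga_cm, evga_cm, crp_mg_l / 10) < 3.3,
        "CDAI < 2.8": cdai(tjc28, sjc28, ptga_cm, evga_cm) < 2.8,
    }
```

Note the unit switch: DAS28-CRP takes CRP in mg/l, whereas SDAI adds CRP in mg/dl, so the helper divides by 10 before calling `sdai`.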

Statistical Analysis

In the MAIC [8], we first identified covariates that differed statistically significantly between the trials at baseline. Based on clinical judgement, we excluded covariates that were likely to be associated with each other, to avoid potential collinearity and an unnecessary reduction in effective sample size. This step yielded the following covariates: age, oral glucocorticoid use, TJC68, SJC66, CRP, and PtGA. A logistic propensity score model was then used to estimate weights for the patient-level data from TARGET so that the weighted mean baseline characteristics matched those reported for SELECT-BEYOND. To assess the validity of the weighting, we calculated the effective sample size (ESS), defined as the number of unweighted patients that would yield parameter estimates with the same precision as the weighted sample. As with an ordinary sample size, a larger ESS is preferable because it reflects more information; a large drop in ESS indicates that a few heavily weighted patients dominate the estimates.
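A minimal sketch of the weighting step, assuming the widely used method-of-moments form of MAIC. Each index-trial patient receives a weight w_i = exp(x_i'α) (the logistic propensity-odds form), with α chosen so that the weighted covariate means equal the comparator's published means. The ESS is computed with the common formula (Σw)²/Σw², which we assume corresponds to the definition above. The function names are hypothetical, the data in the usage lines are simulated from two summary means only (not TARGET data), and this is not the authors' R/SAS code.

```python
import numpy as np


def maic_weights(x_ipd, target_means, max_iter=100, tol=1e-9):
    """Method-of-moments MAIC weights (illustrative sketch).

    x_ipd: (n, p) baseline covariates from the index trial (patient-level).
    target_means: length-p published means/proportions from the comparator trial.
    Returns positive weights whose weighted covariate means equal target_means.
    """
    # Centre covariates at the comparator means: the weighted means hit the
    # targets exactly when sum_i w_i * xc_i = 0.
    xc = np.asarray(x_ipd, dtype=float) - np.asarray(target_means, dtype=float)
    alpha = np.zeros(xc.shape[1])
    for _ in range(max_iter):
        w = np.exp(xc @ alpha)
        grad = xc.T @ w  # gradient of Q(alpha) = sum_i exp(xc_i' alpha)
        if np.max(np.abs(grad)) < tol * len(w):
            break
        hess = xc.T @ (xc * w[:, None])
        step = np.linalg.solve(hess, grad)
        # Q is convex; backtrack so that each Newton step does not increase it.
        t, q_old = 1.0, w.sum()
        while np.exp(xc @ (alpha - t * step)).sum() > q_old and t > 1e-8:
            t /= 2
        alpha = alpha - t * step
    return np.exp(xc @ alpha)


def effective_sample_size(w):
    """ESS = (sum w)^2 / sum(w^2); equals n when all weights are equal."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated (hypothetical) index data: age in years, oral glucocorticoid use 0/1.
    x = np.column_stack([rng.normal(52.4, 11.0, 365), rng.binomial(1, 0.62, 365)])
    w = maic_weights(x, [57.0, 0.47])
    print(np.average(x, axis=0, weights=w))   # weighted means now equal the targets
    print(effective_sample_size(w) / len(w))  # fraction of the sample retained
```

Minimizing Q(α) = Σ exp(x_i'α) is equivalent to solving the moment-matching equations. Because Q is convex, a damped Newton iteration is sufficient here.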

In the STC [9], the sarilumab treatment effect in the SELECT-BEYOND population was simulated from the TARGET efficacy data using a regression model adjusted for age, oral glucocorticoid use, TJC68, SJC66, CRP, and PtGA, with each covariate centered at its mean value in SELECT-BEYOND. For consistency, the same covariates were used as in the MAIC.
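A minimal STC sketch for a binary outcome, assuming a logistic model fitted to index-trial patient-level data with covariates centered at the comparator means. It also assumes treatment-by-covariate interaction terms for effect modifiers. The article does not state whether interactions were included. In a main-effects-only logistic model, centering changes only the intercept, not the treatment coefficient, so the interactions are what make the centering matter for the treatment effect. The function names are hypothetical, and the fit uses a plain Newton-Raphson routine to avoid depending on a particular statistics package.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    Returns coefficients and their estimated covariance (inverse information).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))  # score / information
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    return beta, cov


def stc_log_or(covariates, treated, response, comparator_means):
    """STC sketch: treatment log OR predicted for the comparator population.

    covariates: (n, p) index-trial covariates; treated: 0/1; response: 0/1.
    Covariates are centred at the comparator trial's published means and
    treatment-by-covariate interactions are included (assumed effect
    modifiers), so the treatment coefficient is the conditional log OR for a
    patient with the comparator population's mean covariate values.
    """
    xc = np.asarray(covariates, float) - np.asarray(comparator_means, float)
    t = np.asarray(treated, float)[:, None]
    X = np.column_stack([np.ones(len(xc)), t, xc, t * xc])
    beta, cov = fit_logistic(X, np.asarray(response, float))
    return beta[1], np.sqrt(cov[1, 1])  # log OR and its standard error
```

In this parameterization the intercept is the predicted placebo log-odds at the comparator means. The returned log OR and standard error can then enter a placebo-anchored indirect comparison against the comparator trial's published effect.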

Treatment effects at week 12 in the weighted (MAIC) and regression-simulated (STC) TARGET populations were compared with published aggregate data from SELECT-BEYOND using the Bucher method of adjusted indirect treatment comparison, with placebo as the common comparator [6]. Dropouts and early terminations were handled using an as-observed analysis (i.e., based on observed data without imputation).
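The Bucher step reduces to a subtraction on an additive scale. The sketch below assumes that each trial supplies a placebo-anchored effect (log OR for binary outcomes, mean difference for continuous ones) with its standard error, and that the two trials are independent, so their variances add. The function name and the values in the usage line are hypothetical.

```python
import math


def bucher(d_ac, se_ac, d_bc, se_bc, z=1.959964):
    """Bucher adjusted indirect comparison of A vs B via common comparator C.

    d_ac, d_bc: effects of A and B vs placebo (C) on an additive scale,
    e.g. log odds ratios or mean differences. Returns (d_ab, se_ab, lo, hi).
    """
    d_ab = d_ac - d_bc                            # placebo effect cancels
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)    # independent trials
    return d_ab, se_ab, d_ab - z * se_ab, d_ab + z * se_ab


# Hypothetical inputs: log OR vs placebo of ln(3) (SE 0.20) and ln(2) (SE 0.30).
d, se, lo, hi = bucher(math.log(3.0), 0.20, math.log(2.0), 0.30)
print(math.exp(d), math.exp(lo), math.exp(hi))  # indirect OR (about 1.5) and 95% CI
```

For binary outcomes the result is exponentiated back to an OR. For continuous outcomes the same function returns the placebo-corrected difference directly.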

For both the MAIC and STC, placebo-corrected odds ratios (ORs) of achieving each binary outcome (sarilumab vs. upadacitinib) were calculated; continuous outcomes were likewise analyzed as placebo-corrected differences. All statistical analyses were conducted using R version 3.6.2 (packages base, stats, Hmisc, haven, utils, weights, questionr, and reldist) and SAS version 9.4 (SAS Institute, Cary, NC, USA). Both the TARGET and SELECT-BEYOND studies were approved by the appropriate ethics committees/institutional review boards [10, 11].
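The sketch below shows one way a log OR and its standard error might be derived from responder counts, for example counts reconstructed from published response rates, using the standard Woolf formula. The function name is hypothetical, and the article does not state which variance formula was used. A zero cell leaves the log OR undefined without a continuity correction, which is the situation described for SDAI < 3.3 in the MAIC (an estimated zero placebo responders).

```python
import math


def log_odds_ratio(resp_a, n_a, resp_b, n_b):
    """Woolf log OR (arm A vs arm B) and its standard error from responder counts.

    Counts may be non-integer (e.g. reconstructed from published percentages).
    No continuity correction is applied, so any zero cell raises ValueError.
    """
    cells = [resp_a, n_a - resp_a, resp_b, n_b - resp_b]
    if min(cells) <= 0:
        raise ValueError("zero cell: log OR undefined without a continuity correction")
    a, b, c, d = cells  # responders/non-responders in A, then in B
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(sum(1.0 / x for x in cells))
    return log_or, se
```

Applied to each trial's active and placebo arms, this yields the placebo-anchored log ORs that an indirect comparison then combines.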

Compliance with Ethics Guidelines

This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Results

The analysis included 365 patients from TARGET and aggregate data from 333 patients in SELECT-BEYOND. Details of population screening and matching are depicted in Fig. 1. Week 12 response rates and mean changes from baseline in TARGET and SELECT-BEYOND for the outcome measures used in the MAIC and STC analyses are shown in Fig. S1 in the Supplementary Material.

Fig. 1

Consolidated Standards of Reporting Trials (CONSORT)-style flow chart depicting patient numbers for screening and matching analyses

Overall, patient populations in the two trials had broadly comparable baseline characteristics.

For the MAIC, the parameters that differed statistically significantly between the TARGET sarilumab 200 mg q2w group and the SELECT-BEYOND upadacitinib 15 mg/day group were age, previous treatment with at least one TNFα inhibitor, oral glucocorticoid use, SJC66, PtGA, pain VAS, DAS28-CRP, and CRP (Table 1). Of these, we chose age (TARGET, 52.4 years; SELECT-BEYOND, 57.0 years), oral glucocorticoid use (62 vs. 47%), SJC66 (20.1 vs. 16.7), PtGA (70 vs. 67 mm), and CRP (28.4 vs. 16.2 mg/l) as covariates for the MAIC (Table 2). We also added TJC68 (29.5 vs. 28.2) as a MAIC covariate, because TJC28 (a subset of the TJC68 joints) is a component of DAS28. The number of prior TNFα inhibitors was not included as a covariate, because response in both trials was similar regardless of this number [12, 13]. After MAIC weighting, the mean ± SD values of age, TJC68, SJC66, and CRP, and the percentage of patients using oral glucocorticoids, in TARGET became identical to those in SELECT-BEYOND. In this process, the ESS of TARGET was reduced to 46% (166/365) of the initial population (sarilumab, 92; placebo, 74). In the STC, baseline characteristics were adjusted via a regression equation instead of a propensity score matching approach.

Table 1 Comparison of baseline characteristics of patients in TARGET and SELECT-BEYOND
Table 2 Baseline characteristics of patients in TARGET (before and after the MAIC) and SELECT-BEYOND [10, 11]

After both the MAIC and STC, the odds of achieving each of the composite clinical outcomes did not differ significantly between sarilumab and upadacitinib (Fig. 2a, b). This was the case for CDAI < 2.8 (MAIC OR = 10.8, 95% CI 0.0–6597.6; STC OR = 6.3, CI 0.2–170.4), DAS28-CRP < 2.6 (MAIC OR = 0.7, CI 0.1–3.3; STC OR = 4.2, CI 0.9–19.3), DAS28-CRP < 3.2 (MAIC OR = 0.95, CI 0.3–3.7; STC OR = 1.8, CI 0.5–6.0), ACR70 (MAIC OR = 1.2, CI 0.2–8.8; STC OR = 4.6, CI 0.7–31.6), ACR50 (MAIC OR = 0.5, CI 0.1–2.0; STC OR = 1.0, CI 0.3–3.1), and ACR20 (MAIC OR = 0.6, CI 0.2–1.8; STC OR = 0.7, CI 0.3–1.7).

Fig. 2

Odds ratio (sarilumab vs. upadacitinib, placebo-corrected) of achieving various composite clinical outcomes. ᵃThe OR for attaining SDAI < 3.3 was not calculated in the MAIC analysis because the estimated number of placebo-treated patients achieving this score was zero. ᵇIn the MAIC, the effective sample size of TARGET was reduced to 46% (166/365) of the initial population (sarilumab, n = 92; placebo, n = 74). ᶜIn the STC, by design, baseline characteristics were adjusted via a regression equation instead of a propensity score matching approach. ACR 20/50/70 American College of Rheumatology response criteria, CDAI Clinical Disease Activity Index, CI confidence interval, DAS28-CRP Disease Activity Score-28 for Rheumatoid Arthritis with C-reactive protein, MAIC matching-adjusted indirect comparison, SDAI Simple Disease Activity Index, STC simulated treatment comparison

Similarly, no appreciable differences were detected between the treatment effects of sarilumab and upadacitinib in any of the continuous outcomes evaluated (Fig. 3). The least squares mean differences (LSMDs) in score change between treatments were not significant for DAS28-CRP (MAIC 0.1, 95% CI −0.6 to 0.8; STC −0.2, CI −0.7 to 0.4 [Fig. 3a]) or HAQ-DI (MAIC 0.1, CI −0.2 to 0.3; STC 0.0, CI −0.2 to 0.3 [Fig. 3b]). Likewise, the LSMD in pain VAS change did not differ significantly between treatments (MAIC −0.3, CI −14.7 to 14.1; STC −0.2, CI −11.2 to 10.8 [Fig. 3c]).

Fig. 3

Difference in score changes in continuous outcomes from baseline to week 12, sarilumab minus upadacitinib (placebo-corrected)ᵃ. ᵃIn the MAIC, the effective sample size of TARGET was reduced to 46% (166/365) of the initial population (sarilumab, n = 92; placebo, n = 74). In the STC, by design, baseline characteristics were adjusted via a regression equation instead of a propensity score matching approach. CI confidence interval, DAS28-CRP Disease Activity Score-28 for Rheumatoid Arthritis with C-reactive protein, HAQ-DI Health Assessment Questionnaire-Disability Index, LSMD least squares mean difference, MAIC matching-adjusted indirect comparison, STC simulated treatment comparison, VAS visual analog scale

Discussion

These data suggest that, at 12 weeks, the efficacy of sarilumab, an IL-6 receptor inhibitor, is comparable to that of upadacitinib, a JAK inhibitor, in patients with RA who had an inadequate response to one or more previous bDMARDs and who were receiving background conventional synthetic DMARDs.

In the absence of a head-to-head randomized controlled trial, this indirect comparison of sarilumab and upadacitinib in combination with conventional synthetic DMARDs may be meaningful, as similar results were obtained using two different methodologies. Previous MAIC analyses have reported reductions in effective sample size of 74% [14] and as much as 80% [7, 15]; by comparison, the 54% reduction in our model is less problematic. Furthermore, the reduction in effective sample size in a MAIC is considered less of a limitation than the corresponding constraint in meta-regression, in which the number of trials, rather than the number of patients, must exceed the number of baseline characteristics used for adjustment [16]. Nevertheless, only a properly powered, randomized, head-to-head study of the two molecules could truly establish non-inferiority.

To the best of our knowledge, this analysis was the first indirect comparison of sarilumab and upadacitinib, and one of the first studies in RA to use the STC methodology. With the proliferation of targeted therapies for RA and a dearth of head-to-head randomized trials, indirect statistical comparisons of treatment effects are becoming increasingly common. Recently, two studies compared RA therapies using MAIC: one compared upadacitinib with another JAK inhibitor, tofacitinib [17], and the other compared the JAK inhibitor baricitinib with tofacitinib, the TNFα inhibitor adalimumab, and the IL-6 receptor inhibitor tocilizumab [12]. Both studies used approaches similar to ours, with some differences in the adjustment factors. The comparison of upadacitinib and tofacitinib weighted data from the upadacitinib trial SELECT-MONOTHERAPY by age, sex, race, and the baseline values of SJC66/28, TJC68/28, CRP, and patient global assessment [17], and concluded that upadacitinib was associated with improved outcomes versus tofacitinib. The four-treatment comparison adjusted data from the baricitinib trial RA-BEGIN for age, sex, and baseline DAS28-ESR, pain VAS, and HAQ-DI scores [12]; it found greater pain reduction and improved physical function for baricitinib versus adalimumab and tocilizumab, but not versus tofacitinib. Careful selection of adjustment factors is important: although they are chosen based on subject-matter expertise, some factors may be incorrectly included and others omitted. Adjusting for factors that are not treatment effect modifiers can reduce precision, and omitting a true treatment effect modifier can introduce bias [18].

Although population-adjusted indirect comparison techniques offer advantages over network meta-analyses (primarily because they do not assume that baseline characteristics are equivalent across trials), they have certain limitations. For the MAIC, the key limitation is the reduction in ESS [8]. For both the MAIC and STC (and population-adjusted analyses in general), it is not possible to adjust for potential effect modifiers that were not captured in the studies or are not reported in published manuscripts, such as the specific conventional synthetic DMARD and dose a patient was receiving, the number of prior bDMARD failures, or comorbidities (e.g., fibromyalgia) that may influence treatment response. A further limitation is that, because patient-level data were not available for both studies, we could not perform univariable and multivariable logistic regression analyses to identify factors associated with (prior) treatment resistance in each treatment group. Novel modeling methods such as multilevel network meta-regression for population-adjusted treatment comparisons [19] could be used to confirm our findings, as could an effectiveness analysis applying a similar approach to real-world data. However, results from each of these methods should be regarded as hypothesis-generating rather than conclusive evidence.

Conclusions

Our results, obtained using two population-adjusted indirect comparisons, suggest a similar effect of sarilumab and upadacitinib when given in combination with stable conventional synthetic DMARDs. Indirect comparisons have become increasingly common in assessing comparative efficacy; their use should be encouraged but critically evaluated.