Key Summary Points

Why carry out this study?

 Dolutegravir, Elvitegravir, Raltegravir and Darunavir are commonly used core agents for HIV antiretroviral therapy.

 The literature on real-world effectiveness of these core agents is limited but may help guide treatment decisions in a healthcare setting.

 The aim of this study was to compare virologic failure among treatment-naïve people living with HIV initiating Dolutegravir, Elvitegravir, Raltegravir or Darunavir in a large US clinical cohort.

What was learned from the study?

 Virologic failure risks did not differ between Elvitegravir and Dolutegravir.

 Dolutegravir had more favourable virologic outcomes than Raltegravir and Darunavir, although Raltegravir and Darunavir were preferentially prescribed to sicker people living with HIV.

Introduction

Human immunodeficiency virus (HIV) infection, previously a life-threatening illness, has become a largely manageable chronic disease: thanks to developments in antiretroviral therapy (ART), people living with HIV (PLWH) now experience improved quality of life and prolonged survival [1, 2]. The integrase strand transfer inhibitor (INSTI) class of ART medications has demonstrated a more rapid and sustained reduction in HIV viral load and a greater increase in CD4 counts when compared with protease inhibitor (PI)- or Efavirenz (EFV)-based regimens in clinical trials [3, 4]. At the time of this study, the US Department of Health and Human Services (DHHS) clinical guidelines recommended the use of the INSTIs Dolutegravir (DTG), Elvitegravir (EVG) or Raltegravir (RAL), or the boosted PI Darunavir (DRV), as core agents in ART regimens for PLWH initiating ART [5]. In the latest guidelines, Bictegravir, DTG and RAL are the recommended initial core agents. However, both EVG/c and boosted DRV remain recommended when a single tablet regimen (STR) is preferred (EVG/c, DRV/c), before HIV drug resistance results are available (boosted DRV), or in the presence of chronic kidney disease [DTG + lamivudine (3TC), DRV/r + 3TC, DRV/r + RAL] [6].

DTG and DRV both have a higher genetic barrier to resistance than RAL and EVG [6,7,8,9,10,11,12]. Unlike EVG and DRV, DTG and RAL do not require co-administration with a pharmacokinetic boosting agent [6, 13], reducing the potential for drug–drug interactions [6]. Clinical trials have demonstrated that, in ART-naïve PLWH, DTG was superior to DRV [8, 9], as well as non-inferior to RAL, in terms of virologic suppression (achieving HIV-1 RNA < 50 copies/mL by the FDA Snapshot algorithm) [7, 10].

Although clinical trials have compared the efficacy of DTG, EVG, RAL and DRV, there remains a need to understand their clinical effectiveness and treatment patterns in a real-world setting to help guide treatment decisions by healthcare professionals. To address this need, this study aimed to compare the clinical effectiveness (i.e., time to virologic failure) for ART-naïve PLWH initiating DTG, EVG, RAL or DRV-based regimens in a large US clinical cohort.

Methods

Study Design

This study utilised data from the Observational Pharmaco-Epidemiology Research and Analysis (OPERA) cohort, a database based on electronic medical records data from 85 clinics across the US. The study population consisted of ART-naïve PLWH initiating a DTG-, EVG-, RAL- or DRV-based regimen containing ≥ 3 antiretrovirals (ARV) between 12 August 2013 (approval date of DTG, the last agent of interest approved by the FDA) and 31 July 2016, with follow-up through 31 July 2017 to allow for a minimum of 1 year of potential follow-up. Each individual contributed person–time from the index date (initiation of a core agent of interest) until the first of the following censoring events: (1) discontinuation of core agent of interest, (2) 12 months after a person’s last clinical contact (telephone or visit), (3) death or (4) study end (31 July 2017). A 12-month baseline period preceding core agent initiation was used to assess demographics and clinical characteristics.
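The censoring rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function and field names are hypothetical, and "12 months after last clinical contact" is assumed to mean the same calendar day one year later.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2017, 7, 31)  # study end date stated in the text


def add_one_year(d: date) -> date:
    """Same calendar day one year later (assumption: 29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def follow_up_end(last_contact: date,
                  discontinuation: Optional[date] = None,
                  death: Optional[date] = None) -> Tuple[date, str]:
    """Return the earliest censoring date and the reason for it."""
    candidates = [
        (add_one_year(last_contact), "lost to follow-up"),
        (STUDY_END, "study end"),
    ]
    if discontinuation is not None:
        candidates.append((discontinuation, "discontinuation"))
    if death is not None:
        candidates.append((death, "death"))
    # The first event to occur ends the person-time contribution.
    return min(candidates, key=lambda c: c[0])
```

Person-time for an individual is then the interval from the index date to the date returned by `follow_up_end`.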

Study Population

Eligible PLWH were identified as being ART-naïve, ≥ 13 years of age at index, with a diagnosis of HIV-1 and at least one HIV-1 viral load test and one CD4 lymphocyte test ≤ 90 days prior to index date. Individuals with HIV-2, prior exposure to post-exposure prophylaxis or pre-exposure prophylaxis, and those who were prescribed DTG, EVG, RAL, or DRV as part of a clinical trial, were excluded from this study.

Any ART regimens containing one of the core agents of interest (DTG, EVG, RAL or DRV), in combination with at least two other antiretroviral drugs, were included in the analysis. Regimens that included more than one core agent of interest or contained less than three total antiretroviral drugs were excluded.

PLWH were classified as ART-naïve if they had no recorded history of ART prior to their index regimen and a baseline viral load ≥ 1000 copies/mL. Individuals with no previous ART history, but a baseline viral load < 1000 copies/mL were excluded from the study to avoid potential misclassification due to the possibility of unreported previous ART experience.
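The regimen and ART-naïve criteria described above can be expressed as two small checks. The sketch below is illustrative only: the function names are hypothetical, the caller is assumed to pass only antiretroviral drugs, and how pharmacokinetic boosters were counted is not stated in the text, so they are left out of the example regimens.

```python
from typing import Iterable

CORE_AGENTS = {"DTG", "EVG", "RAL", "DRV"}  # core agents of interest
NAIVE_VIRAL_LOAD_FLOOR = 1000  # copies/mL, threshold stated in the text


def regimen_is_eligible(antiretrovirals: Iterable[str]) -> bool:
    """Exactly one core agent of interest and at least three antiretrovirals."""
    drugs = set(antiretrovirals)
    return len(drugs & CORE_AGENTS) == 1 and len(drugs) >= 3


def is_art_naive(prior_art_recorded: bool, baseline_viral_load: float) -> bool:
    """No recorded ART history and a baseline viral load of >= 1000 copies/mL."""
    return (not prior_art_recorded) and baseline_viral_load >= NAIVE_VIRAL_LOAD_FLOOR
```

For example, a regimen of DTG with abacavir (ABC) and 3TC passes `regimen_is_eligible`, whereas DRV + RAL + 3TC does not, because it contains two core agents of interest.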

Study Outcomes

Time to virologic failure was the main outcome of interest. Virologic failure was defined as either (1) 2 consecutive viral loads ≥ 200 copies/mL after 36 weeks on ART; (2) 1 viral load ≥ 200 copies/mL after 36 weeks on ART immediately followed by core agent discontinuation; (3) 2 consecutive viral loads ≥ 200 copies/mL after suppression to < 50 copies/mL prior to 36 weeks on ART; or (4) 1 viral load ≥ 200 copies/mL after suppression to < 50 copies/mL prior to 36 weeks on ART directly followed by core agent discontinuation. Other outcomes assessed were viral suppression (i.e., an HIV RNA < 50 copies/mL), changes in CD4 cell count, and durability of ART regimen (i.e., discontinuation of the core agent of interest).
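The four-part failure definition can be read as a single pass over a person's viral load history. The Python sketch below shows one possible reading, not the study's implementation. It assumes that the input is a chronologically ordered list of (weeks since index, copies/mL) pairs measured while on the core agent, that "immediately followed by discontinuation" means the elevated result is the last measurement before the core agent was stopped, that an elevated result after week 36 is counted under criteria 1 or 2 even if earlier suppression occurred, and that the failure time is the week of the first elevated result. None of these details are specified in the text.

```python
from typing import List, Optional, Tuple

FAILURE_THRESHOLD = 200     # copies/mL
SUPPRESSION_THRESHOLD = 50  # copies/mL
WEEK_CUTOFF = 36            # weeks on ART


def virologic_failure(viral_loads: List[Tuple[float, float]],
                      discontinued: bool) -> Optional[Tuple[int, float]]:
    """Return (criterion number, week of first elevated result), or None."""
    suppressed_early = False
    last = len(viral_loads) - 1
    for i, (week, copies) in enumerate(viral_loads):
        # Track suppression to < 50 copies/mL before week 36 (criteria 3 and 4).
        if week < WEEK_CUTOFF and copies < SUPPRESSION_THRESHOLD:
            suppressed_early = True
        if copies < FAILURE_THRESHOLD:
            continue
        late = week > WEEK_CUTOFF
        if not (late or suppressed_early):
            continue  # early elevated result without prior suppression
        confirmed = i < last and viral_loads[i + 1][1] >= FAILURE_THRESHOLD
        ended = i == last and discontinued
        if confirmed:
            return (1 if late else 3, week)
        if ended:
            return (2 if late else 4, week)
    return None
```

Under these assumptions, a history of `[(8, 30), (20, 900), (28, 1200)]` is classified under criterion 3, because suppression before week 36 is followed by two consecutive results of at least 200 copies/mL.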

Statistical Analysis

Demographics, clinical characteristics, and all outcomes were summarised using descriptive statistics, including medians with interquartile ranges (IQR) for continuous variables and frequencies and proportions for categorical variables. Statistical comparisons by core agent were performed using Pearson’s Chi-square or Fisher’s exact tests for categorical variables and Wilcoxon rank-sum test for continuous variables.

Mortality risk was assessed using the Veterans Aging Cohort Study Index (VACS Index), which is a composite index used to estimate a 5-year risk of all-cause mortality. The VACS Index is calculated based on age, CD4 cell count, HIV viral load, haemoglobin, FIB-4 index, estimated glomerular filtration rate, and HCV co-infection. A higher VACS score is associated with a higher risk of mortality [14]. Comorbidities assessed included cardiovascular disease (arrhythmia, coronary artery disease, cerebrovascular disease, peripheral vascular disease), invasive cancer, endocrine disorders (diabetes mellitus, hyperlipidemia, hypothyroidism, hyperthyroidism), mental health conditions (anxiety disorders, bipolar or manic disorders, major depressive disorder, schizophrenic disorder, dementia, suicidality), liver diseases (alcohol/drug related, viral or non-viral hepatitis, cirrhosis), bone disease (osteopenia, osteoporosis, pathologic fracture), peripheral neuropathy, renal disease (renal impairment, moderate/severe chronic kidney disease, end stage renal disease), hypertension, rheumatoid arthritis, and alcohol/drug dependence or abuse.

A multivariate Cox proportional hazards model was employed to assess the association between core agent-based regimens and time to virologic failure. The proportional hazards assumption was assessed graphically by plotting the log of the cumulative hazard over time. Baseline covariates included in the multivariate model consisted of age, sex, race, CD4 cell count ≤ 200 cells/µL, HIV RNA ≥ 100,000 copies/mL, history of AIDS, VACS score (15–29, 30–44 or ≥ 45 vs. < 15), number of non-ART prescriptions (1–2 or ≥ 3 vs. 0), drug abuse, and history of syphilis infection. These covariates were selected a priori, based on the literature. Baseline year of ART initiation, men who have sex with men (MSM), and type of health insurance were also included in the model and were selected a posteriori, based on the descriptive analyses.
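For readers less familiar with the method, the general form of a Cox proportional hazards model and the rationale for the graphical check can be written as follows. The notation is ours rather than the authors': x denotes the vector of baseline covariates (core agent indicators plus the adjustment variables), h_0 the unspecified baseline hazard, and H the cumulative hazard.

```latex
% Cox model: the hazard at time t given covariates x
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{aHR}_j = \exp(\beta_j)

% Integrating over time and taking logs: covariates shift the curve vertically
\log H(t \mid \mathbf{x}) = \log H_0(t) + \boldsymbol{\beta}^{\top}\mathbf{x}
```

Because the covariates enter the second expression only as an additive constant, plots of the log cumulative hazard over time for different groups should be roughly parallel when the proportional hazards assumption holds.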

The OPERA database complies with all HIPAA and HITECH requirements, which expand upon the ethical principles detailed in the 1964 Declaration of Helsinki. The OPERA database receives annual institutional review board (IRB) approval by Advarra IRB, including a waiver of informed consent and authorisation for use of protected health information.

Results

Study Population

The study population consisted of 4049 ART-naïve PLWH in care at an OPERA participating clinic. The most common core agent initiated in this population was EVG (47.4%), followed by DTG (34.7%), DRV (14.6%) and RAL (3.2%; Table 1). The median follow-up time was similar for DTG and EVG (19.0 vs. 19.1 months, respectively), but significantly shorter for RAL and DRV (14.8 and 15.3 months, respectively; both p < 0.0001) compared with DTG.

Table 1 Baseline demographic and clinical characteristics of ART-naïve PLWH, by core agent regimen

Baseline Characteristics

Overall, there were no statistically significant differences between DTG and EVG in demographic characteristics and some clinical characteristics such as the median baseline viral load, baseline CD4 counts and VACS score (risk of mortality). However, PLWH on DTG were more likely than those on EVG to have any comorbidity (57.4% vs. 52.1%; p = 0.002) and public payer coverage (62.4% vs. 52.9%; p < 0.0001; Table 1).

In contrast, most demographic and clinical characteristics were statistically significantly different with DTG compared to RAL and DRV (Table 1). PLWH on RAL and DRV were more likely to be older, female, and to have a history of AIDS-defining illness compared to those on DTG. PLWH on RAL and DRV were also generally sicker, with lower median CD4 cell counts and higher median VACS scores compared with those on DTG. Compared to DTG users, comorbidities were more common among RAL users, whereas DRV users were more likely to be African-American and to have higher median viral loads.

As expected, the distribution of ART initiation over time varied by core agent according to approval date. DTG initiation was most common in 2015 or 2016, whereas EVG and DRV initiation was most common in 2014 or 2015 and RAL initiation was most common in 2013 or 2014 (Table 1).

Virologic, Immunologic and Treatment Outcomes

During the observation period, a statistically significantly higher proportion of DTG initiators achieved virologic suppression during their initial ART regimen (78.7%) compared with EVG (73.6%, p < 0.05), RAL (51.9%, p < 0.0001) and DRV initiators (48.6%, p < 0.0001; Fig. 1a). In addition, DTG initiators were statistically significantly less likely to experience virologic failure during follow-up (6.5%), compared with RAL (22.9%, p < 0.0001) or DRV initiators (13.8%, p < 0.0001), although no difference was detected with EVG (8.3%, p > 0.05; Fig. 1b; Table 2). Criteria for virologic failure varied across core agent groups (Table 2). There were no statistically significant differences in the time to failure between core agents (Table 2).

Fig. 1 Virologic suppressionᵃ (a) and virologic failureᵇ (b) during follow-upᶜ, by core agent regimen. ART antiretroviral therapy, DRV Darunavir, DTG Dolutegravir, EVG Elvitegravir, IQR interquartile range, RAL Raltegravir. ᵃAchieved virologic suppression (defined as < 50 copies/mL) during initial ART regimen. ᵇVirologic failure defined as: (1) 2 viral loads ≥ 200 copies/mL after 36 weeks on ART, or (2) 1 viral load ≥ 200 copies/mL after 36 weeks on ART + core agent discontinuation, or (3) 2 viral loads ≥ 200 copies/mL after suppression prior to 36 weeks on ART, or (4) 1 viral load ≥ 200 copies/mL after suppression prior to 36 weeks on ART + core agent discontinuation. ᶜEnd of follow-up defined as discontinuation of core agent of interest, death, loss to follow-up or study end (31 July 2017). Core agent comparison with DTG: *p < 0.05; **p < 0.001; ***p < 0.0001

Table 2 Virologic failure during follow-up, by core agent regimen

A larger median increase in CD4 cell counts from baseline was observed with DTG initiation (194 cells/µL) compared with RAL (95 cells/µL; p < 0.0001) and DRV (128 cells/µL; p < 0.0001) initiation but did not differ from EVG initiation (190 cells/µL; p = 0.1613).

The proportion of PLWH who discontinued their core agent regimen for any reason by the end of the observation period was statistically significantly smaller for those initiating DTG (30.3%) compared to EVG (35.2%; p < 0.01), RAL (75.6%; p < 0.0001) and DRV (57.5%; p < 0.0001).

Time to Virologic Failure

In a multivariate Cox proportional hazards model adjusted for baseline covariates (Fig. 2), no statistical difference in the hazard of virologic failure was detected between DTG and EVG [adjusted hazard ratio (aHR): 1.24, 95% confidence interval (CI) 0.94, 1.64]. RAL and DRV use was associated with a significantly higher likelihood of virologic failure when compared with DTG (RAL aHR: 4.70, 95% CI 3.03, 7.30; DRV aHR: 2.38, 95% CI 1.72, 3.29). Other covariates were also significantly associated with virologic failure across core agent groups, such as African-American race (aHR: 1.51; 95% CI 1.20, 1.91), CD4 cell count ≤ 200 cells/µL (aHR: 1.60, 95% CI 1.17, 2.18), and history of syphilis (aHR: 1.28, 95% CI 1.00, 1.63).
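The reported intervals can be sanity-checked with a short calculation. The sketch below assumes that the confidence intervals are Wald intervals computed on the log hazard scale, which is the usual default for Cox models but is not stated in the text. Under that assumption, the point estimate is the geometric mean of the two limits and the standard error of the log hazard ratio can be recovered from the interval width. The function name is hypothetical.

```python
import math
from typing import Tuple


def summarise_hr(lower: float, upper: float, z: float = 1.96) -> Tuple[float, float, bool]:
    """Recover the point estimate, SE of log(HR), and whether the CI excludes 1.

    Assumes a symmetric Wald interval on the log scale:
    (lower, upper) = exp(beta -/+ z * SE).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2)   # geometric mean of the limits
    se = (log_hi - log_lo) / (2 * z)          # half-width divided by z
    excludes_one = lower > 1 or upper < 1     # significant at the 5% level
    return point, se, excludes_one


for label, lo, hi in [("EVG vs DTG", 0.94, 1.64),
                      ("RAL vs DTG", 3.03, 7.30),
                      ("DRV vs DTG", 1.72, 3.29)]:
    point, se, sig = summarise_hr(lo, hi)
    print(f"{label}: aHR ~ {point:.2f}, SE(log HR) ~ {se:.3f}, CI excludes 1: {sig}")
```

The recovered midpoints round to the reported aHRs of 1.24, 4.70 and 2.38, which is consistent with intervals constructed on the log scale. Only the EVG interval includes 1.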

Fig. 2 Time to virologic failureᵃ modelled with Cox multivariate proportional hazards modelᵇ. AIDS acquired immunodeficiency syndrome, ART antiretroviral therapy, aHR adjusted hazard ratio, CI confidence interval, DRV Darunavir, DTG Dolutegravir, EVG Elvitegravir, HIV human immunodeficiency virus, MSM men who have sex with men, RAL Raltegravir, ref reference, VACS Veterans Aging Cohort Study Index. ᵃVirologic failure defined as: (1) 2 viral loads ≥ 200 copies/mL after 36 weeks on ART, or (2) 1 viral load ≥ 200 copies/mL after 36 weeks on ART + core agent discontinuation, or (3) 2 viral loads ≥ 200 copies/mL after suppression prior to 36 weeks on ART, or (4) 1 viral load ≥ 200 copies/mL after suppression prior to 36 weeks on ART + core agent discontinuation. ᵇModel excluding 385 individuals missing a baseline VACS score. ᶜBaseline covariates included in the multivariate model were selected a priori [age, sex, race, CD4 cell count ≤ 200 cells/µL, HIV RNA ≥ 100,000 copies/mL, history of AIDS, VACS score (15–29, 30–44 or ≥ 45 vs. < 15), number of non-ART prescriptions (1–2 or ≥ 3 vs. 0), drug abuse and history of syphilis infection] or a posteriori (year of ART initiation, MSM and type of health insurance)

Discussion

This analysis aimed to compare the clinical effectiveness of four common ART core agents in ART-naïve PLWH in a real-world setting. Over the follow-up period, DTG initiators consistently experienced more favourable virologic outcomes than RAL and DRV initiators. Compared with DTG initiators, EVG initiators were slightly less likely to achieve suppression, but the proportion who experienced virologic failure did not differ.

The DTG and EVG results of this real-world analysis are consistent with the results from clinical trials in ART-naïve PLWH. In trials, the proportion of ART-naïve PLWH achieving suppression with DTG ranged from 88 to 90% at 48 weeks, 69–80% at 96 weeks and 71% at 144 weeks [4, 7,8,9,10], comparable to the proportion achieving suppression in OPERA (78.7% over a median follow-up time of 83 weeks). Suppression was achieved with EVG in trials by 88–90% at 48 weeks, 83–84% at 96 weeks and 80% at 144 weeks [15,16,17,18], comparable to OPERA (73.6% over a median follow-up time of 83 weeks).

However, for RAL and DRV, OPERA estimates differed from trial results. With RAL, suppression was achieved in trials by 85% at 48 weeks and by 76% at 96 weeks [7, 10], while only 51.9% achieved suppression in OPERA over a median of 64 weeks. Suppression was achieved with DRV in trials by 83% at 48 weeks and by 68% at 96 weeks [8, 9], but only 48.6% of DRV initiators in OPERA achieved suppression over a median of 66 weeks.

In multivariate analyses, compared to DTG, both RAL and DRV were associated with increased hazards of virologic failure, after adjusting for important confounders: RAL (aHR: 4.70, 95% CI 3.03, 7.30) and DRV (aHR: 2.38, 95% CI 1.72, 3.29). Similarly, the FLAMINGO study (NCT01449929) of DTG versus DRV in ART-naïve PLWH demonstrated that DTG was superior to DRV on the primary endpoint of virologic suppression by the FDA Snapshot algorithm [8, 9]. The SPRING-2 study (NCT01227824) of DTG versus RAL in ART-naïve patients showed that DTG was non-inferior to RAL in terms of virologic suppression [7, 10].

No statistically significant difference was observed between DTG and EVG in the adjusted survival analysis (aHR: 1.24, 95% CI 0.94, 1.64). An indirect comparison of the efficacy of EVG and DTG using data from two clinical trials showed no statistically significant difference in suppression (viral load < 50) between DTG and EVG [19].

In terms of virologic failure, OPERA estimates for DTG and EVG were again consistent with trial results. In an indirect comparison of EVG and DTG from two trials, virological failure defined as a viral load > 50 copies/mL was comparable between DTG (10%) and EVG (7%) [19], similar to the 6.5% and 8.3% estimated in OPERA for DTG and EVG, respectively. However, in the SPRING-2 study, 48-week and 96-week virologic failure was experienced by 5% of DTG patients and 8–10% of RAL patients [7, 10], compared with 6.5% and 22.9% failure with DTG and RAL in OPERA, respectively. Another real-world analysis of ART-naïve PLWH demonstrated a lower proportion of PLWH on DTG experiencing virologic failure (7%), defined as one viral load ≥ 400 copies/mL ≥ 6 months after ART initiation, when compared with other INSTIs (12%) or DRV-based regimens (28%) [20, 21].

Several key elements may explain the poorer virologic response to RAL and DRV in OPERA. First, it should be noted that virologic failure was defined differently here compared with the SPRING-2 study (weeks 24–48: two consecutive HIV viral loads ≥ 50 copies/mL; after week 48: viral load ≥ 200 copies/mL or investigator’s discretion if viral load ≥ 50 and < 200 copies/mL). In addition, the maximum follow-up time was longer in this study, thus increasing the likelihood of observing failure. Further, the RAL and DRV initiators in OPERA were significantly different from the DTG initiators. Namely, RAL and DRV initiators were older, sicker and more likely to be women than DTG initiators, suggesting that RAL and DRV were more frequently being prescribed to individuals with more complex medical presentations in real-world clinical practice during the time of the study. These results highlight that different patient groups were using RAL- and DRV-based regimens at the time of this analysis. Although many of the baseline differences between core agent groups were controlled for in the multivariate analysis, the poorer health of the RAL and DRV groups may nonetheless have contributed to their worse outcomes.

In addition, clinical trials provide estimates of drug efficacy in the best conditions for success, whereas observational studies can provide effectiveness estimates in the real world, in potentially suboptimal conditions. This also may explain the high proportion of PLWH discontinuing the core agent of interest during the study period, as discontinuations can be driven by personal preference, treatment access/reimbursement, or mild to moderate undesirable effects rather than virologic failure.

Our study is not without limitations, and the results should be interpreted with these in mind. As with all observational studies, this analysis is subject to residual confounding. Although every effort was made to control for potential confounders that may affect the results, missing data and unknown confounders that were not included in the electronic medical records could have led to some bias. Indeed, in the US, RAL- or DRV-based regimens were disproportionately prescribed to PLWH with more complex medical presentations, such as older age, lower CD4 count, higher VACS score, or comorbid conditions. While statistical adjustments were performed to account for the observed channelling bias, it is possible that this bias was not fully addressed with the methods employed in this analysis due to unobserved confounders. The comparisons of CD4 count and virologic suppression were unadjusted for baseline covariates and could therefore be confounded by varying demographic and clinical characteristics. However, time to virologic failure analyses were adjusted for important confounders and were therefore less likely to be biased, although they were not adjusted for the presence of comorbidities at baseline. Due to significant differences between core agent groups, the covariates of year of ART initiation, MSM and type of health insurance were all selected a posteriori for inclusion in the Cox proportional hazards model. Comparing the results with and without these a posteriori covariates showed that adding them had no meaningful impact on the modelling results.

Further, resistance data were not available to assess if the virologic failures observed here were due to development of resistance. This study also does not include data on adherence to ART, which may affect virologic outcomes. However, effectiveness studies aim to estimate the performance of drugs in a real-world setting, in which adherence is imperfect and often not recorded in electronic medical records. Suboptimal adherence to ART is common, being reported by close to 40% of PLWH in > 26 countries, and is strongly associated with the risk of virologic failure [22]. STRs can improve adherence [23], and availability of DTG- and EVG-based STRs may in part explain the favourable virologic outcomes observed with these agents. The lack of resistance and adherence data is not thought to be a significant limitation of this analysis. While having these data would have been useful for understanding why patients experienced virologic failure, the observation that the risk of virologic failure differed between groups remains of clinical interest regardless of the cause.

Of note, the assessment of viral load in routine clinical practice is much more variable and is typically performed less frequently than in clinical trials where viral load is often assessed monthly. This may have implications for assessing time to viral suppression or time to virologic failure in real-world data. In addition, estimates of the proportions of PLWH who achieved suppression or experienced virologic failure do not consider differences in follow-up time as would estimates of rates. However, although median follow-up was shorter for RAL and DRV than for DTG, no differences were observed in time to failure between groups. Finally, the newest INSTI, Bictegravir, had not yet been approved at the time of this study and was therefore not included as a comparator in this analysis.

Strengths of this study include the large sample size and the wide representation of PLWH in care across the US, due to the inclusion of PLWH in the OPERA cohort from both small rural clinics and from large urban centres, meaning that this study is highly representative of the real-world HIV epidemic in the US. Indeed, the OPERA cohort includes approximately 7% of PLWH in care in the US, with over 80,000 PLWH at the time of this study. In addition, the use of electronic medical records allowed access to extensive clinical and laboratory information and is reflective of real-world clinical practice.

Conclusions

This study demonstrates that, in ART-naïve PLWH with varying characteristics, DTG performed better than RAL and DRV, and comparably to EVG, in terms of virologic outcomes. While baseline clinical characteristics were similar for DTG and EVG, differences in baseline clinical characteristics may have impacted the performance of RAL and DRV compared to DTG.