An external validation of the QCOVID3 risk prediction algorithm for risk of hospitalisation and death from COVID-19: An observational, prospective cohort study of 1.66m vaccinated adults in Wales, UK

  • Jane Lyons,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    J.Lyons@Swansea.ac.uk

    Affiliation Faculty of Medicine, Health & Life Science, Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom

  • Vahé Nafilyan,

    Roles Conceptualization, Formal analysis, Methodology, Resources, Software, Validation, Writing – review & editing

    Affiliation Office for National Statistics, Newport, United Kingdom

  • Ashley Akbari,

    Roles Conceptualization, Data curation, Methodology, Software, Validation, Writing – review & editing

    Affiliation Faculty of Medicine, Health & Life Science, Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom

  • Stuart Bedston,

    Roles Conceptualization, Data curation, Methodology, Software, Validation, Writing – review & editing

    Affiliation Faculty of Medicine, Health & Life Science, Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom

  • Ewen Harrison,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    Affiliation Usher Institute, Centre for Medical Informatics, University of Edinburgh, Edinburgh, United Kingdom

  • Andrew Hayward,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    Affiliation Department of Epidemiology and Public Health, University College London, London, United Kingdom

  • Julia Hippisley-Cox,

    Roles Conceptualization, Funding acquisition, Methodology, Resources, Software, Validation, Writing – review & editing

    Affiliation Nuffield Department, Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom

  • Frank Kee,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    Affiliation School of Medicine, Dentistry and Biomedical Sciences, Queen’s University Belfast, Belfast, United Kingdom

  • Kamlesh Khunti,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    Affiliation Diabetes Research Centre, University of Leicester, Leicester, United Kingdom

  • Shamim Rahman,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    Affiliation Department of Health and Social Care, Mental Health and Disabilities Analysis, London, United Kingdom

  • Aziz Sheikh,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    Affiliation Usher Institute, University of Edinburgh, Edinburgh, United Kingdom

  • Fatemeh Torabi,

    Roles Conceptualization, Data curation, Methodology, Software, Validation, Writing – review & editing

    Affiliation Faculty of Medicine, Health & Life Science, Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom

  • Ronan A. Lyons

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Validation, Writing – review & editing

    Affiliation Faculty of Medicine, Health & Life Science, Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom

Abstract

Introduction

At the start of the COVID-19 pandemic there was an urgent need to identify individuals at highest risk of severe outcomes, such as hospitalisation and death following infection. The QCOVID risk prediction algorithms emerged as key tools for this purpose and were further developed during the second wave of the pandemic to identify groups of people at highest risk of severe COVID-19 related outcomes following one or two doses of vaccine.

Objectives

To externally validate the QCOVID3 algorithm based on primary and secondary care records for Wales, UK.

Methods

We conducted an observational, prospective cohort study based on electronic health care records for 1.66m vaccinated adults living in Wales on 8th December 2020, with follow-up until 15th June 2021. Follow-up started from day 14 post vaccination to allow the full effect of the vaccine.

Results

The scores produced by the QCOVID3 risk algorithm showed high levels of discrimination for both COVID-19 related deaths and hospital admissions (Harrell C statistic: ≥ 0.828) and good calibration.

Conclusion

This validation of the updated QCOVID3 risk algorithms in the adult vaccinated Welsh population has shown that the algorithms are valid for use in the Welsh population, and applicable on a population independent of the original study, which has not been previously reported. This study provides further evidence that the QCOVID algorithms can help inform public health risk management on the ongoing surveillance and intervention to manage COVID-19 related risks.

Introduction

Following the emergence of SARS-CoV-2 and at the start of the COVID-19 pandemic, there was an urgent public health need to identify individuals at highest risk of severe outcomes, in particular hospitalisation and death following infection. To support the National Health Service (NHS) and protect the most clinically vulnerable individuals, the Chief Medical Officer for England commissioned the New and Emerging Respiratory Virus Threats Advisory Group (NERVTAG), an expert committee of the Department of Health and Social Care who advise the UK government, to develop the QCOVID risk assessment algorithms for predicting risk of COVID-19 related hospital admissions or death [1]. The algorithms were developed on individual demographic and clinical characteristics from six million primary care patients registered at 1,205 English general practices. Performance metrics demonstrated that the predictive algorithms had high levels of discrimination and were well calibrated, which was also shown in three independent validation studies [2–4]. The calculated risk scores from these algorithms saw an additional ~1.5 million people added to the national shielding patient list, and ~800,000 of those prioritised for vaccination if they had not already received it, highlighting the importance of and need for these population risk prediction algorithms for planning and patient management in the case of future infection spikes and pandemics [5]. These original QCOVID algorithms were developed using data from the first wave of the COVID-19 pandemic, prior to the national rollout of the vaccination programme.

Despite the success and effectiveness of the vaccine programme, the discovery of new variants alongside studies showing waning of immunity over time has demonstrated that there remains a risk of COVID-19 infection and subsequent COVID-19 related hospitalisation and death following vaccination [6–9]. Vaccine efficacy was generally tested in younger, healthier volunteers in clinical trials [10]. It is important to identify risk factors associated with COVID-19-related hospitalisation and deaths in all individuals following vaccination since not all patients will achieve immunity. The UK Government advisory group NERVTAG further developed the QCOVID3 risk assessment algorithms to identify groups of people at highest risk of severe COVID-19 related outcomes following one or two doses of vaccine [11]. The QCOVID3 risk algorithms were developed on data from the second wave of the pandemic in England, UK, and include some additional predictor variables such as vaccine dose (first or second), an extended categorisation of diabetes severity that includes glycated haemoglobin levels, bipolar disorder, schizophrenia, and a seven-day moving average of the background rates of positive SARS-CoV-2 tests per 100,000 people to account for changing infection rates. The risk scores from these algorithms provide further evidence to prioritise high-risk individuals who may need further interventions, such as additional booster vaccinations or treatment with monoclonal antibodies, antivirals or pre-exposure prophylaxis (Evusheld) [12]. This was designed to help protect high-risk individuals with the ending of social distancing measures, mandatory testing, and self-isolation in the UK.

It is important to replicate and validate prediction algorithms in independent populations to ensure they work in an ‘out of sample’ setting, particularly if they could be used clinically in this setting. Validation also informs policy development at a national scale and contributes to the planning and management of individual patient care as well as to planning for the prevention of future pandemics [13,14]. Pandemic predictive risk assessment algorithms, such as QCOVID3, can be used to identify vulnerable groups of individuals at the highest risk of serious health outcomes as well as demographic and clinical groups of individuals who are more or less likely to take up a preventative intervention during a pandemic. Outputs from validating these prediction algorithms can be used to highlight vulnerabilities within healthcare systems as well as to identify variations in service provision and uptake of vaccinations for planning and managing patient care for future pandemics.

Validation studies were funded to compare the performance of the updated algorithms in each of the four nations in the UK to ensure external validity and provide evidence on the application of the algorithms in managing patient risk over time in different populations [15]. The aim of this particular study was to independently validate the updated published QCOVID3 risk prediction algorithms for risk of COVID-19-related deaths and hospitalisation in vaccinated adults having one or two doses of vaccination by 15th June 2021 in Wales, UK.

Materials and methods

Study design

We conducted an observational, longitudinal, cohort study of vaccinated adults living in Wales from 8th December 2020, with follow-up until 15th June 2021. The outcomes of interest were time to COVID-19 related death and hospitalisation. We assessed the performance of the QCOVID3 algorithms using measures of discrimination and calibration. This paper mirrors the published English study and follows the STROBE and TRIPOD reporting guidelines [11,16,17].

Data sources

This study used routinely collected anonymised, individual-level, population-scale health and demographic data held in the Secure Anonymised Information Linkage (SAIL) Databank to create a retrospective population-based individual-level linked e-cohort [18,19]. For this analysis, we used the Welsh Demographic Service Dataset (WDSD), Welsh Longitudinal General Practice (WLGP), Annual District Death Extract (ADDE) from the Office for National Statistics (ONS) mortality data, Annual District Death Daily (ADDD), Consolidated Death Data Source (CDDS), COVID Vaccination Dataset (CVVD), Patient Episode Database for Wales (PEDW), Care Homes Index (CARE), COVID-19 Test Results (PATD), 2011 Census Wales (CENW), and UK Government published daily infection rates data [20,21].

Sample inclusion criteria and follow up

We defined the population of interest as vaccinated adults living in Wales on 8th December 2020 with follow-up until 15th June 2021. Individuals included were aged between 19 and 100 on the 8th December 2020, registered with a SAIL-providing general practice (80% of Welsh general practices), and who had received one or two vaccinations of Oxford-AstraZeneca or Pfizer-BioNTech within the study period (S1 Fig). Follow-up started from 14 days after receiving each vaccine dose until they had the outcome of interest (COVID-19-related death or hospitalisation), died, migrated out of Wales, or until the end of the study period. Individuals who were vaccinated within 14 days of the study end date were not included due to insufficient follow-up time. Individuals who only received one dose of the vaccine during the study period were followed up from 14 days post vaccination until the outcome of interest, death, migration out of Wales, or until the end of the study period. Individuals who received two doses of vaccination were followed up over two time periods. For the first period, individuals were followed up from 14 days post first vaccination until 14 days after their second vaccination. For the second period, individuals were followed up from 14 days post second vaccination until outcome of interest, death, migration out of Wales, or until the end of the study period.
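The follow-up rules above can be sketched in code. This is a minimal illustration, not the study's implementation: the function name `follow_up_periods`, the `censor` argument (the earliest of outcome, death, or migration out of Wales) and the handling of boundary days are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

STUDY_END = date(2021, 6, 15)  # end of follow-up stated in the text
LAG = timedelta(days=14)       # follow-up starts 14 days after each dose


def follow_up_periods(
    dose1: date,
    dose2: Optional[date] = None,
    censor: Optional[date] = None,  # hypothetical: earliest outcome/death/migration
) -> List[Tuple[int, date, date]]:
    """Return (dose_number, start, end) follow-up windows for one person."""
    stop = min(d for d in (censor, STUDY_END) if d is not None)
    periods = []

    # Period 1: from 14 days after dose 1 until 14 days after dose 2 (if any).
    start1 = dose1 + LAG
    end1 = stop if dose2 is None else min(dose2 + LAG, stop)
    if start1 < end1:
        periods.append((1, start1, end1))

    # Period 2: from 14 days after dose 2 until censoring or study end.
    if dose2 is not None:
        start2 = dose2 + LAG
        if start2 < stop:
            periods.append((2, start2, stop))

    return periods
```

Under these assumptions, a person vaccinated fewer than 14 days before the study end date gets no follow-up window, which mirrors the exclusion described above.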

Outcome of interest

The primary and secondary outcomes were COVID-19-related death and hospitalisation respectively, with time-at-risk calculated from 14 days after vaccination. We utilised a combination of ADDE, ADDD, WDSD and CDDS to identify all deaths of Welsh residents, inclusive of in-hospital and out of hospital deaths. Deaths involving COVID-19 were identified using the tenth revision of the International Classification of Diseases (ICD-10) codes U07.1 or U07.2, or from text fields containing the causes of death within the data sources (ADDD, CDDS). Additionally, deaths involving COVID-19 were also included if the death occurred within 28 days of a positive SARS-CoV-2 infection using the PATD data.

COVID-19-related hospital admissions were included if they contained U07.1 or U07.2 ICD-10 codes, or if they were an emergency admission within 14 days following a positive reverse transcription polymerase chain reaction (RT-PCR) COVID-19 test result. Individuals who had a COVID-19 hospitalisation prior to the study start date were not included in the hospital analysis.
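As an illustration of the outcome definitions, the coded and test-based rules can be written as two small predicates. This is a sketch under assumptions: the function names and inputs are hypothetical, the free-text search of cause-of-death fields is omitted, and the 28-day and 14-day windows are treated as inclusive of the test date, which the text does not specify.

```python
from datetime import date
from typing import Iterable

COVID_ICD10 = {"U07.1", "U07.2"}  # ICD-10 codes named in the text


def _within(event: date, tests: Iterable[date], window_days: int) -> bool:
    """True if any positive test falls 0..window_days days before the event."""
    return any(0 <= (event - t).days <= window_days for t in tests)


def is_covid_death(cause_codes, death_date, positive_tests) -> bool:
    """Coded COVID-19 death, or death within 28 days of a positive test."""
    coded = bool(COVID_ICD10 & set(cause_codes))
    return coded or _within(death_date, positive_tests, 28)


def is_covid_admission(diag_codes, admission_date, is_emergency, positive_pcr) -> bool:
    """Coded COVID-19 admission, or emergency admission within 14 days of a positive RT-PCR."""
    coded = bool(COVID_ICD10 & set(diag_codes))
    return coded or (is_emergency and _within(admission_date, positive_pcr, 14))
```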

Predictor variables

Predictive demographic, clinical, and pharmaceutical variables (Box 1) to validate the updated algorithms were based on the original QCOVID studies [1–4], which include the clinical vulnerability group criteria used to identify those advised to shield at the start of the pandemic and risk factors associated with adverse outcomes for respiratory diseases [22,23].

Box 1. List of predictor variables for the QCOVID3 risk equations for vaccinated individuals

Demographic

    • Age in years on 8th December 2020

    • Biological sex at birth

    • Townsend Deprivation Score

    • Ethnicity

    • What is your housing category—care home, homeless or neither?

    • Have you had a 1st or 2nd dose of Oxford-AstraZeneca or Pfizer-BioNTech COVID-19 vaccination?

    • What is the background daily rate per 100,000 for SARS-CoV-2 infection in the last 7 days?

Lifestyle

    • Body Mass Index

Conditions

    • Have you had chemotherapy in the last 12 months?

    • Have you had radiotherapy in the last 6 months?

    • Do you have sickle cell disease?

    • Have you a cancer of the blood or bone marrow such as leukaemia, myelodysplastic syndromes, lymphoma or myeloma and are at any stage of treatment?

    • Do you have lung or oral cancer?

    • Do you have a learning disability or Down’s syndrome?

    • Do you have Chronic Kidney Disease (CKD) and at what stage?

    • Do you have diabetes?

    • Do you have Parkinson’s disease?

    • Do you have epilepsy?

    • Do you have dementia?

    • Do you have Chronic Obstructive Pulmonary Disease (COPD)?

    • Do you have motor neurone disease, multiple sclerosis, myasthenia, or Huntington’s chorea?

    • Do you have coronary heart disease?

    • Do you have heart failure?

    • Do you have peripheral vascular disease?

    • Do you have atrial fibrillation or atrial flutter?

    • Do you have cirrhosis of the liver?

    • Have you had a thrombosis or pulmonary embolus?

    • Have you had a stroke or transient ischaemic attack?

    • Do you have bipolar disease or schizophrenia?

    • Do you have severe combined immunodeficiency?

    • Have you had a solid organ transplant ever or bone marrow transplant in the last 6 months?

For the demographic variables, the WDSD was used to define age, sex, and Townsend score. Townsend score is a measure of deprivation, based on the area of residence, with a higher score indicating a higher level of deprivation [24]. The 2011 Census Wales (CENW) is linked to the cohort in order to derive ethnic groups (i.e. Bangladeshi, Black African, Black Caribbean, Chinese, Indian, Pakistani, Mixed, Other, and White). The ethnic group variable also had a category corresponding to ‘not recorded/unknown’. This category was used whenever the corresponding value was missing (Table 1). To adjust for changing infection rates over the study period, a seven-day moving average of the background rates of positive SARS-CoV-2 tests per 100,000 people using published data was added to the algorithms [20].
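The background infection rate covariate can be illustrated as follows. This is a sketch, assuming a trailing seven-day window over daily counts of positive tests and a fixed population denominator; the function name and inputs are hypothetical and the published data may already be supplied as rates.

```python
from typing import List, Optional, Sequence


def seven_day_rate_per_100k(
    daily_positives: Sequence[int], population: int
) -> List[Optional[float]]:
    """Trailing seven-day mean of daily positive tests, per 100,000 people.

    The first six days have no complete window and are returned as None.
    """
    rates: List[Optional[float]] = []
    for i in range(len(daily_positives)):
        if i < 6:
            rates.append(None)
            continue
        window_total = sum(daily_positives[i - 6 : i + 1])
        rates.append(window_total * 100_000 / (7 * population))
    return rates
```

For example, seven consecutive days with 70 positive tests each in a population of one million give a rate of 7 per 100,000 on the seventh day.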

Table 1. Demographic and clinical characteristics for the total cohort and those who died or were admitted to hospital with COVID-19.

https://doi.org/10.1371/journal.pone.0285979.t001

The majority of pre-existing conditions were identified in the WLGP primary care data source using Read codes version 2 (CTV2). Where no timeframe was stated, a lookback period was used from 1st January 1998 to the study start date (14 days after first vaccination). For body mass index (BMI), the latest BMI measurement within 5 years was used. BMI records outside this time period and BMIs <15 and >47 were set to missing, with the mean BMI replacing missing values. The highest BMI was included if an individual had multiple BMI records on the latest date. For diabetes, if the latest health record had defined an individual with both type 1 and type 2 diabetes, type 2 took precedence. Patients with diabetes were further categorised by severity according to the most recent HbA1c level in their primary care records (HbA1c levels were categorised at a threshold of 59 mmol/mol). For the housing covariate, if the latest record defined an individual as being homeless and living in a care home, then living in a care home took precedence. For the learning disabilities covariate, if the latest record identified an individual with learning disabilities and Down’s syndrome, then Down’s syndrome was prioritised.
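The BMI rules can be sketched as below. This is an illustrative reading rather than the study's code: it assumes invalid and out-of-window records are discarded before the latest date is chosen, approximates five years as 5 × 365 days, and uses hypothetical names (`select_bmi`, `impute_mean`, `index_date`).

```python
from datetime import date, timedelta
from typing import List, Optional, Sequence, Tuple


def select_bmi(records: Sequence[Tuple[date, float]], index_date: date) -> Optional[float]:
    """Pick the latest plausible BMI recorded within five years of index_date."""
    earliest = index_date - timedelta(days=5 * 365)  # assumption: approximate 5 years
    valid = [
        (d, bmi)
        for d, bmi in records
        if earliest <= d <= index_date and 15 <= bmi <= 47  # values <15 or >47 are missing
    ]
    if not valid:
        return None
    latest = max(d for d, _ in valid)
    # Several records on the latest date: keep the highest value.
    return max(bmi for d, bmi in valid if d == latest)


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing BMI values with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]
```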

Office of Population Censuses and Surveys (OPCS) Classification of Interventions and Procedures version 4 (OPCS-4) coded conditions in the inpatient (PEDW) data were used to identify chemotherapy status, Chronic Kidney Disease (CKD) stages, bone marrow or stem cell transplant, radiotherapy, and solid organ transplant.

Algorithm validation

The original study developed risk models using cause-specific Cox proportional hazards models to calculate hazard ratios and develop the risk scores accounting for the competing risk of death due to other causes [11]. The published QCOVID3 risk equations were applied to the cohort to calculate the risk scores for COVID-19 related hospitalisation and death respectively [25]. The following modifications for the Welsh cohort were required due to SAIL policy and data availability: HIV status was not included in the analysis. Those receiving chemotherapy within one year of study start (14 days following first vaccination) were assigned the chemotherapy group B (middle severity group) coefficients. Additionally, missing published death and vaccine times were replaced with zero [25].
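To show the general shape of how a Cox-based risk equation turns covariates into an absolute risk, the sketch below uses the textbook form 1 − S0(t)^exp(linear predictor). It is not the published QCOVID3 equation: the coefficients, covariate names and baseline survival are invented for illustration, and the simple form ignores the competing-risk adjustment described above.

```python
import math
from typing import Dict


def absolute_risk(
    baseline_survival: float,
    coefficients: Dict[str, float],
    covariates: Dict[str, float],
) -> float:
    """Absolute risk from a Cox-type model: 1 - S0(t) ** exp(sum(beta * x))."""
    linear_predictor = sum(
        beta * covariates.get(name, 0.0) for name, beta in coefficients.items()
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients and baseline survival, NOT the QCOVID3 values.
example_coefficients = {"age_per_decade": 0.8, "care_home": 1.2}
low = absolute_risk(0.9999, example_coefficients, {"age_per_decade": 0.0, "care_home": 0.0})
high = absolute_risk(0.9999, example_coefficients, {"age_per_decade": 2.0, "care_home": 1.0})
```

With a zero linear predictor the risk equals one minus the baseline survival, and it rises monotonically as the weighted covariates increase.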

Performance metrics were calculated to validate the QCOVID3 predicted risk of COVID-19 related hospitalisation and death. R² values, D statistic, and Harrell’s C statistic with corresponding 95% confidence intervals were calculated for the total cohort and by age, sex, and vaccination number [26–28]. The R² values refer to the proportion of variation in survival time explained by the model. The D statistic and Harrell’s C statistic are discrimination measures that quantify, respectively, the separation in survival between patients with different levels of predicted risk and the extent to which people with higher risk scores have earlier events. To measure calibration, we compared the mean predicted risks with observed risks, by 20ths of predicted risk.
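A minimal sketch of two of these measures is given below, assuming in-memory lists of follow-up times, event indicators and predicted risks. The pairwise concordance loop is a simplified Harrell's C that skips pairs with tied times, and the calibration helper compares mean predicted risk with the crude observed event proportion in each 20th rather than a survival-based observed risk; both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Share of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


def calibration_by_groups(
    risks: Sequence[float], events: Sequence[int], n_groups: int = 20
) -> List[Tuple[int, float, float]]:
    """Return (group, mean predicted risk, observed event proportion) per group."""
    order = sorted(range(len(risks)), key=lambda k: risks[k])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups : (g + 1) * n // n_groups]
        if not idx:
            continue
        mean_pred = sum(risks[k] for k in idx) / len(idx)
        observed = sum(events[k] for k in idx) / len(idx)
        rows.append((g + 1, mean_pred, observed))
    return rows
```

The quadratic loop is only practical for small samples; for a cohort of this size an optimised library routine would be needed.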

Ethics statement

The use of de-identified data in SAIL complies with National Research Ethics Service (NRES) guidance. Applications to use data held within the SAIL Databank, an ISO 27001 and UK Statistics Authority (UKSA) Digital Economy Act (DEA) accredited Trusted Research Environment, must first be approved by the independent Information Governance Review Panel (IGRP). The IGRP contains a multidisciplinary professional group, including members of the public, and it gives careful consideration to each project to ensure proper and appropriate use of SAIL data. When access has been granted, it is gained through a privacy protecting safe haven and remote access system referred to as the SAIL Gateway. SAIL project 0911 was approved by IGRP on 26th June 2019 with further amendments to the scope approved to allow rapid analysis as the COVID-19 pandemic unfolded.

SAIL has established an application process to be followed by anyone who would like to access data via SAIL at https://www.saildatabank.com/application-process. Participant consent was not required for this study as all data is anonymised and further encrypted.

Results

The study included 1,656,154 individuals (Table 1). Of these, 787,878 (47.6%) were male, the mean age was 53.9 (SD 18), 920,041 (55.6%) had received two doses of the vaccine with the median time between 1st and 2nd dose being 71 days (IQR: 47–77), and the majority of individuals were from White ethnic backgrounds (1,575,332, 95.1%). Overall, 991,158 (59.8%) had at least one dose of the Oxford-AstraZeneca vaccine and 665,360 (40.2%) had at least one dose of the Pfizer-BioNTech vaccine. Median follow-up time was 60 days (interquartile range 41–76) after the first dose and 48 (22–77) days after the second dose.

In total, there were 353 (0.02%) COVID-19 related deaths and 744 (0.05%) COVID-19-related hospital admissions. In general, individuals who died from COVID-19 were more likely to be male (178, 50.4%), aged 80 years and older (231, 65.4%), and living in more deprived areas (221, 62.6% in quintiles 3–5). Amongst those with a recorded BMI, 61.7% of people who died were overweight or obese. Atrial fibrillation, coronary heart disease, diabetes, and dementia were the pre-existing conditions with the highest proportions of deaths (Table 1).

Individuals with a COVID-19-related admission were more likely to be female (412, 55.4%), aged 70 years and older (538, 72.3%), and living in more deprived areas (480, 64.5% in quintiles 3–5). CKD, coronary heart disease, diabetes, and atrial fibrillation were the pre-existing conditions with the highest proportions of hospitalisations (Table 1).

Table 2 shows the performance metrics of the QCOVID3 algorithm in the Welsh cohort. The metrics have been provided for the total cohort and by age, sex, and the number of vaccinations. For COVID-19 related deaths, the algorithm explained 72.5% (95% CI: 70.3–74.4) of the variation in time to death, the Harrell’s C statistic was 0.939 (95% CI: 0.928–0.950) and the D statistic 3.321 (95% CI: 3.148–3.493). Results when restricted to individuals who only received one vaccination were 72.8% (95% CI: 69.7–75.5), 0.992 (95% CI: 0.987–0.996) and 4.796 (95% CI: 4.608–4.983) respectively. Similar results were found in males and females. Individuals who received two vaccinations and results for some age groups yielded slightly poorer metrics, which was likely due to fewer events.

Table 2. Performance of the QCOVID3 algorithm to predict risk of COVID-19 related death and hospitalisation for the total cohort and by age, sex, and vaccination dose (95% CI).

https://doi.org/10.1371/journal.pone.0285979.t002

For COVID-19 related hospital admissions, the algorithm explained 55.1% (95% CI: 52.4–57.6) of the variation in time to hospital admission, the Harrell’s C statistic was 0.828 (95% CI: 0.812–0.845) and the D statistic 2.266 (95% CI: 2.149–2.384). Results restricted to individuals who only received one vaccination yielded the highest performance with 81.5% (95% CI: 79.8–83.0), 0.939 (95% CI: 0.914–0.963) and 4.291 (95% CI: 4.066–4.516) respectively. Similar to the death outcomes, metrics for hospitalisations were not as good for those receiving two vaccinations or in sub-analyses by age groups.
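The explained-variation and D statistic figures for the total cohort can be related to each other if the R² reported is the measure derived from Royston and Sauerbrei's D statistic. That choice is an assumption here, since the text does not name the R² variant; under it, R² = (D²/κ²) / (π²/6 + D²/κ²) with κ = √(8/π). The function name below is hypothetical.

```python
import math


def r2_from_d(d_statistic: float) -> float:
    """Explained variation implied by a D statistic (Royston-Sauerbrei form)."""
    kappa_sq = 8.0 / math.pi          # kappa = sqrt(8 / pi)
    sigma_sq = math.pi ** 2 / 6.0     # error variance on the Cox model scale
    scaled = d_statistic ** 2 / kappa_sq
    return scaled / (sigma_sq + scaled)


death_r2 = r2_from_d(3.321)      # total cohort, COVID-19 related death
admission_r2 = r2_from_d(2.266)  # total cohort, COVID-19 related admission
```

Under this assumption the total-cohort D statistics of 3.321 and 2.266 correspond to roughly 72.5% and 55.1% explained variation, in line with the values reported for the total cohort.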

The calibration plots in Figs 1 and 2 show that the predicted and observed risks of COVID-19-related death and hospitalisation were similar, demonstrating that the algorithms were well calibrated. However, there was slight over-prediction in the highest risk group for COVID-19-related deaths (vigintiles 18–20) and under-prediction of COVID-19-related hospitalisations, with the largest difference seen in the highest risk group.

Fig 1. Predicted and observed risk of COVID-19 related deaths.

https://doi.org/10.1371/journal.pone.0285979.g001

Fig 2. Predicted and observed risk of COVID-19 related hospital admissions.

https://doi.org/10.1371/journal.pone.0285979.g002

Table 3 presents the percentage of COVID-19 related deaths at different thresholds based on centiles of predicted absolute risk. 71.4% of deaths occurred in those in the top 5% for predicted absolute risk of COVID-19 related deaths which increases to 95.2% of deaths occurring in the top 30% for predicted absolute risk of COVID-19 related deaths.
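The threshold analysis amounts to asking what share of all deaths falls among the people with the highest predicted risks. A minimal sketch, assuming parallel lists of predicted risks and event indicators and a hypothetical function name:

```python
from typing import Sequence


def sensitivity_at_top(
    risks: Sequence[float], events: Sequence[int], top_fraction: float
) -> float:
    """Proportion of all events occurring in the top `top_fraction` of predicted risk."""
    n = len(risks)
    k = max(1, round(n * top_fraction))  # number of people above the threshold
    ranked = sorted(range(n), key=lambda i: risks[i], reverse=True)
    captured = sum(events[i] for i in ranked[:k])
    return captured / sum(events)
```

Applied to a cohort, `sensitivity_at_top(risks, deaths, 0.05)` would correspond to the top 5% row of such a table.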

Table 3. Sensitivity for COVID-19 related death at different QCOVID3 thresholds of absolute risk.

https://doi.org/10.1371/journal.pone.0285979.t003

Discussion

The results from this validation study demonstrate that the performance of the algorithms was good and yielded similar results to the original study in England [11]. In general, the risk algorithms showed high levels of discrimination (Harrell C statistic: ≥ 0.828 for both COVID-19 related deaths and hospital admissions) and good calibration. Improved precision in the Welsh data was shown in predicting risk of COVID-19 related death and hospitalisation in individuals who received one dose of the vaccine, and conversely lower precision was observed for risk of COVID-19 related death and hospitalisation in individuals who received two doses of the vaccine. Compared to individuals who received one vaccine dose, performance metrics were lower in both studies for those who received two doses of vaccine. The Welsh performance metrics yielded poorer results in comparison for those with two doses, but also there was a lower proportion of individuals who received two doses included, 55.6% compared to 75.7% in the English study. Individuals were followed up from 14 days after each dose of vaccine, therefore, anyone who received a first dose in the last two weeks of the study period could not be included or their second dose could not be included due to insufficient follow up time for calculating the outcomes. Additionally, as stated in the original study, there were small numbers for events occurring after second vaccination [11]. Therefore, the predictor variables for outcomes and developing the algorithm mostly came from individuals who only received one vaccine dose. This group of people are a different group from those who receive two vaccine doses. Those who received two doses early during the initial vaccination programme will have been prioritised due to occupational risk (healthcare workers) or higher risk of severe COVID-19 related outcomes [29].

We found higher observed to predicted risks for hospitalisation and lower observed to predicted risk of death in the highest risk groups in Wales (Figs 1 and 2). For COVID-19-related hospital admissions, both studies demonstrated similar trends of observed versus predicted risk, with observed admissions higher than predicted in the Welsh data in general compared to England. Overall, there was a slightly larger proportion of COVID-19 related hospital admissions in the Welsh study (0.05%) compared to the English study (0.03%). Use of SAIL data allowed linkage using a demographic spine with follow-up across primary and secondary healthcare data and mortality data, and incorporated all COVID-19-related hospitalisations, including any in-hospital infections, as well as all emergency admissions within 14 days of a positive COVID-19 RT-PCR test result. This could explain the differences between the two studies and be a reason for increased observed risk in the highest risk group in Wales. Those at highest risk will have increased healthcare utilisation for any underlying conditions and were likely to partake in increased COVID-19 testing during the study period.

The proportions of COVID-19 related deaths were similar in the Welsh (0.02%) and English (0.03%) studies, and this is reflected in the similar calibration plots except for the highest risk group (group 20), where predicted deaths were higher than observed deaths for Wales. This could be attributed to the success of the vaccination programmes as well as protective and risk-avoiding social interactions for those most at risk of serious COVID-19 outcomes. Whilst there were slight differences in the highest risk groups for risk of COVID-19 related deaths, the algorithms demonstrated that 71.4% of deaths occurred in the top 5% for predicted absolute risk (Table 3), which was similar to the English study (78.7%) [11].

A recent systematic review of prediction models for severe manifestations and mortality due to COVID-19 identified 445 studies, of which 9 were rated to be at low risk of bias, with AUCs ranging from 0.541 to 0.928 in populations from the UK, Ireland, Italy, Spain, Korea, US, and China [30]. The highest AUC was reported for the original QCOVID algorithm, which we had previously validated in Wales [3]. Our study focuses on using individual-level, population-wide COVID-19 risk prediction models for serious health outcomes in an adult vaccinated population; therefore, it is not possible to draw further comparisons with these earlier prediction models.

Some differences in prediction accuracy in independent populations are expected as there may be underlying differences in populations not captured by the variables included, imprecision due to relatively small numbers, and possibly differences in proportions of people treated with different modalities not captured in this study. A major strength of this study is the ability to utilise the SAIL Databank, a Trusted Research Environment, which enables population-wide, individual-level data linkage across healthcare systems to validate these pandemic predictive risk assessment algorithms. Results from this validation study in an independent population support the findings of the QCOVID algorithm and are likely to be relevant to countries with similar socio-economic conditions and health services. Understanding the demographic and clinical characteristics of those most at risk of serious health outcomes from current pandemics can be used for allocation planning for future threats and to improve global equitable pandemic preparedness.

Whilst this independent study has demonstrated that the updated QCOVID algorithms fit the Welsh data well, the study includes some important limitations. As previously reported [3], the Welsh study was restricted to individuals registered to a SAIL-providing general practice to derive the necessary predictor variables; therefore, results are based on 80% of the population (330/412 of all general practices in Wales). Due to SAIL’s information governance and disclosure control policies, we were unable to include information that is deemed too sensitive and therefore could not include HIV status. Some 41.6% of our cohort did not have a BMI recorded in the previous five years; therefore, missing observations were imputed. OPCS codes in hospital admissions data were used to define chemotherapy status, with anyone with a record of receiving chemotherapy assigned the coefficients for the middle severity chemotherapy group.

Also, this study replicates the original English study and so shares its stated limitations, such as a relatively short follow-up, a partially vaccinated population that received the Oxford-AstraZeneca or Pfizer-BioNTech COVID-19 vaccines only, and small numbers of events in some subgroups. Consequently, it was not possible to calculate metrics by ethnic group or for narrowly defined age groups. Additionally, the study does not account for the interval between completion of the first and second vaccination, any changes in the COVID-19 transmission rate during the study follow-up that might have affected the prediction model over time, or the different variants emerging during the study period [11]. Finally, whilst many risk factors for serious COVID-19 related outcomes have been included, additional risk factors such as occupational exposure to infection are not accounted for in this model.

Conclusion

This study presents an independent external validation of the updated QCOVID3 risk algorithms in the adult vaccinated Welsh population. It has shown that the algorithms are valid for use in the Welsh population and applicable to a population independent of the original study, which has not been previously reported. This study provides further evidence that the QCOVID3 algorithms can help inform public health risk management, including the ongoing surveillance and intervention needed to manage COVID-19 related risks following vaccination. The outputs from the QCOVID algorithms can be used to support the prioritisation of vaccine boosters, invitation onto clinical trials, and personalised interventions for prevention and patient care, with both clinicians and patients being able to calculate risk through the online QCOVID calculator; they can also support allocation planning for possible future pandemics and improve global preparedness [31].

Supporting information

S1 Checklist. STROBE statement—checklist of items that should be included in reports of observational studies.

https://doi.org/10.1371/journal.pone.0285979.s001

(DOCX)

S1 Fig. Consort diagram of study participant inclusion.

https://doi.org/10.1371/journal.pone.0285979.s002

(TIF)

Acknowledgments

This study makes use of anonymised data held in the SAIL Databank. This work uses data provided by patients and collected by the NHS as part of their care and support and the Understanding Patient Data initiative. We would also like to acknowledge all data providers who make anonymised data available for research. We wish to acknowledge the collaborative partnership that enabled acquisition and access to the de-identified data, and sharing of necessary methodological documentation and scripts which led to this output. This is a collaboration between colleagues at University of Edinburgh, University College London, University of Oxford, Queen’s University Belfast, University of Leicester, Department of Health and Social Care, and Swansea University Health Data Research UK. Swansea University Health Data Research UK team is under the direction of the Welsh Government Technical Advisory Cell (TAC) and includes the following groups and organisations: the SAIL Databank, Administrative Data Research (ADR) Wales, Digital Health and Care Wales (DHCW), Public Health Wales, NHS Shared Services Partnership (NWSSP) and the Welsh Ambulance Service Trust (WAST). All research conducted has been completed under the permission and approval of the SAIL independent Information Governance Review Panel (IGRP) project number 0911.

References

  1. Clift AK, Coupland CA, Keogh RH, Diaz-Ordaz K, Williamson E, Harrison EM, et al. Living risk prediction algorithm (QCOVID) for risk of hospital admission and mortality from coronavirus 19 in adults: National derivation and Validation cohort study. BMJ. 2020;371:m3731. pmid:33082154
  2. Nafilyan V, Humberstone B, Mehta N, Diamond I, Coupland C, Lorenzi L, et al. An external validation of the QCOVID risk prediction algorithm for risk of mortality from COVID-19 in adults: a national validation cohort study in England. The Lancet Digital Health. 2021;3(7). pmid:34049834
  3. Lyons J, Nafilyan V, Akbari A, Davies G, Griffiths R, Harrison E, et al. Validating the QCOVID risk prediction algorithm for risk of mortality from COVID-19 in the adult population in Wales, UK. International Journal of Population Data Science. 2022;5(4). pmid:35310465
  4. Simpson CR, Robertson C, Kerr S, Shi T, Vasileiou E, Moore E, et al. External validation of the QCOVID risk prediction algorithm for risk of COVID-19 hospitalisation and mortality in adults: National validation cohort study in Scotland. Thorax. 2021;77(5):497–504. pmid:34782484
  5. Coronavirus Shielded Patient List open data set, England [Internet]. NHS choices. NHS; [cited 2022 Aug 8]. Available from: https://digital.nhs.uk/dashboards/shielded-patient-list-open-data-set.
  6. Bedston S, Akbari A, Jarvis CI, Lowthian E, Torabi F, North L, et al. Covid-19 vaccine uptake, effectiveness, and waning in 82,959 health care workers: A national prospective cohort study in Wales. Vaccine. 2022;40(8):1180–9. pmid:35042645
  7. Tartof SY, Slezak JM, Fischer H, Hong V, Ackerson BK, Ranasinghe ON, et al. Effectiveness of mrna BNT162B2 COVID-19 vaccine up to 6 months in a large integrated health system in the USA: A retrospective cohort study. The Lancet. 2021;398(10309):1407–16. pmid:34619098
  8. Polack FP, Thomas SJ, Kitchin N, Absalon J, Gurtman A, Lockhart S, et al. Safety and efficacy of the BNT162B2 mrna covid-19 vaccine. New England Journal of Medicine. 2020;383(27):2603–15. pmid:33301246
  9. Perry M, Gravenor MB, Cottrell S, Bedston S, Roberts R, Williams C, et al. Covid-19 vaccine uptake and effectiveness in adults aged 50 years and older in Wales UK: A 1.2M population data-linkage cohort approach. Human Vaccines & Immunotherapeutics. 2022;18(1). pmid:35239462
  10. Falsey AR, Sobieszczyk ME, Hirsch I, Sproule S, Robb ML, Corey L, et al. Phase 3 safety and efficacy of AZD1222 (ChAdOx1 nCoV-19) covid-19 vaccine. New England Journal of Medicine. 2021;385(25):2348–60. pmid:34587382
  11. Hippisley-Cox J, Coupland CAC, Mehta N, Keogh RH, Diaz-Ordaz K, Khunti K, et al. Risk prediction of covid-19 related death and hospital admission in adults after covid-19 vaccination: National Prospective Cohort Study. BMJ. 2021;374:n2244. pmid:34535466
  12. Department of Health and Social Care. Interim Clinical Commissioning Policy: Casirivimab and imdevimab for patients hospitalised due to COVID-19, 2021. Available from: https://www.sehd.scot.nhs.uk/cmo/CEM_CMO(2021)017.pdf.
  13. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: Systematic Review and Critical Appraisal. BMJ. 2020;:m1328. pmid:32265220
  14. Bollyky TJ, Hulland EN, Barber RM, Collins JK, Kiernan S, Moses M, et al. Pandemic preparedness and covid-19: An exploratory analysis of infection and fatality rates, and contextual factors associated with preparedness in 177 countries, from Jan 1, 2020, to Sept 30, 2021. The Lancet. 2022;399(10334):1489–512. pmid:35120592
  15. Kerr S, Robertson C, Nafilyan V, Lyons RA, Kee F, Cardwell CR, et al. Common protocol for validation of the QCOVID algorithm across the four UK nations. BMJ Open. 2022;12(6). pmid:35701053
  16. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Med. 2007 Oct 16;4(10):e296. pmid:17941714
  17. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55–63. pmid:25560714
  18. Lyons RA, Jones KH, John G, Brooks CJ, Verplancke J-P, Ford DV, et al. The SAIL databank: Linking multiple health and social care datasets. BMC Medical Informatics and Decision Making. 2009;9(1). pmid:19149883
  19. Ford DV, Jones KH, Verplancke J-P, Lyons RA, John G, Brown G, et al. The SAIL databank: Building a national architecture for e-health research and evaluation. BMC Health Services Research. 2009;9(1). pmid:19732426
  20. Coronavirus (COVID-19) in the UK downloaded data 2021 [Internet]. UK Government; [cited 2022 Aug 8]. Available from: https://coronavirus.data.gov.uk/details/download.
  21. HDRUK Innovation Gateway: Homepage [Internet]. HDRUK Innovation Gateway | Homepage. [cited 2022 Aug 8]. Available from: https://www.healthdatagateway.org/
  22. Shielded Patient List [Internet]. NHS choices. NHS; [cited 2022 Aug 8]. Available from: https://digital.nhs.uk/coronavirus/shielded-patient-list.
  23. Who is at high risk from coronavirus (clinically extremely vulnerable) [Internet]. NHS choices. NHS; [cited 2022 Aug 8]. Available from: https://www.nhs.uk/conditions/coronavirus-COVID-19/people-at-higher-risk/who-is-at-high-risk-from-coronavirus-clinically-extremely-vulnerable/.
  24. UK data Service: Census data [Internet]. 2011 UK Townsend Deprivation Scores | UK Data Service | Census Data. 2017 [cited 2022 Aug 8]. Available from: https://statistics.ukdataservice.ac.uk/dataset/2011-uk-townsend-deprivation-scores.
  25. Qcovid Risk assessment [Internet]. QCovid. [cited 2022 Aug 8]. Available from: https://QCOVID.org/Calculation.
  26. Royston P. Explained Variation for Survival Models. The Stata Journal: Promoting communications on statistics and Stata. 2006;6(1):83–96.
  27. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine. 1996;15(4):361–87. pmid:8668867
  28. Royston P, Sauerbrei W. A new measure of prognostic separation in survival data. Statistics in Medicine. 2004;23(5):723–48. pmid:14981672
  29. Covid-19 vaccine roll-out begins in Wales [Internet]. GOV.WALES. [cited 2022 Aug 8]. Available from: https://gov.wales/covid-19-vaccine-roll-out-begins-wales.
  30. Miller JL, Tada M, Goto M, Chen H, Dang E, Mohr NM, et al. Prediction models for severe manifestations and mortality due to covid-19: A systematic review. Academic Emergency Medicine. 2022;29(2):206–16. pmid:35064988
  31. Welcome to The QCOVID® risk calculator [Internet]. University of Oxford. [cited 2022 Aug 8]. Available from: https://QCOVID.org/.