The most well-known and commonly used risk stratification tool for postoperative nausea and vomiting (PONV) in adults was developed by Apfel and colleagues in 1999.1 The Apfel risk score provides a simplified means of estimating patient risk of PONV using four risk factors: female sex, non-smoking status, history of PONV or motion sickness, and the use of postoperative opioids. Equal weighting is applied to each risk factor with each conferring an approximate 20% increased risk of PONV. The Apfel score has good discriminative power relative to other scores,2 and has been used to stratify risk and recommend prophylaxis in each version of the Consensus Guidelines for Managing Postoperative Nausea and Vomiting.3
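As a minimal illustration of how the simplified score is calculated, the sketch below counts the four risk factors with equal weight. The `Patient` class, the function names, and the example patient are hypothetical. The approximate risks per score (10%, 21%, 39%, 61%, and 79%) are the figures commonly quoted from the original 1999 publication; they are not stated in the text above and should be checked against that paper before use.

```python
from dataclasses import dataclass

# Approximate PONV risk (%) by number of risk factors present (0-4).
# ASSUMPTION: figures commonly quoted from the 1999 Apfel paper,
# not taken from the text above.
APPROX_RISK_PERCENT = {0: 10, 1: 21, 2: 39, 3: 61, 4: 79}


@dataclass
class Patient:
    """Hypothetical container for the four Apfel risk factors."""
    female: bool
    non_smoker: bool
    history_ponv_or_motion_sickness: bool
    postoperative_opioids: bool  # anticipated use, judged preoperatively


def apfel_score(p: Patient) -> int:
    """Count the risk factors present; each carries equal weight."""
    return sum([
        p.female,
        p.non_smoker,
        p.history_ponv_or_motion_sickness,
        p.postoperative_opioids,
    ])


def approximate_risk_percent(score: int) -> int:
    """Look up the approximate PONV risk for a score of 0 to 4."""
    return APPROX_RISK_PERCENT[score]


# Example: a female non-smoker with no relevant history who is
# expected to need postoperative opioids has three risk factors.
patient = Patient(True, True, False, True)
score = apfel_score(patient)
print(score, approximate_risk_percent(score))
```

The sketch makes visible that every input is a yes/no value, so any ambiguity in how a factor is defined translates directly into a different score.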

The Apfel risk score was developed by cross-validating results from separate studies at two centres: Oulu, Finland4 and Wuerzburg, Germany.5 In the Finnish study, smoking was defined as “current daily smoker”, PONV was subdivided into “following general anesthesia” and “following regional anesthesia”, and motion sickness was subdivided into “in childhood only” and “also in adulthood”. In the German study, these risk factors were not defined or subdivided. Actual postoperative opioid use (as a dichotomous variable) was used in the development and cross-validation of the score; however, the intent was always to use the score for preoperative prediction.1

Despite the simplicity and widespread use of the Apfel score, uncertainty remains for researchers and clinicians about how to define its component risk factors. The requirement for postoperative opioids, for example, is treated as a binary risk factor despite evidence of a dose-dependent effect of opioid administration on PONV development (e.g., a 2005 meta-analysis showed a reduction in the incidences of postoperative nausea [0.9%] and vomiting [0.3%] for each milligram reduction in postoperative morphine).6,7 A further difficulty is the inherent complexity of using a future postoperative event to calculate a preoperative risk score. Similarly, what constitutes a non-smoker, and how to combine a history of PONV and motion sickness into a composite variable, are not well defined.
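To make the contrast between a binary and a dose-dependent risk factor concrete, the sketch below scales the per-milligram estimates quoted above. It assumes that the effect is linear across the dose range and that the percentages are absolute reductions in incidence; neither assumption is stated in the text, and the function name is hypothetical.

```python
from typing import Tuple

# Per-milligram estimates quoted above from the 2005 meta-analysis.
NAUSEA_PER_MG = 0.9    # reduction in incidence of postoperative nausea (%)
VOMITING_PER_MG = 0.3  # reduction in incidence of postoperative vomiting (%)


def expected_reduction(morphine_mg_saved: float) -> Tuple[float, float]:
    """Scale the per-mg estimates by the morphine dose avoided.

    ASSUMPTION: linear extrapolation, for illustration only.
    """
    nausea = round(NAUSEA_PER_MG * morphine_mg_saved, 1)
    vomiting = round(VOMITING_PER_MG * morphine_mg_saved, 1)
    return nausea, vomiting


# Example: avoiding 10 mg of postoperative morphine.
print(expected_reduction(10))
```

A binary "postoperative opioids: yes/no" variable cannot express this gradient, which is one source of the ambiguity discussed here.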

Our aim was to conduct a literature review of the use of the Apfel score to determine how the four risk factors were defined and measured in individual studies. We hypothesized that a wide heterogeneity of definitions and applications would be used.

Methods

As this was a secondary analysis of published research, no institutional ethics approval was required. The protocol was developed and approved by the senior authors (J.D. and K.L.) before data extraction and analysis began. There were no subsequent amendments to the protocol.

We searched the Scopus database using the author search term “Apfel C.C., San Francisco and Wurzburg” to identify citations of the index Apfel score paper1 in papers published between 1 September 1999 (the month of the original study publication) and 1 September 2019 (a 20-year period). A health services librarian was involved to ensure that this search strategy identified all relevant citations.

Full-text reports of original research in humans, published in English and available online through our institutional library access at no additional fee, were assessed. Secondary analyses of published research, review articles, opinion pieces, and papers available for full-text download only for a fee were excluded.

A data extraction template was developed and piloted by the senior authors. The final template was replicated in a REDCap database that was developed by the senior authors and supported by the University of Melbourne, Melbourne, Australia. After a study initiation meeting, each of the junior investigators (M.H., B.M., K.S., A.S.) was calibrated on ten data extractions that were assessed for completeness and accuracy by the senior authors.

Each included paper was assessed to determine whether sex, smoking status, history of PONV or motion sickness, and postoperative opioid use had been measured. Studies that measured all four risk factors were assessed further.

The following data were then extracted:

  1. When the data were collected (preoperative, postoperative, unclear)

  2. How the data were collected (by asking the patient, by reviewing the medical record, by other means, not stated)

  3. How the risk factor was defined

    a. Sex (e.g., female, male, other sexes/genders)

    b. Smoking status (e.g., if a non-smoker had never smoked at all, never smoked regularly, or had ceased for a certain period before surgery)

    c. PONV and motion sickness (e.g., whether the number of events or duration of observation was specified)

    d. Postoperative opioid (e.g., anticipated administration or actual administration, and whether a drug, dose, route of administration, or duration of observation was specified), plus:

      i. Who made the decision about anticipated administration (anesthesia provider, hospital protocol, investigator, another person, not stated)

      ii. How actual administration data were collected (by asking the patient, by reviewing the medical record, by other means, not stated).

After data extraction was complete, the data were exported to an Excel spreadsheet for cleaning and analysis. Data were summarized using counts and percentages.
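The extraction and summary steps above can be sketched as follows. This is an illustration only: the record fields, the `summarize` helper, and the four example records are hypothetical and are neither the REDCap template nor the data from this review.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class OpioidExtraction:
    """Hypothetical extraction record for the postoperative opioid factor."""
    paper_id: str
    timing: str            # "preoperative", "postoperative" or "unclear"
    classification: str    # "anticipated", "actual" or "unclear"
    definition_given: bool


def summarize(values):
    """Return {category: (count, percentage of all values)}."""
    counts = Counter(values)
    total = len(values)
    return {k: (n, round(100 * n / total)) for k, n in counts.items()}


# Made-up records for illustration only; not data from this review.
records = [
    OpioidExtraction("paper-01", "preoperative", "anticipated", False),
    OpioidExtraction("paper-02", "postoperative", "actual", True),
    OpioidExtraction("paper-03", "unclear", "unclear", False),
    OpioidExtraction("paper-04", "preoperative", "anticipated", False),
]

summary = summarize([r.classification for r in records])
print(summary)
```

Keeping "unclear" and "not stated" as explicit categories, rather than as missing values, means that the percentages always sum over every included paper.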

Results

One thousand and forty-nine citations of the index paper were identified in papers published between 1 September 1999 and 1 September 2019. After exclusions, 535 papers published in 197 different journals proceeded to data extraction, and 255 (48%) papers measuring all four risk factors were included in the final analysis. Calculated risk scores were reported in 116 (46%) of the 255 papers (Figure).

Figure Flow diagram

Definitions were provided for smoking in four (2%), PONV in zero (0%), motion sickness in one (0.4%), and postoperative opioid use in seven (3%) papers (Table 1). Postoperative opioid use was classified as “anticipated” in 138 (54%) studies, “actual” in 72 (28%) studies, and unclear in 45 (18%) studies.

Table 1 Definitions of risk factors in the Apfel simplified risk score*

The risk score was reported as an eligibility criterion in 53 (21%) papers and as a guide for protocolized antiemetic prophylaxis in 11 (4%) papers. In the remaining 191 (75%) papers, the risk score or its component risk factors were reported as baseline variables to describe the patient cohort.

Ninety (35%) papers were female-only studies, one (0.4%) was a male-only study, and 164 (64%) included females and males. No paper reported participants of other sexes or genders. Six papers (2%) reported on a history of PONV but did not mention motion sickness.

The timing and method of data collection are presented in Table 2.

Table 2 Time and method of data collection

Discussion

In this literature review of the Apfel risk score, we found that approximately one-half (255/535; 48%) of citing studies measured all four component risk factors, with approximately one-quarter (116/535; 22%) reporting the calculated risk score. Risk factors were defined in 5% (12/255) of papers, with significant heterogeneity identified in the definition of postoperative opioid use, which was defined as originally intended (anticipated use) in only 54% (138/255) of studies. Few studies reported how the data for any of the four risk factors were collected.

A similar assessment of the Apfel risk score has not been undertaken before, despite long-standing concerns about defining its component risk factors.8 In 2011, Eberhart and Morin raised the issues of separating smokers from non-smokers, quantifying postoperative opioids, and classifying history of motion sickness.8 Our results may partly explain the imperfect prediction of PONV by the Apfel risk score: areas under the receiver operating characteristic curve across multiple studies show only moderate accuracy, and evidence grading of B1 was applied to all risk factors apart from postoperative opioids (A1) in the most recent Consensus Guidelines of Gan et al.2,3,9,10 Apfel and colleagues, in their 2002 paper “How to study PONV”, identified principles of research that should be followed to improve the quality of PONV literature. As they identified, “heterogeneity between studies compromises meta-analysis” and “if a simplified risk score is used, the number of patients with the different risk levels should be given as well”.11 Nevertheless, fewer than half of the studies included in this review reported the calculated risk score. More recently, some have advocated for a more liberal (or “zero tolerance”) approach to PONV prophylaxis, regardless of patient risk profile.3,12,13 As the cost of routine antiemetic prophylaxis has fallen over recent years, this strategy has gained increased interest; however, as stated in the Consensus Guidelines of Gan et al.: “The utility of this approach requires further validation, with particular focus on the incidence of antiemetic side effects”.3 Further work is required to assess the balance of cost, side effects, and efficacy of liberal antiemetic prophylaxis compared with more robust implementation of existing proven PONV risk stratification tools.

A major finding of our study was the wide heterogeneity in the definitions of the postoperative opioid risk factor. There are significant challenges in anticipating opioid requirements for patients prior to surgery, with a bias towards overprescribing,14 and significant variability in individual anesthesiologist opioid prescribing patterns, which are not explained by patient or case-specific factors.15 Coupled with clear evidence of a dose-response relationship between opioid administration and PONV,6 this identifies major challenges in how to use this risk factor to calculate PONV risk.

There is little guidance, either in the Consensus Guidelines of Gan et al.3 or in the original work of Apfel et al.,1 on how best to implement this risk factor. For example, does any administration of any postoperative opioid apply, or is a specific opioid dose required to meet this risk factor? It is important to note that the association of postoperative opioids with PONV risk was significant in the Oulu but not in the Wuerzburg validation cohort.1 This was attributed to differences in postoperative opioid administration between cohorts (80% vs 10% of patients, respectively), with differences also in the dose administered: oxycodone 20 mg (Oulu) vs tramadol 100 mg (Wuerzburg). Our study suggests that, in the absence of such guidance, individual researchers and anesthesia providers define “postoperative opioids” with significant variability. Future research should seek to provide more detail on this risk factor (e.g., by investigating whether a threshold opioid-equivalent dose confers an unacceptably high additional risk of PONV). Alternatively, a more accurate (but necessarily more complex) risk score could incorporate increased granularity, moving from a binary variable to more nuanced risk scoring that depends on opioid-equivalent dose. Nevertheless, uptake of such a score may be challenging because of its increased complexity.

Motion sickness was reported in 98% of identified studies. It is interesting to examine the development of this risk factor. In the original Oulu derivation cohort,4 24% of patients had experienced childhood motion sickness, but in only 10% of patients did this continue into adulthood. Nonetheless, any history of motion sickness was considered to meet the threshold for this risk factor, potentially misclassifying the more than half of these patients who no longer suffered from motion sickness as adults. Our study revealed that this definition persisted in the literature. No subsequent studies categorized whether childhood (but resolved) motion sickness was considered differently from motion sickness in adulthood. Furthermore, since the 1999 Apfel study, it has been conventional to group the two variables of history of PONV and motion sickness together.1 We identified only two subsequent studies that considered motion sickness separately2,9; except for six studies solely reporting a history of PONV, the two appear to be treated as a composite variable. This bears closer examination. The original Oulu cohort analyzed these as separate variables, finding that coefficients for motion sickness were significantly lower than for a history of PONV in the two logistic regression models developed, thus conferring less risk.4 In the Wuerzburg cohort and in the combined validation model, these were then grouped together, with the developed simplified risk score then applying equal weighting to the four risk factors. Among the two subsequent studies that have considered motion sickness separately, Eberhart and colleagues also showed a lower odds ratio for motion sickness vs history of PONV, while Apfel and colleagues found similar risk profiles.2 In summary, further research is required to understand to what degree patients with motion sickness alone (and perhaps only in childhood) are positively classified as meeting this Apfel risk factor, potentially overestimating their risk profile.
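The proportion of patients potentially misclassified can be checked with simple arithmetic on the Oulu figures quoted above. The sketch assumes that the 10% of patients with motion sickness in adulthood are a subset of the 24% with childhood motion sickness; the function name is hypothetical.

```python
def resolved_share(childhood_pct: float, adulthood_pct: float) -> float:
    """Fraction of patients with childhood motion sickness in whom it resolved.

    ASSUMPTION: patients with adult motion sickness are a subset of those
    with childhood motion sickness.
    """
    return (childhood_pct - adulthood_pct) / childhood_pct


# Oulu derivation cohort figures quoted above: 24% childhood, 10% adulthood.
share = resolved_share(24, 10)
print(f"{share:.0%}")  # 14 of every 24 such patients, i.e. 58%
```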

A strength of our paper was our comprehensive literature review, identifying all studies citing the validation study that first defined the Apfel criteria. It is possible, however, that some studies were missed that did not specifically cite this paper, although we consider this risk low and unlikely to have materially biased our results. A further limitation is that we only considered English language publications and did not examine pay-for-view papers (although this exclusion criterion only excluded seven papers). Finally, it is possible that researchers defined each item in their research protocols but did not report these definitions in their papers.

This study shows significant heterogeneity in measuring and applying the Apfel score in the research setting. The absence of definitions of component variables in most included studies, coupled with one-quarter of studies using actual (rather than anticipated) postoperative opioid use, shows the need for greater guidance around implementation. Although it is not possible from these data to infer whether inaccurate use of the Apfel score is common in clinical practice, it is likely that even greater variation in its application exists outside the controlled conditions of research studies.

Although the Apfel risk score has revolutionized and democratized PONV risk stratification for over two decades, significantly improving perioperative care of this most common side effect of anesthesia, there are challenges in how its component risk factors are defined and applied to individual patient risk stratification. Greater rigor should be applied in defining these factors, without detracting from the score's major strengths: simplicity and ease of implementation. More guidance from consensus groups could better inform anesthesia providers in identifying high-risk patients, thus helping the use of this landmark tool to optimize and standardize PONV prophylaxis.