Introduction

Dementia is a major, worldwide health challenge predominantly affecting older people. It has an estimated global prevalence of 45 million people [1]. Pain is frequently reported in older people, with approximately 20–50% living with chronic pain [2]. Managing pain can be difficult. There are challenges surrounding adherence to and adoption of interventions such as exercise and medication taking. Detecting pain can also be difficult in people with dementia. Accordingly, pain in people with dementia is often under-detected and under-treated [3].

Self-reported pain scales such as numerical rating scales (NRS) are most frequently used to assess pain. For these patients, self-reported pain alone may not be sufficient [3]. Observed behavioural indicators of pain such as verbal complaints, sighing, moaning, agitation, crying, grimacing, rapid blinking, restlessness, rubbing, disorientation, or aggression may be valuable [4, 5].

Lichtner et al. [6] previously identified eight literature reviews reporting measurements and psychometric properties of tools assessing pain in people with dementia. No single tool was identified as more reliable and valid than others, with wide variation in reliability and validity. However, the search from the most recent review was performed in 2013. Furthermore, no studies have assessed the psychometric properties of outcome measures against the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist. This is a major limitation as the COSMIN checklist [7] is a robust assessment of both the methodological quality of studies assessing measurement properties and the quality of the outcome measure itself. Through this, the COSMIN checklist offers robust, evidence-based recommendations on the selection of outcome measures in research and clinical practice [7].

The assessment of pain using a valid and accurate measurement is the basis for successful pain management [8]. However, there remains uncertainty on the appropriateness of these measures. Accordingly, the purpose of this systematic review was to determine the psychometric properties of the most frequently used pain measurement tools in research of people living with dementia.

Methods

This systematic review was conducted according to the COSMIN guidance [7] and reported in accordance with the PRISMA statement [9]. The study protocol was registered prior to commencing (PROSPERO registration: CRD42021282032).

Search strategy

Search 1: To identify the measurement tools currently used to measure pain in clinical trials of people living with dementia, we performed a search of the databases ClinicalTrials.gov and ISRCTN from inception to 01 October 2021. We used the search terms “Dementia OR cognitive impairment” AND “pain”.

Search 2: A systematic review was undertaken of published and unpublished sources to identify potentially eligible studies assessing the psychometric properties of pain measurement tools identified from Search 1. We searched the published databases: Medline, CINAHL, EMBASE, AMED, PsycINFO, and DARE from database inception to 01 November 2021. We also searched the trial registry and unpublished literature databases OpenGrey, ClinicalTrials.gov, and ISRCTN registries from inception to 01 November 2021. The search terms used for the EMBASE database are presented in Supplementary File 1. These were based on the COSMIN search filters to identify studies of psychometric properties linked to terms related to dementia, cognitive impairment, and pain. The search strategy was optimised for each electronic database search. The reference lists of all potentially eligible studies were reviewed, and the corresponding authors from each included study were contacted and asked to review the search results.

Eligibility assessment

For both Search 1 and 2, studies were included if they recruited people, aged 60 years and older, with dementia. Dementia criteria such as the Diagnostic and Statistical Manual of Mental Disorders, Revised Fourth Edition (DSM IV) [10], National Institute of Neurological and Communicative Disorders and Stroke/Alzheimer’s Disease and Related Disorders Association (NINCDS/ADRDA) [11], and the National Institute for Neurological Disorders and Stroke-Association Internationale pour la Recherche et Enseignement en Neurosciences (NINCDS-AIREN) [12] were considered appropriate. Where dementia was self-reported, the characteristics of the population were further scrutinised in relation to severity of cognitive impairment, age, and comorbidities. Where uncertain, corresponding authors were asked to verify the approach used to define dementia. All stages and severities of dementia were eligible, i.e., mild, moderate, and severe. Whilst it is acknowledged that pain assessment tools have been developed for other, non-dementia, patient groups with cognitive impairment [13], these were excluded from this review unless there was sufficient evidence that participants presented with dementia.

We did not restrict the form, cause, or pathology causing pain. Through this, participants’ pain could arise from musculoskeletal, post-surgery, medical, and cancer-related sources.

We included studies regardless of setting, i.e., acute, community, residential, or nursing home. We excluded studies not published in English, as well as narrative and systematic reviews, although we reviewed the reference lists of these publications to identify any previously omitted studies.

For Search 2, we included all full-text publications which reported any assessment of the psychometric properties of measurement tools identified from Search 1. Papers which included findings on pain management were considered if they also provided data on the psychometric properties of a pain measurement tool. We only included studies which reported one or more of the COSMIN taxonomy of: internal consistency, test–retest reliability, measurement error, content validity, structural validity, construct validity/hypotheses testing, cross-cultural validity, criterion validity, or responsiveness [7].

Study identification

The search results were screened against the eligibility criteria by two reviewers (TS, KH). This was initially by title and abstract, and then by full-text version. Screening was performed by each reviewer independently. Where the reviewers disagreed on study eligibility, agreement was reached through discussion.

Data extraction

For each included study, data were extracted independently by one reviewer (TS). This was then verified for accuracy by a second reviewer (KH). Where disagreements occurred, these were resolved through discussion.

Data were extracted onto a bespoke data extraction table. Data extracted included: measurement tool name, setting tested, country of assessment, method of administration, person administering the tool, duration between testing (if appropriate), participant characteristics (number and response rate, age, gender, diagnosis of pain, diagnosis of dementia, severity of dementia), and psychometric outcomes (reliability, validity, and responsiveness).

Risk of bias

To assess the methodological quality of the included studies, the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist [14] was used. The COSMIN checklist assesses the following measurement properties: content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness. The quality with which each measurement property was evaluated was rated on a four-point scale: very good, adequate, doubtful, or inadequate, as per the COSMIN guidance. The methodological quality score per property was then obtained by taking the lowest rating of any item in each box (the ‘worst score counts’ principle). Two reviewers (TS, KH) assessed each study using this approach independently, with disagreements resolved through consensus.
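The ‘worst score counts’ principle can be illustrated with a minimal Python sketch. The function name and the example item ratings are hypothetical and are not taken from the included studies; only the four-point scale comes from the text above.

```python
# Four-point scale named in the text, ordered from best to worst.
RATINGS = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(item_ratings):
    """Return the overall rating for one COSMIN box.

    The overall rating is the lowest (worst) rating given to any
    item in the box. Raises ValueError for an empty box or for a
    rating that is not on the four-point scale.
    """
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # list.index raises ValueError for labels outside the scale.
    return max(item_ratings, key=RATINGS.index)


# Hypothetical example: three items rated within one box.
print(worst_score_counts(["very good", "adequate", "doubtful"]))
```

Under this rule a single poorly rated item determines the rating of the whole box, which is why a study can be rated doubtful for a property even when most items were rated very good.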

Data analysis

The psychometric properties of each measurement tool were reported narratively. Through this, descriptive statistics, inferential statistics, and degrees of variance were reported from included studies. Analysis followed Chiarotto et al.’s [15] best evidence synthesis approach, where ‘strong’ was a measurement tool which demonstrated consistent findings in multiple studies of good methodological quality OR in one study of excellent methodological quality; ‘moderate’ demonstrated consistent findings in multiple studies of fair methodological quality OR in one study of good methodological quality; ‘limited’ demonstrated findings in one study of fair methodological quality; ‘conflicting’ demonstrated conflicting findings; and ‘unknown’ was only for studies of poor methodological quality or no studies reporting a measure.
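The decision rules above can be expressed as a short Python sketch. This is an illustration only, not the procedure used by the reviewers: the function name, the input format (a list of quality and finding pairs), and the handling of edge cases the text does not specify (for example, ignoring poor-quality studies before checking consistency, and counting an excellent study towards ‘multiple studies of good quality’) are assumptions.

```python
QUALITIES = ("excellent", "good", "fair", "poor")


def best_evidence_level(studies):
    """Classify the level of evidence for one measurement property.

    `studies` is a list of (quality, finding) pairs, where quality is
    one of QUALITIES and finding is "+" or "-" (hypothetical coding).
    """
    for quality, _ in studies:
        if quality not in QUALITIES:
            raise ValueError(f"unknown quality label: {quality}")

    # Assumption: poor-quality studies do not contribute to the synthesis.
    usable = [(q, f) for q, f in studies if q != "poor"]
    if not usable:
        return "unknown"

    # More than one distinct finding means the evidence conflicts.
    if len({f for _, f in usable}) > 1:
        return "conflicting"

    qualities = [q for q, _ in usable]
    n_good_or_better = sum(q in ("good", "excellent") for q in qualities)

    if "excellent" in qualities or n_good_or_better >= 2:
        return "strong"
    if n_good_or_better == 1 or len(qualities) >= 2:
        return "moderate"
    return "limited"  # a single study of fair quality


# Hypothetical example: two good-quality studies with consistent findings.
print(best_evidence_level([("good", "+"), ("good", "+")]))
```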

Results

Search 1: identification of measurement tools

In total, 188 individual clinical trials were identified from Search 1. Of these, 56 were identified which reported measuring pain with participants living with dementia. A summary of these studies is presented in Table 1.

Table 1 Summary of trial registers which reported measuring pain in people with dementia

From the list generated from Search 1, we excluded all measures which did not specifically assess pain but included pain as a sub-domain of an instrument, e.g., SF-36, WOMAC, and EQ-5D. From this, seven outcomes were excluded (Comfort Assessment in Dying with Dementia, Edmonton Symptom Assessment Scale, EQ-5D, GLOBAL PROMIS-10, SF-36, Resident Assessment Index-Minimum Dataset, and Symptom Management—End of Life for Dementia). We excluded measurement tools which were not designed for people with cognitive impairment. Accordingly, three instruments were excluded (Brief Pain Inventory, McGill Pain Map, and WOMAC). As a result, the psychometric properties of nine measurement tools formed the basis of Search 2 (Abbey Pain Scale, ALGOPLUS, DOLOPLUS-2, Facial Action Coding System, MOBID-2, self-reported pain through the NRS or VAS/thermometer or Philadelphia Geriatric Pain Intensity Scale, PACSLAC/PACSLAC-2, Pain Assessment in Advanced Dementia (PAINAD), and Checklist for non-verbal pain behavior (CNPI)) (Supplementary File 2).

Search 2: Psychometric tools analysis

A summary of the Search 2 results is presented in Fig. 1. In total, 1173 individual citations were identified. Fifty-one studies reported data on the psychometric properties of one or more of the nine measurement tools identified in Search 1. These studies were included in the analysis.

Fig. 1 PRISMA flowchart reporting search results for Search 2

Characteristics of included studies and quality assessment

A summary of the characteristics of the included studies is presented in Table 2. In total, 5924 people with dementia were assessed. Mean age of population ranged from 72.5 years [16] to 87.9 years [17]. Thirteen studies were performed in a hospital setting [16, 18,19,20,21,22,23,24,25,26,27,28,29], 33 in care home facilities [17, 30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61] and two studies were based in both care home and people’s home settings [62, 63]. Two studies were performed both in care home and hospital settings [64, 65]. The location of study was not stated in Lorenzet et al. [66]. Studies were reported in 21 countries, most frequently Norway (n = 8) [32, 41, 48, 56,57,58,59, 63], USA (n = 7) [19, 33, 34, 42, 44, 60, 61], Canada (n = 4) [31, 52, 54, 55], and Brazil (n = 4) [17, 22, 23, 66].

Table 2 Summary of included studies

A summary of the findings from the COSMIN assessment is presented in Supplementary File 3. The results for the psychometric analysis are presented in Supplementary File 4. A summary of findings for the best evidence synthesis is presented as Table 3.

Table 3 Best evidence synthesis of outcome measures used to assess pain in people with dementia against the COSMIN risk of bias checklist rating and level of evidence for the measurement property

Abbey pain scale

Eight studies reported data on the psychometric properties of the Abbey Pain Scale [35,36,37,38,39,40, 43, 46]. Overall, there was limited evidence for the use of the Abbey Pain Scale (Table 3). There was inadequate evidence on PROM development, internal consistency (Cronbach: 0.65–0.74), cross-cultural validity, and responsiveness (p < 0.001). There was adequate evidence for the assessment of construct validity (R = 0.49–0.91) and very good evidence for reliability (inter-rater: 0.75–0.88; intra-rater: 0.66–0.68). The level of evidence for structural validity was doubtful (Cronbach: 0.76).
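Internal consistency is reported throughout these results as Cronbach’s alpha. For readers less familiar with the statistic, the sketch below computes it with the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the example scores are hypothetical and are not data from any included study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    `scores` has one row per participant and one column per item.
    Sample variances (n - 1 denominator) are used throughout.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two participants and two items")
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple per item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical scores: three participants rated on two items.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 3))
```

Values closer to 1 indicate that the items of a scale vary together, which is the sense in which the ranges quoted in this section should be read.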

Pain assessment in advanced dementia (PAINAD)

Twelve studies assessed the PAINAD [16, 20,21,22,23,24, 40, 44,45,46,47, 65]. Overall, the level of evidence for the PAINAD tool was limited (Table 3). Whilst there was an adequate level of evidence for construct validity (R = 0.48–0.88) and a very good level of evidence for internal consistency (Cronbach alpha: 0.65–0.84) and reliability (intra-rater: 0.71–0.89; inter-rater: 0.79–0.94), there was inadequate evidence for cross-cultural validity and responsiveness (p < 0.001). There was a doubtful level of evidence for structural validity (variance explained: 46.5–68.9%).

Facial action coding system

Five studies provided data on the facial action coding system [18, 27, 30, 31, 64]. These demonstrated moderate evidence for the use of this measurement tool (Table 3). There was adequate evidence for construct validity (R = 0.116–0.463), structural validity (p = 0.06 to p < 0.001), and reliability (inter-rater: 0.94).

Checklist for non-verbal pain behavior (CNPI)

Six studies presented data on the psychometric properties of the CNPI [19, 41,42,43,44, 55]. Overall, there was moderate evidence for the CNPI (Table 3). There was adequate evidence for construct validity (R = 0.46–0.88) and very good evidence of reliability (intra-rater: 0.23–0.65; inter-rater: 0.45–0.59). However, there was inadequate evidence for internal consistency (Cronbach alpha: 0.64–0.90).

Self-reported pain through verbal rating pain score

Ten studies assessed the psychometric properties of self-reported/verbal rating pain measures [27,28,29, 33,34,35, 42, 45, 51, 54]. Overall, there was limited evidence supporting the use of these tools (Table 3). Whilst there was adequate evidence on PROM development, construct validity (R = 0.30–0.95), and reliability (intra-rater: 0.71–0.84; inter-rater: 0.81–0.97), there was inadequate evidence on internal consistency (Cronbach: 0.74–0.84) and responsiveness (p = 0.03).

ALGOPLUS

One study, performed in a French hospital setting, presented data on the psychometric properties of the ALGOPLUS instrument [29]. This provided strong evidence for this tool (Table 3). The data indicated very high construct validity (r² = 0.81; p < 0.001), very high inter-rater reliability (0.812), internal validity (KR-20: 0.712), and responsiveness to treatment (p < 0.001).

MOBID and MOBID-2

Four studies presented data on the psychometric properties of the MOBID [56, 58, 60, 61]. Overall, the MOBID instruments demonstrated moderate evidence (Table 3). It offered adequate evidence for PROM development and construct validity (R = 0.51–0.54) [60, 61]. Whilst the instrument demonstrated doubtful evidence for internal consistency, the values were high (Cronbach: 0.83–0.89), and it demonstrated adequate evidence for reliability (inter-rater: 0.86–0.97; intra-rater: 0.79–0.92).

Two studies reported data on the MOBID-2 instrument [57, 59]. It demonstrated moderate evidence for use (Table 3). There was adequate evidence for PROM development, construct validity (R = 0.61), and measurement error (Standard Error of Measurement (SEM): 1.4). Whilst there was inadequate evidence for responsiveness, the minimal clinically important difference (MCID) was reported as three points and the instrument was reported to be responsive to treatment (p < 0.001). There was very good evidence for the MOBID-2 for internal consistency (Cronbach: 0.82–0.84) and reliability (inter-rater: 0.94; intra-rater: 0.85–0.92).

PACSLAC and PACSLAC-II

Four studies assessed the PACSLAC-II [30, 31, 55, 62]. They suggested moderate evidence to support the use of this measurement tool (Table 3). There was very good evidence for internal consistency (Cronbach: 0.74–0.77), and reliability (inter-rater: 0.63–0.86) and adequate evidence for construct validity (R = 0.54–0.68). However, there was inadequate evidence for the assessment of responsiveness (p < 0.01).

The PACSLAC was assessed in six studies [17, 40, 52,53,54, 66]. This demonstrated moderate evidence (Table 3). There was very good evidence for PROM development. There was adequate evidence for construct validity (R = 0.54–0.72), internal consistency (Cronbach alpha: 0.77–0.87), reliability (inter-rater: 0.52–0.96; intra-rater: 0.86), and responsiveness (p < 0.001). There was doubtful evidence for structural validity and cross-cultural validity.

DOLOPLUS-2

Thirteen studies assessed the psychometric properties of the DOLOPLUS-2 [25,26,27,28, 32, 44, 46, 48,49,50,51, 62, 63]. Overall, there was moderate evidence to support the use of this measurement tool. It demonstrated very good evidence for the assessment of internal consistency (Cronbach: 0.770–0.95) and reliability (intra-rater: 0.71; inter-rater: 0.35–0.86). There was adequate evidence for construct validity (R = 0.33–0.70), measurement error (SEM: ± 1.759), and cross-cultural validity. There was doubtful evidence for structural validity (explained variance: 36.9–76.1%) and inadequate evidence on responsiveness (p < 0.001).

Discussion

The findings indicate strong and moderate evidence to support the use of the facial action coding system, PACSLAC and PACSLAC-II, CNPI, DOLOPLUS-2, ALGOPLUS, MOBID, and MOBID-2 tools. There is limited evidence for the Abbey Pain Scale, self-reported pain measures, and the PAINAD tool.

The literature highlights the challenges of assessing pain with people living with dementia [3, 4, 67]. Challenges have included insufficient time to use measurement tools [68, 69], users’ uncertainty over the reliability of these [70], difficulty physically finding and accessing the measurement tools [71], and perceived superiority of observational methods of behaviors and physical manifestations of pain [70]. Whilst a number of the supported measurement tools favour observed manifestations of pain, the time needed to complete and interpret these may act as a further barrier to adoption. Such potential challenges should be considered when exploring the implementation of recommended measurement tools.

Under-treatment of pain in people with dementia has been attributed to challenges in recognition and assessment of pain, coupled with reservations on polypharmacy and side effects of analgesia [72]. Achterberg et al. [73] highlighted the frequently seen scenario where people with dementia are prescribed analgesics, but due to concerns around side effects, particularly regarding non-steroidal anti-inflammatory drugs, opioids, and adjunct analgesics, the medications are either not administered or are given at an insufficient dosage to manage symptoms. This was clearly illustrated in Roitto et al.’s [74] survey where, although 19% of their cohort of 327 people living in nursing homes with dementia were prescribed opioids, 79% were still in pain. Whilst this study has highlighted potentially robust pain measurement tools for this population, implementing both the assessment and subsequent treatment to improve pain management is required.

Pain assessment ideally considers several pain dimensions. These include: intensity, location, affect, cognition, behavior, and social accompaniments [72]. Measurement tools, most notably the DOLOPLUS-2, are multi-dimensional. Conversely, self-reported VAS/NRS of observation are unidimensional. However, it is acknowledged that assessment of some dimensions, notably pain cognition, can be more challenging due to communication and cognitive barriers. Focusing on single dimensions should be avoided to negate the risks of under-reporting/under-representing pain experienced by individuals.

Whilst reliability and construct validity were well explored, there remains limited evidence of the responsiveness, structural validity, and measurement error for many of the identified measures. This may be one reason why pain measurement tools are poorly adopted into practice. Improving confidence around how measurement tools are used and interpreted may promote the implementation of such tools. Furthermore, as observational tools were most widely assessed, understanding the ‘normal’ or familiar behaviors of a person with dementia is important to recognise when something abnormal or noxious is being felt. No studies assessed the difference in reliability or validity when the assessment was performed by a healthcare professional versus a close relative or friend who may be more familiar with the individual. This may be an important area for future study, particularly when considering the adoption of pain assessment instruments in community and non-health or social care profession settings.

This systematic review presents with a number of strengths and limitations. A major strength is the adoption of the COSMIN evaluation. This approach ensured that the reader could be fully informed on the confidence with the recommendations made based on the evidence. Four important limitations should be considered. First, a comprehensive approach to reporting the psychometric properties of the most frequently used measurement instruments in research was adopted to aid prioritisation. However, this meant measurement tools used in clinical practice but not trials may have been omitted. Second, given the methods adopted through Search 1 to identify potential measurement tools, more recent tools such as the ePAT were not included in the analysis [39]. Consideration of this and inclusion of forthcoming evidence on psychometric properties should be made to update the findings as new evidence evolves in the field. Third, there was insufficient evidence to assess differences in recommendations based on severity of dementia. Evaluation of the impact of severity of cognitive impairment on the performance of the identified measurement tools would be warranted. Finally, there were challenges caused by poor reporting within included studies. There was insufficient detail within included studies to ascertain whether pain assessment instruments assessed acute or chronic pain, or whether individuals were taking analgesia or not. This may impact on the generalisability of the findings into practice and should be considered when reporting future studies in this area.

To conclude, there is strong and moderate evidence to support the use of the facial action coding system, PACSLAC and PACSLAC-II, CNPI, DOLOPLUS-2, ALGOPLUS, MOBID, and MOBID-2 tools for the assessment of pain with people living with dementia. Whilst these reflect measurement tools used in research, further consideration on how these reflect clinical practice, and lessons on how to implement these tools into practice should be considered to improve the detection and management of pain for people with dementia.