Elsevier

Intelligence

Volume 32, Issue 4, July–August 2004, Pages 363-389

Measurement invariance of core cognitive abilities in heterogeneous neurological and community samples

https://doi.org/10.1016/j.intell.2004.05.002

Abstract

Confirmatory factor analysis of Australian adaptations of combined Wechsler Adult Intelligence Scale-Revised (WAIS-R) and Wechsler Memory Scale-Revised (WMS-R) scores was conducted in a sample of 277 participants undergoing investigation for neurological disorders. The best-fitting model was a six-factor model representing the latent abilities of Verbal Comprehension, Perceptual Organization, Working Memory, Verbal Memory, Visual Memory, and Processing Speed. Invariance of the measurement model was then examined in the mean and covariance structure with data from a recent Australian normative study of the WAIS-R and WMS-R [Carstairs, J. R., & Shores, E. A. (2000). Australian Psychologist, 35, 36-40]. Results suggest that the measurement model underlying test scores displayed “strong” metric invariance [Widaman, K. F., & Reise, S. P. (1997). Exploring the measurement invariance of psychological instruments: Applications in the substance abuse domain. In K. J. Bryant & M. Windle (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281–324). Washington, DC: American Psychological Association] across clinical and community samples. These findings satisfy assumptions necessary for uncomplicated interpretation of validity correlations and differences in test scores across groups.

Introduction

Debate regarding the construct validity of measures of material-specific amnesia revolves around the results of factor analytic studies of clinical memory batteries such as versions of the Wechsler Memory Scale (WMS; Heilbronner, 1992, Lee et al., 1989, Psychological Corporation, 2002, Smith et al., 1992). Material-specific memory disorders are forms of disability involving reduced long-term retrieval or anterograde memory function prominently affecting acquisition and retention of either auditory-verbal or visuospatial memory (Bell & Davies, 1998, Heilbronner, 1992, Lee et al., 1989). Factor analytic studies of the WMS-R and WMS-III have produced varied results, with most studies reporting one of two alternative three-factor structures, either distinguishing Working Memory from Verbal Memory and Visual Memory or distinguishing Working Memory from Immediate Memory and Delayed Memory (see Bowden et al., 1997, Jurden et al., 1996, Psychological Corporation, 2002, Woodard, 1993). For consistency with the third editions of the Wechsler scales, we will use the term Working Memory as synonymous with the earlier terms of Attention-Concentration and Freedom from Distractibility (Psychological Corporation, 2002). Studies of the factor structure of the Wechsler Adult Intelligence Scale-Revised (WAIS-R) have consistently identified traits labeled Verbal Comprehension, Perceptual Organization, and Working Memory (see Flanagan et al., 1997, Kaufman & Lichtenberger, 2002, Psychological Corporation, 2002).

Recently, Bowden, Carstairs, and Shores (1999) reported a five-factor solution from combined WAIS-R and WMS-R subtest raw scores in a large sample of healthy community volunteers. This factor model comprised abilities corresponding to Verbal Comprehension, Perceptual Organization, Attention-Concentration, Verbal Memory, and Visual Memory. This five-factor solution was replicated by Bowden et al. (2001) and provides clear theoretical continuity with the factor analysis of the joint WAIS-III and WMS-III, with the addition of a Processing Speed factor augmented by an additional subtest (Tulsky & Price, 2003).

Now that several studies of the joint factor structure of the Wechsler Intelligence and Memory Scales for adults have been reported, the identification of distinct Verbal Memory and Visual Memory factors, in addition to other core cognitive abilities, is reliably established. Identification of memory disability is of importance in many clinical settings where diagnostic and treatment decisions may hinge on the distinction between auditory-verbal and visuospatial memory abilities. One such setting involves the evaluation of patients with neurological conditions including, for example, presurgical and postsurgical evaluation of people undergoing temporal lobectomy for treatment of refractory epilepsy (Baxendale, 1995, Bell & Davies, 1998, Jones-Gotman et al., 1993). However, in light of the inconsistent findings regarding the factor structure of clinical memory batteries in people with neurological conditions, particularly those diagnosed with seizure disorders, it has been suggested that the construct validity of published clinical memory batteries is inadequate (see Bell & Davies, 1998, Lee et al., 1989, Psychological Corporation, 2002). Therefore, we sought to examine the generalizability of the factor model of joint Wechsler scales in a heterogeneous sample of people with neurological conditions.

In terms of nomothetic span (Embretson, 1983), versions of the Wechsler Intelligence and Memory Scales have provided essentially the same coverage across successive generations, including the third editions (Flanagan et al., 1997, Kaufman & Lichtenberger, 2002, Psychological Corporation, 2002). Therefore, it should be possible to provide a comprehensive test of competing models of ability using the revised versions (see Bowden et al., 1999, Psychological Corporation, 2002), particularly because we have access to a representative community sample of adults. Further, we wanted to evaluate the invariance of the measurement model across clinical and community samples. Although the concept of invariance of measurement is fundamental to fairness in testing, methods for evaluating measurement invariance are not well known and have only become relatively accessible with recent development of covariance structure modeling (see Byrne et al., 1989, Meredith, 1993, Reise et al., 1993, Vandenberg & Lance, 2000).

Widaman and Reise (1997) elaborated a method for evaluating the invariance of a measurement model that combines comparison of the pattern and numerical value of factor loadings, error variances, and latent factor structure with evaluation of the equivalence of manifest (observed) variable intercepts and latent factor means. This method allows examination of the comparability of a full measurement model across groups. As Meredith (1993) has shown, measurement invariance is of particular importance to establishing fairness in classification decisions across diverse groups, because bias may arise from several sources even assuming the same number of factors and the same configuration of factor loadings. Apparent group differences may arise from systematic differences in observed score means, although underlying latent means may not differ across groups. In addition, differences in the observed score variances or reliabilities may cause apparent group differences. Because each of these components of a measurement model may be examined for invariance, it is of practical importance to establish invariance of the measurement model implicit in the kind of ability batteries that are routinely used for clinical classification.

Drawing on the work of Meredith (1993), Widaman and Reise (1997) describe four levels of measurement invariance beginning with the concept of configural invariance where only the number of factors and pattern of factor loadings are identical across groups. If, in addition, the numerical values of the factor loadings are identical across groups, then weak metric invariance is established. Strong metric invariance is demonstrated by the additional equivalence of manifest variable intercepts and, finally, strict metric invariance is evident if residual (observed score) variances are also equivalent across groups.

In the full measurement model, equality or invariance of latent variable means, variances, and covariances may also be examined in the α and ψ matrices, respectively. However, invariance of these latter components of the measurement model is not necessary for the establishment of metric invariance but rather is the substantive focus of theoretical interest, once metric invariance has been established (Widaman & Reise, 1997). Differences in the variances or covariances between latent factors and differences in factor means may be observed due to real differences in level or relationship between the underlying abilities across samples. In a recent review of measurement invariance research, Vandenberg and Lance (2000) recommend an essentially identical algorithm for testing measurement invariance and structural invariance. Their definition of measurement invariance is equivalent to Widaman and Reise's definition of the three levels of metric invariance, outlined above, and their definition of structural invariance similarly involves examination of equality of factor means, variances, and covariances.
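The levels of invariance described above can be written compactly in mean and covariance structure notation. What follows is a sketch under stated assumptions rather than the authors' own specification: only the symbols α and ψ appear in the text, while τ (intercepts), Λ (factor loadings), Θ (residual variances), and the group subscript g are conventional labels introduced here for illustration.

```latex
% Assumed notation: g indexes groups (e.g., clinical, community).
\begin{align}
  y_g &= \tau_g + \Lambda_g \eta_g + \varepsilon_g,
  \qquad E(\eta_g) = \alpha_g,\;
  \operatorname{Cov}(\eta_g) = \Psi_g,\;
  \operatorname{Cov}(\varepsilon_g) = \Theta_g \\
  \mu_g &= \tau_g + \Lambda_g \alpha_g \\
  \Sigma_g &= \Lambda_g \Psi_g \Lambda_g^{\top} + \Theta_g
\end{align}

% Nested constraints across two groups (g = 1, 2):
\begin{align*}
  \text{configural:} &\quad \text{same pattern of fixed and free elements in } \Lambda_1, \Lambda_2 \\
  \text{weak:}       &\quad \Lambda_1 = \Lambda_2 \\
  \text{strong:}     &\quad \Lambda_1 = \Lambda_2,\; \tau_1 = \tau_2 \\
  \text{strict:}     &\quad \Lambda_1 = \Lambda_2,\; \tau_1 = \tau_2,\; \Theta_1 = \Theta_2
\end{align*}
```

In this formulation, the latent means and the latent variances and covariances (α and ψ) are left free at every level of metric invariance, which is why differences in them can be treated as substantive findings once the constraints on Λ and τ hold.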

To summarize, configural invariance is the minimum condition necessary to enable comparison of factor scores across groups. “Configural invariance is consistent with the presumption that similar, but not identical, latent variables are present” (Widaman & Reise, 1997, p. 292). However, uncomplicated interpretation of factor means and factor correlations across groups is dependent on the assumption of strong metric invariance (Vandenberg & Lance, 2000). “If strong factorial invariance holds, group differences in both means and variances on the latent variables, which represent the constructs in psychological theories, are reflected in group differences in means and variances on the measured variables” (Widaman & Reise, 1997, p. 295). In general, interpretation of clinical effects, for example, effects of treatment or diagnosis, is based on interpretation of patterns of mean scores. In addition, interpretation of convergent and divergent validity relationships is usually based on interpretation of observed correlations. Therefore, straightforward interpretation of clinical validity and theoretical relationships between constructs requires the demonstration of strong metric invariance (Vandenberg & Lance, 2000, Widaman & Reise, 1997). “For test scores to be comparable across ostensibly distinct examinee populations, the observed test items, or indicators, must have identical, or invariant, quantitative relationships with the latent variable for each population of interest” (Widaman & Reise, 1997, p. 282).
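The role of intercept equality in this argument can be illustrated numerically. The sketch below is a toy example, not an analysis from the study: the loadings, intercepts, and latent means are hypothetical values chosen for illustration, and the helper `implied_means` is a name introduced here. It computes model-implied observed means as intercept plus loading times latent mean, first with strong invariance and differing latent means, then with equal latent means and one non-invariant intercept.

```python
# Toy illustration with hypothetical values; nothing here comes from the study data.

def implied_means(tau, lam, alpha):
    """Model-implied observed means: intercepts plus loadings times latent means."""
    return [t + sum(l * a for l, a in zip(row, alpha)) for t, row in zip(tau, lam)]

# Three indicators of a single latent factor (hypothetical loadings and intercepts).
lam = [[1.0], [0.8], [0.6]]
tau_community = [10.0, 8.0, 6.0]

# Case 1: strong invariance holds (same loadings and intercepts in both groups)
# and the clinical group has a lower latent mean.
mu_community = implied_means(tau_community, lam, alpha=[0.0])
mu_clinical = implied_means(tau_community, lam, alpha=[-1.0])
diff_strong = [c - m for c, m in zip(mu_clinical, mu_community)]

# Case 2: latent means are identical, but the intercept of the third
# indicator differs between groups (a violation of strong invariance).
tau_clinical_biased = [10.0, 8.0, 4.5]
mu_clinical_biased = implied_means(tau_clinical_biased, lam, alpha=[0.0])
diff_biased = [c - m for c, m in zip(mu_clinical_biased, mu_community)]

print("observed mean differences, strong invariance:", diff_strong)
print("observed mean differences, unequal intercept:", diff_biased)
```

In the first case every observed difference is the latent difference scaled by the indicator's loading, so the pattern of observed means can be read as a difference on the construct. In the second case an observed difference appears on one indicator although the groups do not differ on the latent variable.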

Millis, Malina, Bowers, and Ricker (1999) have commented on the lack of data regarding measurement invariance in clinical assessment of cognitive abilities, a view confirmed by Vandenberg and Lance's (2000) recent review. Yet, many approaches to assessment are based on implicit assumptions about measurement invariance, for example, the old hypothesis that core cognitive abilities are expressed differently in diagnostic groups as opposed to normative groups (see Lezak, 1988, Matarazzo, 1990). The motivation underlying the development of most neuropsychological batteries assumes that brain disease affects the expression or structure of cognitive abilities; hence, it is assumed that special tests are required to assess the cognitive effects of neuropsychological conditions, a view highlighted, for example, by interest in assessment of “executive” function (Lezak, 1995, Stuss & Alexander, 2000). The “executive” hypothesis involves untested assumptions about measurement invariance or structural invariance. The controversy around the validity of material-specific memory tests, noted above, implicitly involves questions about measurement invariance, because it has been suggested that clinical memory tests do not measure the same abilities in normative as opposed to clinical groups (for reviews, see Bell & Davies, 1998, Jones-Gotman et al., 1993, Lee et al., 1989).

Conversely, implicit but untested assumptions regarding the presence of measurement invariance are similarly common. For example, the most recent editions of the Wechsler scales include a variety of recommendations for diagnostic interpretation of ability difference scores based on data derived from the standardization sample (Psychological Corporation, 2002). Increasing utilization of criteria such as the profile variability index (McLean, Reynolds, & Kaufman, 1990) or base rates of subtest differences (Psychological Corporation, 2002) presumes a degree of measurement invariance that remains essentially untested. However, as shown above, the relationship between observed scores and latent factor scores across groups, including the covariances between factors, is complicated by the need to establish metric invariance. Therefore, in the absence of the demonstration of metric invariance, diagnostic interpretation of ability scores may be misleading.

We report an evaluation of measurement invariance in a representative sample of patients undergoing evaluation for neurological conditions in a tertiary neurosciences facility. Almost two thirds of these participants were undergoing evaluation for epilepsy surgery. Results from this clinical sample were compared with a representative community sample of healthy young adults (Carstairs & Shores, 2000). We report scores for the revised versions of the Wechsler scales because we have raw score data from a recent Australian normative study of these tests. To test the full set of assumptions underlying measurement invariance, it is necessary to base the analysis on raw scores. This is because any rescaling of scores in terms of within-group means and variances may obscure between-group differences in the scaling of scores, which are integral to examination of the regressions relating observed scores to latent variables (Meredith, 1993, Vandenberg & Lance, 2000, Widaman & Reise, 1997). However, representative data from community samples are extremely expensive to obtain and are usually only collected by the test publishers, because few grant agencies are willing to fund the collection of large normative data sets. In the absence of data derived from a representative community sample, examination of metric invariance would be difficult to interpret in terms of general implications for construct validity. We are not aware of any other reports of measurement invariance of the combined Wechsler scales in any edition, with the exception of our recent study including a sample of patients with alcohol dependence (Bowden et al., 2001). However, in this latter study, we did not examine invariance of the full measurement model, only the covariance structure (cf. Jurden et al., 1996).

With regard to the current sample of patients with heterogeneous neurological conditions, including patients with seizure disorders, representative studies commonly report ability scores ∼1 S.D. below the relevant age group means (e.g., Bell & Davies, 1998, Seidenberg et al., 1998). Therefore, on the assumption that we will observe some degree of metric invariance across the clinical and community samples, we would expect differences in structural parameters, that is, between the samples in terms of latent factor means and perhaps factor variances or covariances also.

Section snippets

Sample 1

Participants (n=277, 127 female and 150 male) were recruited from the Department of Clinical Neurosciences and the Victorian Epilepsy Centre at St. Vincent's Hospital, Melbourne, Victoria, Australia. The sample comprised consecutive referrals for neuropsychological assessment for whom full test data were available. Diagnostic status in this sample comprised 177 (64%) patients with seizure disorders, 21 (8%) patients with neurotrauma, 18 (7%) patients with vascular disease, 8 (3%) patients with

Confirmatory factor analysis

Goodness-of-fit statistics for the alternative models examined in the neurological sample are reported in Table 4. Both Models 5a and 5b provided significant improvement in fit, in terms of the change in χ², when compared with Model 4. Compared with each other, Models 5a and 5b produced very similar χ² values and practical goodness-of-fit statistics (Table 4). Models 6a and 6b provided similar goodness-of-fit statistics to each other and significant improvements in fit, in terms of the change

Discussion

In the present study, we initially examined the factor structure of the combined WAIS-R and WMS-R subtests. A six-factor model provided the best overall fit to the data and a clearer articulation of latent factors in the sample of participants with heterogeneous neurological disorders than was the case for the other models examined. Model 6ph, in addition to reflecting core cognitive abilities, permitted the identification of auditory-verbal and visuospatial long-term memory factors and

Acknowledgements

This research was supported by grants from the Australian Research Council and the Epilepsy Foundation of Victoria to Stephen C. Bowden and Mark J. Cook and by grant 95/128 from the New South Wales Motor Accident Authority to E. Arthur Shores and Jane R. Carstairs. We gratefully acknowledge the thoughtful contributions of three reviewers. We also acknowledge the assistance of Drs. Lina Forlano, Karen Sullivan, and Catherine O'Brien and the graduate psychology students in the collection of data

References (48)

  • S.A. Baxendale

    The hippocampus: Functional and structural correlations

    Seizure

    (1995)
  • P.J. Legree et al.

    Correlations among cognitive abilities are lower for higher ability groups

    Intelligence

    (1996)
  • B.D. Bell et al.

    Anterior temporal lobectomy, hippocampal sclerosis, and memory: Recent neuropsychological findings

    Neuropsychology Review

    (1998)
  • S.C. Bowden et al.

    The relative usefulness of the WMS and WMS-R

    Journal of Clinical and Experimental Neuropsychology

    (1992)
  • S.C. Bowden et al.

    Confirmatory factor analysis of combined Wechsler Adult Intelligence Scale-Revised and Wechsler Memory Scale-Revised scores in a healthy community sample

    Psychological Assessment

    (1999)
  • S.C. Bowden et al.

    Confirmatory factor analysis of the Wechsler Memory Scale-Revised in a sample of alcohol dependent clients

    Journal of Clinical and Experimental Neuropsychology

    (1997)
  • S.C. Bowden et al.

    Factorial invariance for combined WAIS-R and WMS-R scores in a sample of patients with alcohol dependency

    The Clinical Neuropsychologist

    (2001)
  • M.W. Browne et al.

    Alternative ways of assessing model fit

  • N. Butters et al.

    Differentiation of amnesic and demented patients with the Wechsler Memory Scale-Revised

    The Clinical Neuropsychologist

    (1988)
  • B.M. Byrne

    Structural equation modelling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming

    (1998)
  • B.M. Byrne et al.

    Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance

    Psychological Bulletin

    (1989)
  • J.R. Carstairs et al.

    The Macquarie University Neuropsychological Normative Study (MUNNS): Rationale and methodology

    Australian Psychologist

    (2000)
  • J.P. Chapman et al.

    Reliability and the discrimination of normal and pathological groups

    Journal of Nervous and Mental Disease

    (1983)
  • J. Cohen

    Statistical power analysis for the behavioural sciences

    (1988)
  • M. de Lemos

    WAIS-R Australian supplement

    (1981)
  • Dudgeon, P. L. (2001). RMSEA/ECVI calculation software. Available at:...
  • S. Embretson

    Construct validity: Construct representation versus nomothetic span

    Psychological Bulletin

    (1983)
  • R.L. Heilbronner

    The search for a “pure” visual memory test: Pursuit of perfection

    The Clinical Neuropsychologist

    (1992)
  • L. Hu et al.

    Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification

    Psychological Methods

    (1998)
  • M. Jones-Gotman et al.

    Neuropsychological testing for localizing and lateralizing the epileptogenic region

  • F.H. Jurden et al.

    Factorial equivalence of the Wechsler Memory Scale Revised across standardization and clinical samples

    Applied Neuropsychology

    (1996)
  • A.S. Kaufman et al.

    Assessing adolescent and adult intelligence

    (2002)
  • G.J. Larrabee et al.

    Factor analysis of the WAIS and Wechsler Memory Scale: An analysis of the construct validity of the Wechsler Memory Scale

    Journal of Clinical Neuropsychology

    (1983)