The effects of self-assessed health: Dealing with and understanding misclassification bias

https://doi.org/10.1016/j.jhealeco.2021.102463

Abstract

Self-assessed health (SAH) is often used in health econometric models as the key explanatory variable or as a control variable. However, there is evidence questioning its test-retest reliability, with up to 30% of individuals changing their response. Building on recent advances in the econometrics of misclassification, we develop a way to consistently estimate and account for misclassification in reported SAH by using data from a large representative longitudinal survey where SAH was elicited twice. From this we gain new insights into the nature of SAH misclassification and its potential for biasing health econometric estimates. The results from applying our approach to nonlinear models of long-term mortality and chronic morbidities reveal that there is substantial heterogeneity in misclassification patterns. We find that adjusting for misclassification is important for estimating the impact of SAH. For other explanatory variables of interest, we find significant but generally small changes to their estimates when SAH misclassification is ignored.

Introduction

Self-assessed health (SAH) is a ubiquitous measure in the health economics literature and, more broadly, in social science research (Au and Johnston, 2014). It is often asked as a simple question, “in general, how would you rate your health?”, where respondents select from categories such as excellent, very good, good, fair or poor. SAH is used variously in econometric models as the outcome variable, as the key explanatory variable or as a control variable to prevent health from confounding the effect of interest. However, there is a large literature calling into question the reliability of reported SAH, as up to 30% of individuals change their response when re-asked about their SAH (Crossley and Kennedy, 2002, Clarke and Ryan, 2006, Black et al., 2017a). This paper takes advantage of recent econometric developments on misclassification and information from a prominent longitudinal survey (the Household, Income and Labour Dynamics in Australia survey, HILDA) to gain new insights into the nature of misclassification in reported SAH and its potential for biasing estimates of the effects of SAH and other explanatory variables in health econometric models. In particular, our analysis uses data from the 2001 wave of HILDA, which records the same individual's SAH responses in two different but similar questionnaires in the same wave (face-to-face or over-the-phone interview, and on a self-completion questionnaire), and combines this information with longitudinal data on mortality and the development of chronic health conditions 15 years later. We develop a new likelihood-based nonlinear estimator that uses this information to jointly estimate the misclassification in both reported SAH measures as well as the effects of SAH on mortality and morbidity.

Two independent misclassified measures of a categorical variable, such as SAH, supplemented with data on an outcome, such as mortality, can identify all the misclassification probabilities and the effect of the variable on the outcome (Hu, 2008, Hu, 2017). This can be done while imposing virtually no restrictions on the misclassification patterns, such as assuming that the probabilities of certain forms of misclassification are zero, that certain misclassification probabilities are larger than others, or that the misclassification probabilities are the same for both SAH measures. In our case, the flexibility of allowing each measure to have a different level of misclassification is important because the mode in which the question was asked differs between the two. While infinitely many misclassification patterns are compatible with the observed data on only two reported SAH measures, adding information about an outcome affected by SAH, such as mortality, allows us to pin down the misclassification probabilities. The reason is that each possible misclassification pattern implies a unique distribution of SAH within each reporting group, so that the average outcome within each group provides the missing information needed to reveal the misclassification pattern present in the reported SAH data. For instance, consider (a) the group of individuals reporting “excellent” health according to the first measure and “very good” according to the second, versus (b) the group responding “excellent” in both. If the individuals in both groups are mainly in “excellent” health, then they should have similar mortality. But if the individuals in (a) are mainly in “very good” health whereas those in (b) are mainly in “excellent” health, then the mortality of group (a) is likely to differ from that of group (b). Thus, looking at these three variables jointly (the two reported SAH measures plus mortality) can identify all underlying misclassification probabilities. Conversely, because identifying misclassification is tantamount to knowing the underlying distribution of SAH within each group, it also makes it possible to back out the true impact of SAH on the outcome. In the next section (Section 2), we present a more detailed example of this identification strategy, and in Section 3 we show how this formally generalises to a full econometric model which can be richly parametrised in terms of covariates. However, estimating such a model is not straightforward.
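The comparison of groups (a) and (b) can be made concrete with a small numerical sketch. Everything in it is a hypothetical illustration rather than a quantity from the paper: true health has only two categories ("excellent" and "very good"), the prior shares, death probabilities and error rates are invented, and the function name `reporting_groups` is ours. The two misclassification patterns are chosen so that they imply exactly the same joint distribution of the two reports and differ only in the mortality of the mixed-report group.

```python
import numpy as np

def reporting_groups(prior, m1, m2, death):
    """Return P(r1, r2) and P(death | r1, r2) for every pair of reports.

    prior[h] : P(true health = h)
    m1[h, r] : P(first report = r | h); m2[h, r] likewise for the second report
    death[h] : P(death | h)
    The two reports and death are assumed independent given true health h.
    """
    joint = prior[:, None, None] * m1[:, :, None] * m2[:, None, :]  # P(h, r1, r2)
    cell = joint.sum(axis=0)                                        # P(r1, r2)
    posterior = joint / cell                                        # P(h | r1, r2)
    mortality = np.einsum("h,hij->ij", death, posterior)            # P(death | r1, r2)
    return cell, mortality

def flip(err):
    """Misclassification matrix in which either category is misreported with probability err."""
    return np.array([[1 - err, err], [err, 1 - err]])

E, V = 0, 1                      # index 0 = "excellent", 1 = "very good"
prior = np.array([0.5, 0.5])     # hypothetical shares of true health
death = np.array([0.05, 0.15])   # hypothetical death probabilities by true health

# Pattern 1: the first report is accurate and the second is noisy.
cell_p1, mort_p1 = reporting_groups(prior, flip(0.05), flip(0.30), death)
# Pattern 2: the first report is noisy and the second is accurate.
cell_p2, mort_p2 = reporting_groups(prior, flip(0.30), flip(0.05), death)

print("same distribution of reports:", np.allclose(cell_p1, cell_p2))
print("mortality of group (a), pattern 1:", mort_p1[E, V])
print("mortality of group (a), pattern 2:", mort_p2[E, V])
print("mortality of group (b), both patterns:", mort_p1[E, E], mort_p2[E, E])
```

Under these invented numbers, the two reports alone cannot tell the patterns apart, but group (a) has a mortality of roughly 6% under the first pattern (close to group (b), at about 5%) and roughly 14% under the second. That contrast is the extra information the outcome contributes.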

While Hu (2008) discusses a nonparametric estimator for this setting, the implementation of that estimator is non-trivial and its computation is prohibitive when, as in our case, the model has many covariates, the potentially misclassified variable (SAH) has many categories, and the sample size is large. Therefore, we develop a more easily implementable parametric likelihood-based estimator. An important advantage of our estimator is that the effects of categorical SAH are specified by including dummy variables for each category of SAH in the outcome model, the standard way SAH is included as a categorical regressor in the health economics literature. Our approach also lends itself easily to specifications with interaction effects where the impact of unobserved SAH differs depending on other individual characteristics. Such specifications, common in applied work to investigate the heterogeneity of the effect of SAH, have received little attention in the misclassification literature so far. Another advantage is that, because the model has a finite-mixture representation, our flexible parametric specification can be estimated via a standard expectation-maximisation (EM) algorithm, which offers fast and reliable computation. The flexibility and richness of our model, where we allow unrestricted patterns of misclassification that depend on all covariates, means that the likelihood is complex and difficult to maximise. The EM algorithm provides the key to a simpler and more direct path to the solution. Holding misclassification constant in the maximisation step makes the resulting log likelihood substantially simpler: it becomes additively separable, so that its components can quickly be maximised separately. Moreover, because it is likelihood-based, our estimator can be easily adapted to encompass several outcomes jointly (such as, in our case, mortality and chronic morbidity) and be further extended to consider the penalisation of misclassification parameters to improve stability and efficiency. Section 4 provides simulation evidence on our estimator's finite-sample performance.
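To illustrate why the EM steps are simple in a finite-mixture setting, the following is a minimal sketch for a deliberately stripped-down case: binary true health, two binary reports, one binary outcome and no covariates. It is not the estimator developed in the paper, which allows multiple SAH categories and covariate-dependent misclassification. All parameter values and names (`class_likelihood`, `em`, `theta`) are hypothetical, and to keep the example deterministic the algorithm is fed exact population shares of the eight response patterns instead of survey data.

```python
import itertools
import numpy as np

# The 8 possible (r1, r2, y) patterns: two binary SAH reports and a binary outcome.
PATTERNS = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)

def class_likelihood(theta):
    """P(pattern | h) for h = 0, 1, where theta[h, j] = P(variable j = 1 | h)."""
    probs = np.where(PATTERNS[:, None, :] == 1.0, theta[None, :, :], 1.0 - theta[None, :, :])
    return probs.prod(axis=2)  # shape (8, 2)

def em(weights, pi, theta, n_iter=5000):
    """EM for the two-class mixture; weights[k] is the share (or count) of pattern k."""
    loglik = []
    for _ in range(n_iter):
        joint = class_likelihood(theta) * np.array([1.0 - pi, pi])  # P(pattern, h)
        mix = joint.sum(axis=1)                                     # P(pattern)
        loglik.append(weights @ np.log(mix) / weights.sum())
        # E-step: weight of each pattern attributed to each latent health state.
        post = weights[:, None] * joint / mix[:, None]
        # M-step: with the weights held fixed, every parameter is a weighted mean,
        # so the class share and each column of theta are updated separately.
        pi = post[:, 1].sum() / weights.sum()
        theta = (post.T @ PATTERNS) / post.sum(axis=0)[:, None]
    return pi, theta, np.array(loglik)

# Hypothetical truth: h = 1 is good health; columns of theta are r1, r2, y (death).
pi_true = 0.6
theta_true = np.array([[0.15, 0.25, 0.40],
                       [0.90, 0.80, 0.05]])
weights = (class_likelihood(theta_true) * np.array([1 - pi_true, pi_true])).sum(axis=1)

start = np.array([[0.3, 0.3, 0.5],
                  [0.7, 0.7, 0.5]])
pi_hat, theta_hat, loglik = em(weights, pi=0.5, theta=start)
print(pi_hat)
print(theta_hat.round(3))
```

The starting values place the higher reporting probabilities in the second class, which fixes the labelling of the two latent states. With real data, `weights` would be the observed counts of each response pattern, and the recovered parameters would carry sampling error.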

The focus of this paper is the application of our proposed estimator to the HILDA data with the aim of making two key contributions to the health economics literature. First, we go beyond the current literature which only documents observed differences in multiple reported measures of SAH, typically by regressing an indicator of conflicting SAH answers on a set of explanatory variables (Black et al., 2017a). As an example of the difficulties associated with interpreting the estimates produced with this method, consider the finding that individuals with lower education are more prone to giving conflicting SAH answers when asked twice. It is generally not possible to conclude from such a finding which of the two reported SAH questions is answered more accurately, or which specific types of mistakes are made and how frequently. It is not even possible to conclude that individuals with lower education tend to have generally higher rates of misclassification than individuals with higher education, since it could be, for instance, that face-to-face SAH from low education individuals is much less reliable, while self-completed SAH from low education individuals is somewhat more reliable. In contrast to these reduced-form approaches, our new framework provides estimates of the complete set of probabilities of misreporting each category of each measure by covariates such as education, income, etc. By linking differences in SAH to underlying misclassification probabilities, it makes it possible to address behavioural questions about the extent, patterns and heterogeneity of individuals’ responses. It also makes it possible to assess questions pertaining to survey methodology, such as the type and incidence of response errors associated with each of the two survey instruments (face-to-face interview and self-completion questionnaire).
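A short calculation shows why a disagreement indicator alone says little about the underlying errors. The sketch assumes binary health and two reports that each flip the true state independently with a fixed probability; the error rates and the function name `disagreement_rate` are hypothetical and chosen only for illustration.

```python
def disagreement_rate(err1, err2):
    """P(r1 != r2) when two binary reports each flip the true state independently."""
    return err1 * (1 - err2) + (1 - err1) * err2

# Three different error patterns that produce the same share of conflicting answers.
patterns = [(0.20, 0.00), (0.00, 0.20), (0.10, 0.125)]
rates = [disagreement_rate(e1, e2) for e1, e2 in patterns]
print(rates)

# Hypothetical groups: (face-to-face error rate, self-completion error rate).
low_edu = disagreement_rate(0.30, 0.10)
high_edu = disagreement_rate(0.15, 0.15)
print(low_edu, high_edu)
```

In this invented example the low-education group disagrees more often, even though its self-completed report is the more accurate of the two groups' self-completed reports, which is the kind of ambiguity described above.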

Second, our results make it possible to assess how strongly misclassification biases conventional estimates of the effects of reported SAH. In our approach, the outcome model takes the form of a standard nonlinear model, such as a logit model, and can be specified not just in terms of SAH but also by including a vector of covariates. This makes it straightforward to compare our estimates of the outcome model to naïve estimates which ignore misclassification, that is, simple logit models of mortality and morbidity that include either the first or second reported SAH measure, as widely encountered in the health economics literature. Our estimator provides a way to adjust for misclassification in reported SAH when estimating the effects of SAH on mortality and morbidity in such models. As mentioned above, once the misclassification probabilities are identified, the effect of SAH on, say, mortality can be backed out because, for each reported SAH group, the group's distribution of SAH can be inferred and linked to the group's mortality. Similarly, our approach also makes it possible to assess how biases stemming from misclassification of reported SAH affect the estimates of other regressors of interest. Such spillover of the bias in reported SAH to other regressors can occur if the latter are correlated with SAH. Intuitively, one can understand the use of reported SAH as introducing a type of omitted variable problem: part of SAH is missing in the reported measure. If covariates are correlated with SAH (and thus also with the omitted part of SAH), this will bias the coefficients on these other variables. And due to the bias on the effect of SAH itself, even the non-omitted part is not being adjusted for appropriately, which will further spill over to these correlated variables. Bago d’Uva et al. (2011) also look at such spillovers, albeit for a different outcome and with an approach based on vignettes.
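The omitted-variable intuition can be checked with a small population-level calculation. The sketch below assumes binary health h, one binary covariate x that is correlated with h, and a single misreported measure r whose errors do not depend on x or on the outcome. All coefficient values, probabilities and names are hypothetical, and they are chosen to make the mechanism visible rather than to match the magnitudes reported in the paper. A logit is fitted to exact cell probabilities, so the results are the values a naïve estimator would converge to, free of sampling noise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y_mean, w, n_iter=25):
    """Newton-Raphson logit on grouped data: cell k has outcome rate y_mean[k] and share w[k]."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y_mean - p))
        hess = (X * (w * p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

b_true = np.array([-0.5, -2.0, 0.5])   # hypothetical: intercept, good health h, covariate x
p_x = np.array([0.5, 0.5])             # P(x = 0), P(x = 1)
p_h1_given_x = np.array([0.8, 0.4])    # P(h = 1 | x): x = 1 goes with worse health
p_r1_given_h = np.array([0.15, 0.90])  # P(r = 1 | h): reporting errors in both directions

rows_true, rows_naive = [], []
for x in (0, 1):
    p_h = np.array([1 - p_h1_given_x[x], p_h1_given_x[x]])                     # P(h | x)
    p_y_h = sigmoid(b_true[0] + b_true[1] * np.array([0, 1]) + b_true[2] * x)  # P(y = 1 | h, x)
    for h in (0, 1):
        rows_true.append((1, h, x, p_y_h[h], p_x[x] * p_h[h]))
    for r in (0, 1):
        p_r_h = p_r1_given_h if r == 1 else 1 - p_r1_given_h                   # P(r | h)
        share = p_h * p_r_h                                                    # P(h, r | x)
        rows_naive.append((1, r, x, (share @ p_y_h) / share.sum(), p_x[x] * share.sum()))

def fit(rows):
    a = np.array(rows, dtype=float)
    return fit_logit(a[:, :3], a[:, 3], a[:, 4])

beta_true = fit(rows_true)    # uses true health: recovers b_true
beta_naive = fit(rows_naive)  # uses reported health in place of true health
print(beta_true.round(3))
print(beta_naive.round(3))
```

With these invented values, the coefficient on reported health is pulled towards zero relative to the coefficient on true health, and the coefficient on x moves away from its true value because x absorbs part of the health effect that the misreported measure fails to capture.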

Understanding and dealing with measurement errors in SAH has been and still is an active area of research within health econometrics. While Greene et al. (2018) and Brown et al. (2018) adjust for untruthful reporting in discrete dependent variable models, our focus lies in the case of discrete SAH taking the role of a regressor. In models seeking to explain labour supply decisions, a key focus has been on individuals misclassifying (under-reporting) their SAH to justify not working (Bound, 1991, Currie and Madrian, 1999, Lindeboom and Kerkhofs, 2009, Black et al., 2017b). This can be problematic in these models because it can upwardly bias the estimated coefficient on SAH, but what is less commonly noted is that other reasons why SAH is misclassified will cause bias in the other direction (Bound, 1991). In the current paper, we consider ‘to justify not working’ as one of many reasons which may explain an individual's propensity (or probability) to misclassify their SAH. Our estimator can fully account for misclassification related to work status or any other factor, as long as the respondents are not misclassifying SAH to justify the outcome we use for identification (in our case mortality or the onset of a chronic condition 15 years in the future). Our paper also contributes to the literature which investigates the association between “objective” and self-assessed health measures (Bound, 1991, Mossey and Shapiro, 1982, Butler et al., 1987, Baker et al., 2004, Doiron et al., 2015). We consider substantially longer-term associations between SAH and mortality (and morbidity) than in these studies (15 years vs 3–6 years), and we adjust the association by accounting for misclassification in reported SAH. The most closely related studies to ours are Crossley and Kennedy (2002), Clarke and Ryan (2006) and Black et al. (2017a), which also consider the change in an individual's response when SAH is asked twice; however, none of these papers estimates the impact of misclassification when reported SAH is used as a regressor, nor do they study the underlying misclassification probabilities.

Our main results are discussed in Sections 5 and 6. In Section 5, we document the empirical salience of the problem of differing answers to repeated SAH questions throughout the HILDA survey, which motivates our research, and we replicate the previous literature's reduced-form results by regressing these differences in SAH responses on covariates. As discussed, it is difficult to link such results to the underlying misclassification. Section 6 presents the results using our estimator on the HILDA data, which overcomes these problems. We find strong evidence for the presence of misclassification and for heterogeneity in misreporting behaviour across different population subgroups, such as males vs females and low vs high income earners. For instance, we find that men who are in excellent health almost never fail to report this in interviews, but not all men who report being in excellent health are truthfully reporting their SAH. We also document that there is less measurement error in the SAH question elicited by face-to-face interviews than in the one from the self-completion questionnaire. The results indicate that misclassification leads to statistically significant biases in the parameters of the mortality and morbidity models. While the bias is similar in absolute size across the models, this translates to relative biases in the coefficients of SAH ranging mostly from 10% to 20% in the mortality model, and as high as 100% for the morbidity model. For the coefficients of other covariates, the biases, while statistically significant, are more moderate and around 10%. Finally, we use our approach to estimate potential heterogeneity in the effect of SAH by specifying models with interactions of SAH with sex, age, education and income. With the exception of gender differences in mortality, the results indicate that the long-term effects of SAH on mortality and new chronic conditions are quite homogeneous.

We conclude the paper in Section 7. Our findings suggest that when specifying models where SAH is the regressor of interest, it is important to adjust for misclassification. Where this is not possible, SAH measures from face-to-face interviews should be strongly preferred over self-completed SAH measures. On the other hand, our findings also indicate that when specifying models where SAH is used as a key control variable, there is likely to be little contamination of the variables of interest from the misclassification in SAH.

Section snippets

Identifying misclassification in SAH: an intuitive example

To fix ideas and give the intuition behind the identification, we discuss in this section a simple example where we have two binary misclassified SAH measures with potentially different misclassification probabilities and where SAH influences the probability of being dead at some point in the future.

Consider a simple hypothetical setting, where we assume that individuals’ self-assessed health, h*, is either good or bad, and each of these two health groups has a fixed probability of being alive

Econometric methods

In this section, we translate and generalise the previous example of misclassification in SAH to a formal regression framework that can be easily applied to commonly estimated health economic models and accommodates covariates, interaction effects with unobserved SAH, multinomial SAH with more than two categories, and heterogeneous misclassification probabilities (Section 3.1). We then present an expectation-maximisation (EM) algorithm to estimate this model, and discuss two ways of potentially

Monte Carlo experiments

Next, to benchmark the performance of our proposed finite mixture (FM) and penalised FM (PFM) estimators, we compare them to the ideal estimator that uses the unobserved SAH status, which is infeasible in practice, and, at the other end of the spectrum, to the naïve estimator that just uses the first reported SAH measure, treating it as if it were the SAH status. As further points of comparison, we also examine four potential competitor estimators, which address

SAH misclassification in the HILDA data

In this section we first outline the HILDA data and describe the repeated reported measures of SAH in some HILDA waves. We then use these measures to replicate the descriptive reduced-form approach from the literature, which involves regressing indicators of differences in the SAH measures on a vector of socioeconomic variables, and discuss what we can and cannot learn about misclassification from such estimates.

HILDA is an annual Australian household-based longitudinal survey that began in 2001 (

Estimating SAH misclassification and the effects of SAH on mortality and morbidity

In this section we present our estimates of our joint model of SAH misclassification and of the association between SAH and two outcomes measured 15 years after the initial survey: mortality (whether the individual is deceased) and, if the individual is still alive, whether they developed any chronic conditions in the 15-year period. We examine the estimated misclassification patterns in Section 6.1 and discuss the estimates from the outcome equations in Section 6.2, where we also compare our

Conclusions

While previous literature has documented that a large share of individuals report different SAH when asked twice, several important questions raised by this issue have so far remained unanswered. Because many forms of misclassification are compatible with observed differences in reported SAH, questions such as whether reported SAH is inherently unreliable or whether observed differences stem from one particular deficient measure could not be addressed. Similarly, it was not possible to know

Acknowledgements

We thank Denzil Fiebig, Bill Griffiths, Mark Harris, Joe Hirschberg, Maarten Lindeboom, Jenny Lye, Frank Windmeijer, Rainer Winkelmann, Eugenio Zucchelli, the participants of the European Workshop on Econometrics and Health Economics (Groningen), the Asian Meeting of the Econometric Society (Hong Kong), the China Meeting of the Econometric Society (Wuhan), the International Association for Applied Econometrics conference (Sapporo), the Australian Health Economics Society conference (Fremantle,

References (28)

  • Michael Baker et al., What do self-reported, objective, measures of health measure?, J. Hum. Resour. (2004)
  • Basu, Anirban and Norma Coe, 2SLS vs 2SRI: Appropriate methods for rare outcomes and/or rare exposures, Unpublished...
  • Battistin, Erich, Michele De Nadai, and Barbara Sianesi, Misreported schooling, multiple measures and returns to...
  • John Bound, Self-reported versus objective measures of health in retirement models, J. Hum. Resour. (1991)