Methods for the detection of carelessly invalid responses in survey data

https://doi.org/10.1016/j.jesp.2015.07.006

Abstract

Self-report data collections, particularly through online measures, are ubiquitous in both experimental and non-experimental psychology. Invalid data can be present in such data collections for a number of reasons. One reason is careless or insufficient effort (C/IE) responding. The past decade has seen a rise in research on techniques to detect and remove these data before normal analysis (Huang, Curran, Keeney, Poposki, & DeShon, 2012; Johnson, 2005; Meade & Craig, 2012). The rigorous use of these techniques is a valuable tool for the removal of error that can impact survey results (Huang, Liu, & Bowling, 2015). This research has encompassed a number of sub-fields of psychology, and this paper aims to integrate these different perspectives into a review and assessment of current techniques, an introduction of new techniques, and a set of recommendations for practical use. Concerns about C/IE responding are a factor any time self-report data are collected, and researchers who collect such data should be well-versed in methods to detect this pattern of response.

Introduction

Self-report psychological data are a ubiquitous source of information in many areas of psychological research. These self-reports are used in both experimental and non-experimental psychological research to measure a wide range of psychological constructs. Online data collection has made this process more accessible and widespread in recent years, but there are also many sources where error can enter this process. One type of this error comes from respondents who provide invalid data to survey questions. Fortunately, this is error that can be removed from the survey data collection process in order to produce more stable and consistent results (Huang et al., 2015, Maniaci and Rogge, 2014).

There are a number of reasons why research participants may provide responses that are in some way invalid; that is, data that do not represent actual ‘true’ values. Johnson (2005) identified three main classes of this invalid data: (1) linguistic incompetence or misunderstanding, (2) misrepresentation, and (3) careless or inattentive response. Linguistic incompetence deals with the construction of items and the process by which a survey is aimed (properly or improperly) at the population of interest. Misrepresentation deals with the issue of cheating or faking (Griffith & Peterson, 2006), behaviors that are most likely on high-stakes surveys or tests.

Carelessness or inattentive responding deals with a problem that potentially impacts anyone who does research with participants in low- to medium-stakes settings (e.g. subject pool participants, MTurk workers, less than fully motivated workplace samples). Unfortunately, some participants will simply not put in the effort required to respond accurately or thoughtfully to all questions asked of them. The inclusion of these responses into a set of otherwise accurate data can have a host of unexpected and undesired effects on relationships being examined (Huang et al., 2015).

This problem is related to, but distinct from, other problems found in data, such as missingness. In effect, C/IE responders produce missing data that are not actually missing: they have provided a response when they might as well have left it blank. While some methods of dealing with missingness may be useful after these individuals are removed, that is beyond the scope of this paper, as these individuals must first be detected.

Fortunately, many methods have been established to detect and remove these invalid responders from datasets. Although there have been a number of reviews of some of these methods (e.g., Desimone et al., 2015, Johnson, 2005), this paper aims to summarize, connect, and extend these modern methodological tools, as well as to explain actionable ways to implement them. This will be accomplished by first summarizing this type of data and general concepts of detection before outlining specific techniques in detail. Following this, recommendations for use and common issues will be highlighted with the aim of increasing understanding of these methods. It will be recommended that a number of these methods be used in series, with each technique aimed at removing the invalid data it is best suited to detect, and that this process be transparent and replicable in any given case.

Section snippets

What does careless or inattentive data look like?

One of the most telling ways to examine the perceived impact of these responses is to look at the language used to describe them. For many years, individuals who exhibited this method of response were called random responders (e.g., Beach, 1989). This was influenced by the notion that responses from these individuals were produced by some completely random process, such as flipping a coin or rolling a die. From a scale validity standpoint this meant that random error was introduced into data.

C/IE response and replicability

If there is one consensus that all C/IE researchers can agree on, it is that these types of responders exist. This may seem trivial, but the simple acceptance that these behaviors are not an imagined phenomenon raises more than the simple question above regarding the typical rate of these individuals. It also raises the question of how much this behavior varies across studies.

There has been little work on concretely establishing the range of these values. Hauser and Schwartz (in press) examined the

Detecting careless/insufficient effort (C/IE) responders

There are many potential causes of invalid data in general research. Invalid data on high-stakes tests and surveys, due to the valuable nature of these measures (e.g., clinical assessment and personnel selection), have been studied in depth for many decades (e.g., Berry et al., 1991, Birkeland et al., 2006, Butcher et al., 1989, Dunnette et al., 1962, Orpen, 1971). Researchers who study faking and cheating have the benefit that these behaviors are only manifested in very specific ways.

Speed of response: response time

Response time, the time it takes for an individual to respond to a set of items, is perhaps the most widely used tool for the elimination of C/IE responders. It is the most likely to be used on an intuitive basis even by those who have no knowledge of the C/IE literature. This intuitive use of response time can be independently derived by the practical extension of one simple assumption: there exists a minimum time needed to validly complete a survey.

Normal or average response time will
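As an illustration, the following is a minimal screening sketch in Python; the two-seconds-per-item floor is an assumed rule of thumb to be justified for any given survey, and the function and variable names are hypothetical rather than taken from the paper.

```python
import pandas as pd

# Hypothetical helper: flag respondents whose total completion time falls
# below an assumed minimum plausible time (here, two seconds per item).
def flag_fast_responders(completion_seconds: pd.Series, n_items: int,
                         seconds_per_item: float = 2.0) -> pd.Series:
    return completion_seconds < n_items * seconds_per_item

# Usage: times is a Series of total durations in seconds, indexed by
# respondent; a 60-item survey gives a 120-second floor.
# too_fast = flag_fast_responders(times, n_items=60)
```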

Invariability: long-string analysis

Perhaps the next most intuitive technique for use in the detection of C/IE responders is the analysis of strings of responses, known in the literature as ‘long-string analysis’ or ‘response pattern indices’ (Huang et al., 2012, Meade and Craig, 2012). This technique seems to have formally begun when Johnson (2005) borrowed a technique that was later described in the work of Costa and McCrae (2008). This technique involves examining the longest string of identical responses from each
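A minimal sketch of long-string analysis in Python, assuming responses are stored in a data frame with item columns in the order they were presented (function names are hypothetical):

```python
import pandas as pd

# Hypothetical helper: for each respondent, the longest run of identical
# consecutive responses across items (columns must be in presentation order).
def longest_string(responses: pd.DataFrame) -> pd.Series:
    def max_run(values) -> int:
        run = best = 1
        for prev, curr in zip(values[:-1], values[1:]):
            run = run + 1 if curr == prev else 1
            best = max(best, run)
        return best
    return responses.apply(lambda row: max_run(row.to_list()), axis=1)

# Respondents with runs that are long relative to the sample and the number
# of items are candidates for flagging.
```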

Outlier analysis: Mahalanobis distance

Outlier analysis is a fairly simple concept that is often taught in the very early stages of statistical training. Broadly, outliers can be simply considered unusual data points relative to the remainder of a distribution (Peck & Devore, 2012). Outliers can exist for many reasons, and C/IE responding is certainly among them. Individuals who are responding without sufficient effort are likely to differ from their thoughtful counterparts in some way, and it is not unreasonable to believe that
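A sketch of such multivariate outlier screening in Python; the pseudo-inverse of the sample covariance matrix and the chi-square cutoff are illustrative choices, not prescriptions from the paper.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

# Hypothetical helper: squared Mahalanobis distance of each response vector
# from the sample centroid, using a pseudo-inverse of the covariance matrix.
def mahalanobis_d2(responses: pd.DataFrame) -> pd.Series:
    X = responses.to_numpy(dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return pd.Series(d2, index=responses.index)

# One illustrative cutoff: the chi-square critical value with df equal to
# the number of items (an assumption, not a universal rule).
# flagged = mahalanobis_d2(items) > chi2.ppf(0.999, df=items.shape[1])
```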

Individual consistency

Individual consistency, in the case of C/IE response detection, is a reference to the consistency of a response string within an individual. There are a number of techniques that measure this consistency, each of which will be examined in the following sections. The underlying assumption of these individual consistency methods is, simply put, that an attentive respondent provides a pattern of responses that is internally consistent.

Consider early research, when responses from

Individual consistency: odd–even consistency and resampled individual reliability

One of the simplest individual consistency techniques is known as odd–even consistency (Meade & Craig, 2012) or individual reliability (Huang et al., 2012, Huang et al., 2014, Jackson, 1977, Johnson, 2005). The use of these two terms to describe the same technique is indicative of concepts potentially lost in the distinction between the two. This distinction is difficult to explain without describing the current technique which shares these names, so such description is a natural next step.

In
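A sketch of the odd-even approach in Python, assuming a dictionary that maps each multi-item scale to its item columns; the layout, names, and the Spearman-Brown step shown here are assumptions for illustration.

```python
import pandas as pd

# Hypothetical helper: odd-even consistency. For each multi-item scale, the
# mean of the odd-numbered items and the mean of the even-numbered items are
# computed per person; the two vectors of half-scale means are correlated
# within each person and then Spearman-Brown corrected.
def odd_even_consistency(responses: pd.DataFrame, scales: dict) -> pd.Series:
    odd = pd.concat([responses[items[0::2]].mean(axis=1)
                     for items in scales.values()], axis=1)
    even = pd.concat([responses[items[1::2]].mean(axis=1)
                      for items in scales.values()], axis=1)
    r = odd.apply(lambda row: row.corr(even.loc[row.name]), axis=1)
    return (2 * r) / (1 + r)  # Spearman-Brown correction for half length

# scales is assumed to map scale names to lists of item column names, e.g.
# {'extraversion': ['e1', 'e2', 'e3', 'e4'], 'neuroticism': ['n1', ...], ...}
# Low or negative values suggest inconsistent (potentially C/IE) responding.
```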

Individual consistency: semantic and psychometric antonyms/synonyms

The next family of techniques involves four techniques that are the result of crossing two dichotomous ideas. The first part of this is the simple distinction between antonyms and synonyms. Antonyms utilize pairs of opposite items, whereas synonyms utilize pairs of similar items. The second distinction is between semantic pairs and psychometric pairs. Semantic pairs are those that are paired from a purely linguistic approach (e.g., happy/sad), whereas psychometric pairs are those that are
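A sketch of the psychometric synonyms variant in Python; the r ≥ .60 pairing threshold is an assumed convention, and the pairing logic is simplified (an item may appear in more than one pair here).

```python
import numpy as np
import pandas as pd

# Hypothetical helper: psychometric synonyms. Item pairs that correlate
# strongly in the full sample should also be answered similarly by an
# attentive individual; the index is the within-person correlation across
# those pairs.
def psychometric_synonyms(responses: pd.DataFrame, min_r: float = 0.60) -> pd.Series:
    corr = responses.corr()
    cols = list(responses.columns)
    pairs = [(a, b) for i, a in enumerate(cols) for b in cols[i + 1:]
             if corr.loc[a, b] >= min_r]

    def person_index(row: pd.Series) -> float:
        left = row[[a for a, _ in pairs]].to_numpy(dtype=float)
        right = row[[b for _, b in pairs]].to_numpy(dtype=float)
        if len(pairs) < 3 or left.std() == 0 or right.std() == 0:
            return np.nan  # too few pairs, or no within-person variability
        return float(np.corrcoef(left, right)[0, 1])

    return responses.apply(person_index, axis=1)

# Psychometric antonyms work the same way with strongly negative pairs
# (e.g., corr.loc[a, b] <= -.60), where a negative index is expected instead.
```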

Individual consistency: inter-item standard deviation

On the far opposite end from long-string analysis lies the relatively new method of inter-item standard deviations. This technique, proposed and tested by Marjanovic, Holden, Struthers, Cribbie, and Greenglass (2015), measures how much an individual strays from their own personal midpoint across a set of scale items. This measure is accomplished through the following formula, from Marjanovic et al. (2015):

$$\mathrm{ISD}_i = \sqrt{\frac{\sum_{j=1}^{k}\left(X_j - \bar{X}_i\right)^2}{k-1}}$$

In this case, $X_j$ represents the response to any given item, and $\bar{X}_i$
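A minimal sketch of the ISD computation in Python, applied to the item columns of a single scale (the function name is hypothetical):

```python
import pandas as pd

# Hypothetical helper: inter-item standard deviation across the k items of a
# scale; pandas' default ddof=1 matches the k - 1 denominator in the formula.
def inter_item_sd(scale_items: pd.DataFrame) -> pd.Series:
    return scale_items.std(axis=1, ddof=1)
```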

Individual consistency: Polytomous Guttman Errors

Guttman errors (Guttman, 1944, Guttman, 1950) are a concept originally designed for application on dichotomous test items. The premise of perfect Guttman scaling is that when items are ordered on difficulty, individuals should get easy items correct up to a point, then get all remaining, and more difficult, items wrong. Breaks in this expected sequence are called Guttman errors.

The calculation of this statistic on dichotomous (correct/incorrect) test items is based on the adjacent pairwise
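A sketch of Guttman error counting for dichotomous items in Python; this version counts errors over all item pairs rather than only adjacent ones, and the polytomous extension (e.g., Emons, 2008) generalizes the same pairwise logic to item steps.

```python
import numpy as np
import pandas as pd

# Hypothetical helper: Guttman error count for dichotomous (0/1) items.
# Items are ordered from easiest (highest proportion correct) to hardest;
# an error is any pair in which the harder item is passed but the easier
# item is failed.
def guttman_errors(responses: pd.DataFrame) -> pd.Series:
    order = responses.mean().sort_values(ascending=False).index
    X = responses[order].to_numpy()
    n_items = X.shape[1]
    errors = np.zeros(X.shape[0], dtype=int)
    for easy in range(n_items):
        for hard in range(easy + 1, n_items):
            errors += ((X[:, easy] == 0) & (X[:, hard] == 1)).astype(int)
    return pd.Series(errors, index=responses.index)
```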

Individual consistency: person total correlation

As with the extension of Guttman errors to Polytomous Guttman Errors, there is much that can be learned from long-established research on the detection of aberrant response patterns in testing data. One such technique from the testing literature is the “personal biserial” correlation (Donlon & Fischer, 1968). This concept is itself an extension of the idea of item–total correlations, and as such a brief discussion of item–total correlations is required.

An item–total correlation on some item in
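A sketch of a person-total index in Python: each respondent's item responses are correlated with the sample-level item means. This is a simplification (the respondent's own data are not excluded from the means), and the names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical helper: correlate each person's responses with the sample's
# item means. Low or negative values indicate a response pattern that does
# not track the typical ordering of the items.
def person_total_correlation(responses: pd.DataFrame) -> pd.Series:
    item_means = responses.mean(axis=0).to_numpy()

    def person_r(row: pd.Series) -> float:
        values = row.to_numpy(dtype=float)
        if values.std() == 0:
            return np.nan  # undefined for invariant responders
        return float(np.corrcoef(values, item_means)[0, 1])

    return responses.apply(person_r, axis=1)
```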

Bogus/infrequency/attention check items/IMCs

The techniques discussed to this point have relied on post-hoc within-person statistics calculated from the sample of interest. These techniques can be applied to the data from already completed survey data collections, as long as the right types of items are present in the right quantities. Simply put, there isn't that much that can be calculated in terms of consistency on a one-item measure.

Conversely, a different family of techniques use the inclusion of specific items in scales to check
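A sketch of scoring such instructed-response or bogus items in Python, assuming the check columns and their keyed responses are known in advance (column names and keyed values are placeholders):

```python
import pandas as pd

# Hypothetical helper: count failed instructed-response / bogus items.
def failed_attention_checks(responses: pd.DataFrame, checks: dict) -> pd.Series:
    fails = sum((responses[col] != keyed).astype(int)
                for col, keyed in checks.items())
    return fails

# Usage, assuming two instructed items keyed to response options 1 and 5:
# n_failed = failed_attention_checks(items, {'check_1': 1, 'check_2': 5})
# flagged = n_failed > 0
```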

Self-report data

Another technique for identifying C/IE responders comes with a blend of transparency, simplicity, and forethought. Simply put, it is possible to ask respondents if they have responded in a way that they believe is valid (though perhaps not quite in those words). Meade and Craig (2012) created and tested a scale of participant engagement which produced two factors: one of diligence and one of interest. This diligence scale correlated reasonably well with some of the other techniques of C/IE

Additional factors: reverse worded items

There are many aspects of C/IE responding that have not been examined in robust detail in prior research. Similar to adding another set of conditions to an experiment, many of these aspects effectively double the complexity of any given examination or understanding. Chief among these is the concept of reverse worded items. Reverse worded items are simply those which are directionally disparate from the ‘normal’ items in a scale. That is, individuals with higher levels of the underlying latent

Recommendation: multiple hurdles approach

A number of methods have been illustrated here, and a summary of these techniques can be found in Table 2. The reader may recall or recognize that not all of these techniques measure the same construct, and may correlate positively, negatively, or not at all with each other (Meade & Craig, 2012). As an extension of this, individuals who are identified by one technique are not necessarily identified by the others. In fact, identification by one technique sometimes means a reduced likelihood of
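A sketch of such a sequential screen in Python, reusing the helper functions sketched in earlier sections (flag_fast_responders, longest_string, failed_attention_checks); the cutoffs are placeholders to be justified for the study at hand, and the count removed at each hurdle is logged so the process can be reported transparently.

```python
import pandas as pd

# Hypothetical sequential screen built from the helpers sketched above.
def multiple_hurdles(items: pd.DataFrame, times: pd.Series, checks: dict,
                     long_string_cutoff: int = 10) -> pd.DataFrame:
    flags = pd.DataFrame({
        "too_fast": flag_fast_responders(times, n_items=items.shape[1]),
        "long_string": longest_string(items) >= long_string_cutoff,
        "failed_check": failed_attention_checks(items, checks) > 0,
    }, index=items.index)

    keep = items.index
    for hurdle in flags.columns:
        removed = flags.loc[keep, hurdle]
        print(f"{hurdle}: removed {int(removed.sum())} cases")
        keep = keep[~removed.to_numpy()]
    return items.loc[keep]
```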

Conclusions

Overall, there are a number of ways to screen survey data for C/IE responders. This paper has presented a number of new additional techniques, and this entire group of techniques should be regularly used by survey researchers. The removal of these invalid responders has been shown to reduce error and provide more valid results. Best guesses put the average inclusion of C/IE responders at around 10% (Curran et al., 2010, DeRight and Jorgensen, 2015, Maniaci and Rogge, 2014, Meade and Craig, 2012

References (46)

  • P.T. Costa et al. The revised NEO personality inventory (NEO-PI-R).
  • P.G. Curran et al. Understanding responses to check items: A verbal protocol analysis.
  • P. Curran et al. The impacts of invalid responding: A simulation study.
  • P. Curran et al. Careless responding in surveys: Applying traditional techniques to organizational settings.
  • T. Davey et al. Controlling item exposure and maintaining item security.
  • J. DeRight et al. I just want my research credit: Frequency of suboptimal effort in a non-clinical healthy undergraduate sample. The Clinical Neuropsychologist (2015).
  • J.A. Desimone et al. Best practice recommendations for data screening. Journal of Organizational Behavior (2015).
  • T.F. Donlon et al. An index of an individual's agreement with group-determined item difficulties. Educational and Psychological Measurement (1968).
  • M.B. Donnellan et al. The mini-IPIP scales: Tiny-yet-effective measures of the Big Five factors of personality. Psychological Assessment (2006).
  • M.D. Dunnette et al. A study of faking behavior on a forced choice self-description checklist. Personnel Psychology (1962).
  • C. Ehlers et al. The exploration of statistical methods in detecting random responding.
  • W.H.M. Emons. Nonparametric person-fit analysis of polytomous item scores. Applied Psychological Measurement (2008).
  • L.R. Goldberg et al. The prediction of semantic consistency in self descriptions: Characteristics of persons and of terms that affect the consistency of responses to synonym and antonym pairs. Journal of Personality and Social Psychology (1985).

    The author would like to thank Katherine S. Corker, M. Brent Donnellan, Fred L. Oswald, and John F. Binning for valuable comments and suggestions on earlier drafts of this paper. In addition, the author would like to thank the editors of this special issue, Edward Lemay and Chuck Stangor, as well as two anonymous reviewers, for their valuable feedback and suggestions.
