
Safety Science

Volume 76, July 2015, Pages 67-73

How reliable are self-report measures of mileage, violations and crashes?

https://doi.org/10.1016/j.ssci.2015.02.020

Highlights

  • Reliability of self-reported crashes, violations and mileage was found to be low.

  • The observed associations differed from what could be expected.

  • Given the results, common method variance is probably the explanation for many of the associations found for these variables.

Abstract

The use of self-reported driver mileage, violations and crashes is very popular in traffic safety research, but their validity has been questioned. One way of testing validity is an analysis of test–retest reliability. Three mechanisms might influence reliability in self-reports: actual changes in the variable over time, stable systematic reporting bias, and random error. Four samples of drivers who had responded twice to an online questionnaire asking them to report their mileage, violations and crashes were used, and correlations between the self-reports were calculated. The results for crashes were compared to expected correlations, calculated from the error introduced by the non-overlapping periods and the variable means. Reliability was fairly low, and controlling for mileage in the violations and crashes calculations did not strengthen the associations. The correlation between self-reports of crashes in different time periods was much larger than expected in one case, indicating a reporting bias, while the other correlation agreed with the predicted value. The correlations for overlapping time periods were much smaller than expected. These results indicate that drivers’ self-reports of their mileage, violations and crashes are very unreliable, but also that several different mechanisms are operating. It is uncertain exactly under what circumstances the different types of self-report bias operate. Traffic safety researchers should treat self-reported mileage, violations and crashes with extreme caution and preferably investigate these variables using objective data.

Introduction

For many decades, it has been common practice in research on individual differences in traffic safety to use self-reported data for crashes, violations and mileage, assuming that drivers can and will report this information in a reliable way. Given the difficulties of acquiring official data, and the problematic properties of these sources of information (for a review see Hauer and Hakkert, 1988), many researchers have concluded that official data are inferior to self-reports (e.g. McGuire, 1973, Panek et al., 1978, Malta et al., 2005), or ignored the question and used self-reports regardless of this issue. Consequently, research on individual differences in traffic safety has mainly been built on a foundation of self-reported data.

Many traffic safety researchers argue that if an effect is found in self-report data, despite the doubtful validity of self reporting, the real effect is probably even stronger (see the review in af Wåhlberg, 2009). It is assumed that any reporting biases will only affect the independent variables, and therefore cause random error at worst, which will only serve to detract from the real effects. However, the same biases can also influence the outcome variables (mainly crashes and violations). Recently, such systematic biases in self-reports of crashes have been reported (af Wåhlberg et al., 2010; see the review in af Wåhlberg, 2009) as well as common method variance (artefactual associations in the data) (af Wåhlberg, 2009, af Wåhlberg, 2010a, af Wåhlberg et al., 2011, af Wåhlberg et al., submitted for publication).

As with common method variance, the most obvious reason why traffic safety researchers have neglected to consider or disregarded the possibility that self-reported crashes and violations could be systematically biased may be because their properties have not been tested. Although many studies have reported extremely low agreement between state records and self-reports for these variables, this has usually been blamed upon the low validity of the records (see the review in af Wåhlberg, 2009). It would therefore be prudent to examine the psychometric property of reliability for self reported crashes and violations, and their closely related counterpart mileage, as assessing reliability can tell us something about the size and nature of the bias involved. However, as will be discussed, the assessment of reliability for these variables is probably more complicated than previously anticipated.

Turning back to traffic safety research, there are two variables that are commonly used as outcome parameters: crashes and violations (also called offences, citations, endorsements, penalties etc.). These are often used in conjunction with mileage, with the latter as a control for exposure. Given that the time periods for canvassing self-reports of mileage, violations and crashes most often span years (i.e. asking how many crashes the participant has had in the last 3 years is common in the traffic safety literature; af Wåhlberg, 2003), it seems fair to suggest that several hundred researchers have assumed that drivers report these data consistently over time periods of at least a few months.

However, tests of consistency of reporting are difficult to find for mileage, violations and crashes (af Wåhlberg, 2009). Arthur (1991) and Arthur and Graziano (1996) reported very high correlation coefficients between repeated self-reports of crashes, but as there were only two or three days between measurements (i.e. the report periods were strongly overlapping), anything but a perfect association is somewhat suspect.

Turning to correlations of self-reported crashes in different time periods, an anomaly exists in the literature. In French et al. (1993), a correlation of .305 between crashes in a period of three years versus one year was reported. As will be described below, this value would seem to be strongly inflated. Furthermore, in West et al. (1992), the same data were apparently used, but the correlations reported were much lower (see the discussion in af Wåhlberg, 2009). No other studies of the reliability of self-reported accidents have been found.

Concerning self-reported violations, no tests of reliability over time have been located. However, some results have been published which may indicate reporting bias. Apart from generally low correlations between state sources and self-reported violations, it has also been found that drivers tend to report many more violations than those found in official records. Whether this is due to a reporting bias, record cleaning practices and/or self-reporting of violations in other States (many of the studies were from the US) is not known (af Wåhlberg, 2009).

For mileage, the most relevant study is by Alonso et al. (2006), who found that self-reports correlated .64 over a period of ten months (other tests indirectly indicate low reliability due to computational difficulties, where people give different estimates depending on the time period used; see the review in af Wåhlberg, 2009). Here, the method of measurement differed between occasions, with questionnaire and interview data being collected. More important, however, was that the instruction was to report mileage for a specific period of time (i.e. total overlap of report periods), instead of the last time period before the question was asked. For short time periods, such as those studied by Arthur et al., this difference is probably of no importance, but as the period lengthens, the reports will differ due to changes occurring naturally over time.

Considering mileage alongside violations and crashes, it can be noted that exposure (operationalized as mileage) is most often seen as having a positive, causal association with these variables, but as otherwise unrelated to what drivers carry with them into the driving situation (i.e. individual differences). This is never explicitly stated, but can be concluded from how the mileage variable is treated within traffic safety research, i.e. it is controlled for as a confounder (see further the discussion in af Wåhlberg, 2009). However, contrary to this concept and use of the mileage variable in practice, controlling for it does not yield larger effect sizes (af Wåhlberg, 2009), i.e. it is usually not a confounder in the sense of hiding an effect. Instead, it may carry some of the effect when crashes and violations are predicted from other variables (af Wåhlberg et al., submitted for publication). No study has been located which has controlled for the influence of mileage on between-time-period correlations for crashes and violations.
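Controlling for mileage as a confounder, as described above, is typically implemented as a partial correlation. A minimal sketch of the calculation on simulated data (the sample size, means and coefficients below are hypothetical, not values from the studies discussed):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after regressing out z from both."""
    design = np.column_stack([np.ones_like(z), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
mileage = rng.normal(10.0, 2.0, 500)             # hypothetical annual mileage (1000 km)
crashes_t1 = rng.poisson(0.05 + 0.01 * mileage)  # crash counts partly driven by exposure
crashes_t2 = rng.poisson(0.05 + 0.01 * mileage)

r_raw = np.corrcoef(crashes_t1, crashes_t2)[0, 1]
r_partial = partial_corr(crashes_t1.astype(float), crashes_t2.astype(float), mileage)
print(r_raw, r_partial)
```

Whether the partial correlation is smaller or larger than the raw one depends on how strongly exposure drives both reports; in the data discussed here, controlling for mileage did not strengthen the associations.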

Within psychometrics, reliability (i.e. similarity in results between measurements) of the variables measured is considered to be of importance (Anastasi, 1988), due to the basic logic that an unstable variable cannot be predicted (there are also other reasons that are not of any concern for the present paper). Therefore, if self-reported crashes and other traffic safety indicators are found to have low reliability, this reflects on the evaluation of results where significant associations have been found for these variables in relation to predictors. If reliability is low, results may be an artefact, because an unreliable measure cannot predict anything, or be predicted, unless there is some sort of bias that creates an association.

The social sciences have attempted to explain why people under- or over-report, which may have a bearing on the accuracy of self-reported traffic safety indicators. The research points to recall problems arising from the time elapsed between the event and its subsequent recall (Anderson and Schooler, 1991) and/or to distorted memories. Response styles that are relatively stable over time (He and van de Vijver, in press) may reflect self-presentational needs when communicating information about oneself to others (Paulhus, 1991). Unreliability in self-reported traffic safety indicators may also be a function of distorted memories. Crashes and penalty points are negative events that may be more prone to memory distortion (Porter et al., 2010), but disentangling faking behavior from genuine response errors is difficult (Shoss and Strube, 2011). If specific information is not readily accessible in memory, respondents may draw on their beliefs about their standing in the general domain (Willard and Gramzow, 2008), and drivers tend to be optimistically biased about their driving skills and abilities (McKenna, 1993).

For young drivers in particular, it appears that their self-esteem is inextricably linked with their self-perceptions as drivers (Falk and Montgomery, 2007), including a particular bias in their beliefs about their driving skills and chances of being involved in a crash relative to older drivers (e.g. McKenna et al., 1991). These kinds of cognitive biases may distort the memories of young people in a systematic way and affect the reliability of self-reported crashes and driving offences.

The standard method for reliability measurements is to calculate the correlation between the results from different data collection occasions (Anastasi, 1988). However, usually no effort is made to investigate the possible determinants of the changes between measurements. This is probably due to an assumption that the phenomena to be studied are fairly stable over time, and that the error involved is random.

However, psychometric constructs are usually continuous and omnipresent, while crashes and violations are discrete events in time. Therefore, when calculating and interpreting reliability for these variables, there are different factors that must be taken into account. This difference does not seem to have been noted by those few researchers who have reported test–retest correlations for these variables.

When discrete event type data is used for reliability calculations, and is objective and valid, then reliability indicates the stability of the phenomenon itself over time. For example, actual, recorded, traffic collision involvement has a certain between-time-period correlation, which increases strongly with the extension of the periods (af Wåhlberg and Dorn, 2009; see also the meta-analysis in af Wåhlberg, 2009). A similar effect can be expected for violations, although this cannot be meta-analytically tested, due to a lack of published results.

However, if crash involvement is self reported, there may be two different types of report bias with opposing effects. First, a stable systematic difference in reporting tendency may exist. For example, drivers with many crashes on record tend to under-report their involvement (see the review in af Wåhlberg, 2009), but there are also drivers who over-report (af Wåhlberg, 2002) although this phenomenon is in need of replication. Furthermore, report bias has been identified by the use of a lie scale and comparisons of effects versus self-reported and recorded crash data (af Wåhlberg et al., 2010, af Wåhlberg, 2011a). Therefore, it can be suspected that self-reported data may inflate reliability, due to individual differences in reporting bias.

Second, random error may operate, due to memory difficulties. It has been shown that about 25–30% of collisions are forgotten per year (Maycock et al., 1991, Maycock and Lester, 1995, af Wåhlberg, 2012). If the memory effect is unsystematic, i.e. not due to poor recall in some drivers but the result of environmental circumstances, this should have a detrimental effect on the between-measurements reliability of self-reported crashes. The memory effect as studied so far only indicates forgetting over time. However, another mechanism is also possible: memories for events might fluctuate somewhat over time, so that at one time an incident is remembered, while at another time it is not. Therefore, drivers may report crashes at Time 2 that they failed to recall at Time 1 (as found by Alonso et al., 2006). Thus, random reporting errors are added to the actual unreliability of a driver’s accident record for different periods. Given the above, it could not be predicted for the present study whether the correlations found between self-reports would be larger or smaller than expected, because two competing forces could influence the results.
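The cited forgetting rate implies a simple compounding calculation: with a constant annual forgetting rate, the fraction of crashes still recalled after t years is (1 − rate)^t. A quick illustration (the rates are from the studies cited above; the three-year horizon is an arbitrary choice matching the common report window):

```python
# Fraction of crashes still recalled after t years, assuming a constant
# annual forgetting rate of 25-30% (Maycock et al., 1991; af Wahlberg, 2012).
for rate in (0.25, 0.30):
    recalled = [(1 - rate) ** t for t in range(1, 4)]
    print(rate, [round(r, 2) for r in recalled])
# rate 0.25 -> recalled fractions 0.75, 0.56, 0.42 after 1, 2 and 3 years
```

Under this assumption, barely two fifths of crashes would still be reported at the end of a three-year window, before any systematic bias is even considered.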

When calculating reliability for self-reported crash and violation data over time, we can distinguish between two different reliabilities: first, test–retest reliability of reports for the same time period, and second, reliability between different time periods. With the first method, the natural changes between measurements are excluded, while in the second the actual stability of the variable is the starting point, upon which the two effects discussed above will have their influence.

Turning to the matter of how to determine what kind of mechanism is operating in self-reported crash and violation data, it can be noted that no criterion has ever been stated for the different kinds of reliability coefficient that can be expected in a calculation. For crashes, this lack of knowledge can be addressed in the present paper.

A new method for testing whether the inter-correlation between crashes in different time periods differs from what can be expected is to calculate the expected correlation from the mean number of crashes in the sample and compare it with the actual value found. This method is based on the finding that a correlation between crashes in different time periods is strongly determined by the mean of crashes (i.e. the degree of restriction of variance), and that the increase is linear within time periods such as those commonly used within traffic safety research (af Wåhlberg, 2003, af Wåhlberg, 2009). Therefore, an expected correlation can be calculated using the regression equation relating the squared between-time-period correlation to the mean number of crashes reported (data from af Wåhlberg, 2009).
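The calculation just described can be sketched as follows. Note that the actual regression coefficients come from the meta-analytic data in af Wåhlberg (2009) and are not reproduced here, so the values of a and b below are placeholders only:

```python
import math

def expected_r(mean_crashes, a, b):
    """Expected between-time-period crash correlation, derived from the
    linear regression of the squared correlation on the sample mean of
    crashes: r^2 = a + b * mean (a and b are placeholder coefficients)."""
    r_squared = a + b * mean_crashes
    return math.sqrt(max(r_squared, 0.0))

# Hypothetical coefficients; the real values come from af Wahlberg (2009).
print(expected_r(0.2, a=0.01, b=0.25))
```

The expected value can then be compared with the correlation actually observed in the sample; a large discrepancy indicates a reporting bias.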

When, on the other hand, reports are for the same time period, the correlation should be perfect (r = 1.0). However, when self-reported crashes/violations are for a certain time period before the measurement occasion, the reporting periods will differ according to the time difference between the questionnaire waves, and a natural difference in reports should be expected. This expected difference can also be calculated, because it can be estimated how highly self-reports for overlapping time periods should correlate if there is no reporting error, but only natural differences due to the time elapsed between the two requests for information. This can be calculated from the number of events in the samples and the time differences between reports, as most participants have zero or one crash. The assumption made is that crashes are evenly distributed over the reporting periods. Therefore, the number of crashes happening in the periods that do not overlap can be calculated and compared to the differences in reporting. It should be noted that this method yields an over-estimation of the natural change in crash numbers, because it does not take into account the possibility that a driver with a crash in the first non-overlapping period might also have a crash in the second non-overlapping period.

Thereafter, the (maximum) difference in crash number between measurements, due to the non-overlapping time periods, can be entered into a new variable. In other words, the first measurement is duplicated, and the crashes that can be expected to have taken place in the first non-overlapping time period are removed, and the crashes in the second non-overlapping period added, both randomly. Between the first, actual measurement, and the ‘simulated’ one, an expected correlation can be computed, to which the actual result between measurements can be compared.
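The simulation procedure described in the last two paragraphs can be sketched as follows. The window length, wave gap and crash rate below are hypothetical, chosen only to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: a 3-year report window and 6 months between waves,
# giving two 0.5-year non-overlapping tails around a 2.5-year overlap.
n, window, gap, rate = 1000, 3.0, 0.5, 0.2
crashes_t1 = rng.poisson(rate, n)        # first-wave crash reports

# Assuming crashes are uniform over the window, a fraction gap/window of
# each driver's crashes falls in the tail that drops out of the second
# window, and the same expected number is newly accrued afterwards.
p_tail = gap / window
simulated_t2 = crashes_t1.copy()
simulated_t2 -= rng.binomial(crashes_t1, p_tail)   # randomly remove old-tail crashes
simulated_t2 += rng.poisson(rate * p_tail, n)      # randomly add new-tail crashes

# Expected correlation under no reporting error, against which the actual
# between-wave correlation can be compared.
r_expected = np.corrcoef(crashes_t1, simulated_t2)[0, 1]
print(round(r_expected, 2))
```

As the text notes, this slightly over-estimates the natural change, since new-tail crashes are added independently of whether a driver also lost a crash from the old tail.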

As noted, there are two competing self report effects which may be operating and possibly influencing the results of the reliability calculations for crashes and violations. Given the two different calculation methods for reliability described (different and overlapping time periods), what results can be expected? With little previous research and no theory from which to make specific predictions, the only consideration is the nature of the difference between the two tasks being undertaken.

In the overlapping time period situation, the participant is required to report the same information twice. This means that report bias will, in essence, be a constant, while the memory effect will have its full effect, although the intervals between measurements in the present study were rather short. In the different time period task, the report bias will have its full effect.

In summary, the present study set out to investigate how self reported mileage, violations and crashes are associated with these same variables reported at another time point. The crash correlations were compared with expected values, which were derived from what was known about how this variable should behave in the presence of no reporting error.


Samples and variables

The data used in the present study were gathered in three different evaluation studies concerning the effects of e-learning based driver education on the behavior of traffic offenders in the UK (af Wåhlberg, 2010b, af Wåhlberg, 2011b, af Wåhlberg, 2013). Participants from an educational scheme for young offending drivers (Young Driver Scheme, YDS) who responded to the first and third waves of a questionnaire were included in the present study, as well as a random control sample from an e-mail

Results

Descriptive results for the samples can be seen in Table 1, Table 2. No extraordinary values were apparent in these calculations, possibly apart from the over-representation of men in the seatbelt scheme sample.

Thereafter, mileage was correlated with crashes and violations (Table 3). As is commonly found, these associations were rather weak.

As can be seen in Table 4, the expected differences in number of crashes were very different from the actual numbers. Taking the SS sample as an example,

Discussion

The present results would seem to imply that self-reports of crashes, violations and mileage are very unreliable, given the low correlations between measurements, even though the time periods between measurements were fairly short, as compared with the length of time such predictors are usually reported for (af Wåhlberg, 2003). This effect was especially evident when the drivers should have been mainly reporting the same events on both occasions. The question is why this is so.

The finding that

Acknowledgements

The data used in this paper were gathered in projects involving the police forces of Thames Valley and Greater Manchester, and the companies DriveTech and a2om. Chris Johnson (a2om) set up the questionnaires online. Two anonymous reviewers provided useful feedback.

References (41)

  • Porter, B.E., et al. (2001). A nationwide survey of self-reported red light running: measuring prevalence, predictors, and perceived consequences. Acc. Anal. Prev.
  • Porter, B.E., et al. (2000). Predicting red-light running behavior: a traffic safety study in three urban settings. J. Safe. Res.
  • Shoss, M.K., et al. (2011). How do you fake a personality test? An investigation of cognitive models of impression-managed responding. Organ. Behav. Human Decis. Process.
  • Streff, F.M., et al. (1989). Are there really shortcuts? Estimating seat belt use with self-report measures. Acc. Anal. Prev.
  • Willard, G., et al. (2008). Exaggeration in memory: systematic distortion of self-evaluative information under reduced accessibility. J. Exp. Soc. Psychol.
  • af Wåhlberg, A.E. (2002). On the validity of self-reported traffic accident data. In: Manama: E140 Proceedings of the...
  • af Wåhlberg, A.E. (2009). Driver Behaviour and Accident Research Methodology; Unresolved Problems.
  • af Wåhlberg, A.E. (2012). Memory effects in self-reports of crashes. In: Dorn, L. (Ed.), Driver Behaviour and Training, ...
  • af Wåhlberg, A.E. Evaluation of an e-learning seatbelt wearing intervention.
  • af Wåhlberg, A.E., et al. (2009). Bus driver accident record; the return of accident proneness. Theor. Iss. Ergon. Sci.