
Safety Science

Volume 49, Issue 6, July 2011, Pages 753-763

Review
Human reliability analysis: A critique and review for managers

https://doi.org/10.1016/j.ssci.2011.02.008

Abstract

In running our increasingly complex business systems, formal risk analyses and risk management techniques are becoming more important to managers: all managers, not just those charged with risk management. It is also becoming apparent that human behaviour is often a root or significant contributing cause of system failure. This latter observation is not novel; for more than 30 years it has been recognised that the role of human operators in safety critical systems is so important that they should be explicitly modelled as part of the risk assessment of plant operations. This has led to the development of a range of methods under the general heading of human reliability analysis (HRA) to account for the effects of human error in risk and reliability analysis. The modelling approaches used in HRA, however, tend to be focussed on easily describable, sequential, generally low-level tasks, which are not the main source of systemic errors. Moreover, they focus on errors rather than the effects of all forms of human behaviour. In this paper we review and discuss HRA methodologies, arguing that there is a need for considerable further research and development before they meet the needs of modern risk and reliability analyses and are able to provide managers with the guidance they need to manage complex systems safely. We provide some suggestions for how work in this area should develop. But above all we seek to make the management community fully aware of the assumptions implicit in human reliability analysis and of its limitations.

Research highlights

  • Human behaviour is often a root or significant contributing cause of system failure.

  • Human reliability analyses focus on errors rather than the effects of all forms of human behaviour.

  • Organisation design and culture can correlate failure risks.

  • A research programme to address these issues is suggested.

  • Managers need to question the numbers that risk and reliability analyses provide.

Introduction

Complex systems are never 100% reliable: they fail, sometimes catastrophically, more usually reparably. Perrow (1984, 1994) has argued that failures are an inevitable consequence of the increasing complexity of our systems. Whatever the case, inevitable or not, failures undoubtedly occur. Even in systems that appear to be largely technological rather than human, we find that in the majority of cases there is a human element involved. Maybe some erroneous or even malicious behaviour initiates the failure; maybe the human response to some event is insufficient to avoid system failure; or maybe the original design of the system did not anticipate a potential failure or unfavourable operating conditions.

Statistics show human error is implicated in (see also Hollnagel, 1993):

  • over 90% of failures in the nuclear industry (Reason, 1990a; see also United States Nuclear Regulatory Commission, 2002);

  • over 80% of failures in the chemical and petro-chemical industries (Kariuki and Lowe, 2007);

  • over 75% of marine casualties (Ren et al., 2008);

  • over 70% of aviation accidents (Helmreich, 2000);

  • over 75% of failures in drinking water distribution and hygiene (Wu et al., 2009).

In addition to highly technological industries, there are other complex systems involving applications of technology, among which we include complex mathematical modelling, software and web-based systems. The growth of service industries with new business models implies an even greater dependence of businesses, organisations and even economies on reliable human interactions. For instance, human checks and balances recently failed to detect the dubious investment behaviour of a trader at Société Générale, leading to a loss of some €4.9bn, large enough to have economic and financial effects beyond the bank. The current ‘credit crunch’ owes not a little to misjudgement and error in the banking and finance sectors, indicating the growing interdependence of many disparate parts of the modern global economy. It also owes a lot to a loss of investors’ confidence and trust, both of which inform human behaviour. These examples indicate how vulnerable our systems are, even after many years of refinement and improvement, and how important an understanding of human behaviour is if we are to reduce the risk to systems.

Another high profile example is the leak in the THORP (Thermal Oxide Reprocessing Plant) at Sellafield that was discovered in 2005 (see Board of Inquiry, 2005). This relatively modern plant had been designed to a high standard of safety, but information indicating a system problem was available for some months and yet went unnoticed. Despite previous incidents in 1998 and earlier in 2005, the information that should have suggested a leak, or at least a problem requiring investigation, was misinterpreted. The prevailing attitude was that the system was error-free, and hence information that could suggest the contrary was ignored or dismissed.

Managerial processes are critical to the successful operation of any complex system; and the quality of those processes depends on managers’ understanding of the import and limitations of the results of the risk (and other) analyses that are provided to them. We emphasise here that all managers, whether or not they have an explicit responsibility for risk management, need to have some understanding of the assumptions and limitations of such analyses. In this article, we examine current and past approaches to human reliability analysis (HRA). We discuss its assumptions, limitations and potential in qualitative terms so that managers can better assess the value of the information that it provides them and so manage risks more effectively. We also suggest that further development of HRA methodologies should take more account of the managerial practices that could be applied to reduce the failures that occur at the interface of human behaviour and technology.

Managers understand human behaviour; good managers understand human behaviour extremely well. To bring out the best in a team one needs to know how each member will respond to a request, an instruction, an incentive or a sanction. Yet only the most foolhardy and overconfident of managers would claim that they can predict human behaviour perfectly all the time – or even 95% of the time. The problem is that we often need to design systems with very high reliabilities, many times with overall failure rates of less than 1 in 10 million (i.e. 10⁻⁷). To design and analyse such systems we need a deep understanding of human behaviour in all possible circumstances that may arise in their management and operation. And that is the challenge facing HRA. Our current understanding of human behaviour is not sufficiently comprehensive: worse, current HRA methodologies seldom use all the understanding that we do have.

Of course, there is a trivial mathematical answer to this. If we are to achieve an overall system failure probability of 10⁻⁷, we do not need humans to be perfectly reliable. We simply need to know how reliable they are and then ensure that we arrange and maintain sufficient safety barriers around the system so that overall system failure probabilities are as low as required. Suppose we construct seven independent safety barriers, perhaps some involving humans, some purely technological, and suppose each has a probability of 1 in 10 of failing. Then arranging them (conceptually) in sequence, so that the whole system fails if and only if every one of the seven fails, gives an overall probability of system failure of

1/10 × 1/10 × 1/10 × 1/10 × 1/10 × 1/10 × 1/10 = 10⁻⁷.
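
To make this arithmetic concrete, the following is a minimal sketch in Python (our illustration, not part of the original paper): under the independence assumption the system fails only if every barrier fails, so the barrier failure probabilities simply multiply.

```python
# Minimal sketch (illustrative only): overall failure probability of a system
# protected by safety barriers that are assumed to fail independently.
def system_failure_probability(barrier_failure_probs):
    """Probability that every barrier fails, under the independence assumption."""
    p = 1.0
    for q in barrier_failure_probs:
        p *= q
    return p

# Seven barriers, each with a 1-in-10 chance of failing:
print(system_failure_probability([0.1] * 7))  # ~1e-07, up to floating-point rounding
```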

The problem with this is that there are few barriers that are truly independent: most systems offer opportunities to ‘bypass’ these barriers. Moreover, human behaviour tends to introduce significant correlations and dependencies which invalidate such calculations, reducing the benefit that each extra safety barrier brings; such problems with protective redundancy are well known (for example, Sagan, 2004). So the simplistic calculation does not apply, and we shall argue that we have yet to develop sufficiently complex mathematical modelling techniques to describe human behaviour adequately for risk and reliability analyses.
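
The effect of dependence is easy to show numerically. The sketch below uses a deliberately crude common-cause model of our own (it is not a model proposed in the paper): each barrier can still fail independently, but a single shared vulnerability, such as an organisational or cultural factor, can defeat every barrier at once. Even a 1-in-10,000 common cause swamps the idealised 10⁻⁷ figure.

```python
# Minimal sketch (illustrative only): how a common cause erodes protective redundancy.
# Each barrier fails independently with probability p_ind; with probability p_cc a
# shared condition defeats all n barriers simultaneously.
def system_failure_with_common_cause(p_ind, n_barriers, p_cc):
    independent_term = p_ind ** n_barriers           # all barriers fail on their own
    return p_cc + (1.0 - p_cc) * independent_term    # common cause, or else independent failures

print(system_failure_with_common_cause(0.1, 7, 0.0))   # ~1e-07: the idealised calculation
print(system_failure_with_common_cause(0.1, 7, 1e-4))  # ~1e-04: the dependency dominates
```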

In many ways the roles of risk and reliability analysis in general and of HRA in particular are often misunderstood by system designers, managers and regulators. In a sense they believe in the models and the resulting numbers too much and fail to recognise the potential for unmodelled and possibly unanticipated behaviours – physical or human – to lead to overall system breakdown (cf. French and Niculae, 2005). Broadly there are two ways in which such analyses may be used.

  • When HRA is incorporated into a summative analysis, its role is to help estimate the overall failure probabilities in order to support decisions on, e.g., adoption, licensing or maintenance. Such uses require quantitative modelling of human reliability; and overconfidence in these models can lead to overconfidence in the estimated probabilities and poor appreciation of the overall system risks.

  • There are also formative uses of HRA in which recognising and roughly ranking the potential for human error can help improve the design of the system itself and also the organisational structures and processes by which it is operated. Effective HRA not only complements sound technical risk analysis of the physical systems, but also helps organisations develop their safety culture and manage their overall risk. Indeed, arguably it is through this that HRA achieves its greatest effect.

These uses are not independent – in designing, licensing and managing a system one inevitably iterates between the two – but they differ fundamentally in philosophy. In summative analysis the world outside the system in question learns from the outcome of an analysis; in formative analysis the world inside the system learns from the process of analysis. In summative analysis the ideal is almost to be able to throw away the process and deal only with the outcome; in formative analysis the ideal is almost to throw away the outcome and draw only from the process. While we believe that HRA has significant potential to be used more in formative ways, we are concerned about its current ability to fulfil a summative role, providing valid probabilities of sequences of failure events in which human behaviour plays a significant role. We believe that there is currently scope for considerable overconfidence in the summative power of HRA, and that management, regulators and society in general need to appreciate this, lest they make poorly founded decisions on regulating, licensing and managing systems.

The four of us were part of a recent UK EPSRC funded multi-disciplinary project Rethinking Human Reliability Analysis Methodologies to survey and critique HRA methodologies (Adhikari et al., 2008). Our purpose in this paper is to draw out the relevant conclusions from this project for the management community and, perhaps as well, for our political masters who create the regulatory context in which complex systems have to operate. Overall we believe that current practices in and uses of HRA are insufficient for the complexities of modern society. We argue that the summative outputs of risk and reliability analyses should be taken with the proverbial pinch of salt. But not all our conclusions will be negative. There is much to be gained from the formative use of HRA to shape management practices and culture within organisations and society which can lead to better, safer and less risky operations.

In the next section we briefly survey the historical development of the concepts underlying HRA and its role in risk and reliability analyses. We reflect on the widely quoted Swiss Cheese Model (Reason, 1990b), which seeks to offer a qualitative understanding of system failure – though we shall argue that it may actually lead to systematic misunderstandings! In Section 3 we turn to modern theories of human behaviour, particularly those related to judgement and decision. A key issue is that HRA focuses on human errors, whereas many system failures may arise not just despite, but sometimes because of, fully appropriate and rational behaviour on the part of those involved. Thus we need a broader understanding of human behaviour than that relating to human error. We also need to recognise that cultural, organisational, social and other contexts influence behaviour, perhaps correlating behaviour across a system and thus invalidating assumptions of independence commonly made in risk and reliability analyses. One of the flaws common to many current HRA methodologies is that they tend to focus on easily describable, sequential, generally low-level operational tasks. Yet the human behaviour that is implicated in many system failures may occur in other quite different contexts, maybe in developing higher level strategy or during the response to an unanticipated initiating failure event. In recent years there have been many studies of organisational forms which seem to be more resilient to system failures than might be expected, and we discuss such studies of high reliability organisations (HROs) briefly in Section 4. Another flaw common to many current HRA methodologies is the lack of specification of their domain of applicability, which makes it difficult to select appropriate methods for a given problem. Therefore, in Section 5, we use Snowden’s Cynefin classification of decision contexts (Snowden, 2002, Snowden and Boone, 2007) to categorise the different circumstances in which human behaviour may be involved in system failure. We believe that the use of Cynefin – or a similar categorisation of decision contexts – can help in delineating when different HRA methodologies are appropriate. Moreover, it points to areas in which we lack a really sound, appropriate HRA methodology. Our final two sections draw our discussion to a close, suggesting that:

  • by drawing together current understandings from HRA with other domains of knowledge in behavioural, management and organisational theories, we can make better formative use of HRA in designing systems, process and the organisations that run these;

but that:

  • the state of the art in quantitative HRA is too poor to make the summative assessments of risk and reliability that our regulators assume, and that society urgently needs to recognise this.

Section snippets

HRA methodologies and the Swiss cheese model

Reliability analysis and risk analysis are two subjects with a great deal of overlap (Aven, 2003, Barlow and Proschan, 1975, Bedford and Cooke, 2001, Høyland and Rausand, 1994, Melnick and Everitt, 2008). The former is generally narrower in scope and tends to deal with engineered systems subject to repeated failures and the need for preventative maintenance policies to address these. Key concepts in reliability engineering include component availability, reliability and maintainability; mean
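
As a concrete illustration of the reliability-engineering quantities named above (our sketch, not the paper’s, and it assumes the standard exponential-lifetime model), steady-state availability and mission reliability can be computed from a component’s mean time to failure (MTTF) and mean time to repair (MTTR):

```python
import math

# Minimal sketch (illustrative only) of two standard reliability-engineering quantities,
# assuming exponentially distributed times to failure.
def steady_state_availability(mttf_hours, mttr_hours):
    """Long-run fraction of time the component is operational."""
    return mttf_hours / (mttf_hours + mttr_hours)

def mission_reliability(mttf_hours, mission_hours):
    """Probability of surviving the mission without failure (exponential model)."""
    return math.exp(-mission_hours / mttf_hours)

print(steady_state_availability(5000, 10))  # ~0.998 for MTTF = 5000 h, MTTR = 10 h
print(mission_reliability(5000, 24))        # ~0.995 for a 24 h mission
```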

Human behaviour and human error

Human behaviour is complex and often non-rational. For instance, it seems sensible to use modern technological advances to make the physical components of a system safer. But there is some evidence that making subsystems safer could make the overall system less safe, because of the propensity of humans to take less care personally when a system takes more care (Adams, 1988, Hollnagel, 1993). In this section we survey some recent findings from behavioural decision studies and consider how this

High reliability organisations

The past 20 years have seen several studies of high reliability organisations (HROs), which Roberts (1990) defined as organisations failing with catastrophic consequences less than one time in 10,000. These studies recognise that certain kinds of social organisation are capable of making even inherently vulnerable technologies reliable enough for a highly demanding society.

An HRO encourages a culture and operating style which emphasises the need for reliability rather than efficiency (Weick, 1987

Decision contexts

There is a further aspect of context that HRA should consider: decision context. The judgements and decisions needed of humans in a system can vary from those needed to perform mundane repetitive operational tasks, through more complex circumstances in which information needs to be sought and evaluated to identify appropriate actions, to the ability to react to and deal with the unknown and unanticipated. Decision processes will vary accordingly. Design decisions can inadvertently introduce further

Toward an extended model of HRA

Summative HRA and related approaches emphasise quantification and prediction. While cognitive understanding of people and cultural perspectives on organisations are acknowledged, the gulf between these and quantitative risk models is generally considered too significant to be bridged. Yet the conjoining of these approaches could yield a superior model of safety critical organisations and the people working within them. In the short term, exploring the interfaces between HRA and behavioural,

Conclusion: a message for managers

The key point that we have been trying to convey in this paper is the current dislocation between the mechanistic reductionist assumptions on which current HRA methodologies are primarily built and our current understandings of human and organisational behaviour. We must bring these into better register. Managers, regulators, politicians and the public need to beware of this lest they believe the numbers that are sometimes touted about the safety of our systems. This should not be read as a

Acknowledgements

This work was supported by the Engineering and Physical Sciences Research Council (Contract number: EP/E017800/1). We are grateful to our co-investigators and colleagues on this: Sondipon Adhikari, Clare Bayley, Jerry Busby, Andrew Cliffe, Geeta Devgun, Moetaz Eid, Ritesh Keshvala, David Tracy and Shaomin Wu. We are also grateful for many helpful discussions with Ronald Boring, Roger Cooke and John Maule.

References (101)

  • Adhikari, S., Bayley, C., Bedford, T., Busby, J.S., Cliffe, A., Devgun, G., Eid, M., French, S., Keshvala, R., Pollard, ...
  • Aven, T., 2003. Foundations of Risk Analysis: A Knowledge and Decision Oriented Perspective.
  • Bargh, J.A., et al., 1999. The unbearable automaticity of being. American Psychologist.
  • Bargh, J.A., et al., 1996. Automaticity of social behavior: direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology.
  • Barlow, R.E., Proschan, F., 1975. Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New...
  • Barriere, M., Bley, D., Cooper, S., Forester, J., Kolaczkowski, A., Luckas, W., Parry, G., Ramey-Smith, A., Thompson, ...
  • Bazerman, M.H., 1999. Reviews on decision making. Administrative Science Quarterly.
  • Bazerman, M.H., 2006. Managerial Decision Making.
  • Bedford, T., et al., 2001. Probabilistic Risk Analysis: Foundations and Methods.
  • Board of Inquiry, 2005. Fractured Pipe with Loss of Primary Containment in the THORP Feed Clarification Cell. British...
  • Boring, R.L., 2007. Dynamic Human Reliability Analysis: Benefits and Challenges of Simulating Human Performance...
  • Carver, C.S., et al., 1981. Attention and Self-Regulation: A Control Theory Approach to Human Behavior.
  • Chaiken, S., et al. Heuristic and systematic information processing within and beyond the persuasion context.
  • Clarke, L., 1993. Drs Pangloss and Strangelove meet organizational theory: high reliability organizations and nuclear weapons accidents. Sociological Forum.
  • Commission on the Three Mile Island Accident, 1979. Report of the President’s Commission on the Accident at Three Mile...
  • Courtois, P.-J., et al. Bayesian belief networks for safety assessment of computer-based systems.
  • Fadier, E., 2008. Editorial of the special issue: design process and human factors integration. Cognition, Technology and Work.
  • Fadier, E., et al., 1999. How to integrate safety in design: methods and models. Human Factors and Ergonomics in Manufacturing & Service Industries.
  • Fenton-O’Creevy, M., et al., 2003. Trading on illusions: unrealistic perceptions of control and trading performance. Journal of Occupational and Organizational Psychology.
  • Fenton-O’Creevy, M., Soane, E., Nicholson, N., Willman, P., 2008. Thinking, feeling and deciding: the influence of...
  • Finucane, M.L., et al., 2000. The affect heuristic in judgments of risks and benefits. Journal of Behavioral Decision Making.
  • Forester, J.A., Kolaczkowski, A., Lois, E., Kelly, D., 2006. NUREG-1842: Evaluation of Human Reliability Analysis...
  • Franco, A., et al., 2006. Problem structuring methods I. Journal of the Operational Research Society.
  • Franco, A., et al., 2007. Problem structuring methods II. Journal of the Operational Research Society.
  • French, S., et al., 2009. Decision Behaviour, Analysis and Support.
  • French, S., et al., 2005. Believe in the model: mishandle the emergency. Journal of Homeland Security and Emergency Management.
  • Goldstein, D.G., et al., 2002. Models of ecological rationality: the recognition heuristic. Psychological Review.
  • Grabowski, M., et al., 1999. Risk mitigation in virtual organizations. Organization Science.
  • Hannaman, G.W., Spurgin, A.J., Lukic, Y.D., 1984. Human Cognitive Reliability Model for PRA Analysis. Draft Report...
  • Helmreich, R.L., 2000. On error management: lessons from aviation. British Medical Journal.
  • Hollnagel, E., 1993. Human Reliability Analysis: Context and Control.
  • Hollnagel, E., 1998. Cognitive Reliability and Error Analysis Method – CREAM.
  • Høyland, A., et al., 1994. System Reliability Theory.
  • Hrudey, S.E., Hrudey, E.J., Charrois, J.W.A., Pollard, S.J.T., 2006. A ‘Swiss Cheese’ Model Analysis of the Risk...
  • International Atomic Energy Agency, 1991. The International Chernobyl Project: Technical Report. IAEA, ...
  • Jalba, D., Cromar, N., Pollard, S., Charrois, J.W.A., Bradshaw, R., Hrudey, E., in press. Safe drinking water: critical...
  • Kirwan, B., 1994. Practical Guide to Human Reliability Assessment.
  • Klein, G., 1993. A recognition primed decision model (RPM) of rapid decision making. In: Klein, G., Orasanu, J., ...