1 Introduction

A highway–rail grade crossing (HRGC) is a segment of the transportation network where a highway and rail tracks intersect at the same elevation (Fig. 1). HRGCs can be found on both public and private highways in the USA. Over the past decades, several factors, including growing demand for freight and passenger transportation, urbanization, and the expansion of the road and rail networks, have led to an increase in the number of HRGCs in the USA. There were more than 250,000 public and private HRGCs in the USA in 2015 [1]. Meanwhile, safety at HRGCs has remained a major challenge over the past years [2]. At every HRGC, there is a likelihood of train–vehicle, vehicle–vehicle, and vehicle–warning device accidents. The Federal Railroad Administration (FRA) defines an HRGC accident as “any impact between a rail and highway user (both motor vehicles and other users) of the crossing at a designated crossing site, including walkways, sidewalks, etc., associated with the crossing” [3]. HRGC accidents may result in severe injuries, fatalities, and substantial property damage. A total of 6242 HRGC accidents involving trains and vehicles were reported in the USA from 2015 to 2017, which resulted in 895 deaths and 2745 injuries [4]. Therefore, a significant portion of the HRGC accidents in the USA cause fatalities (roughly one death for every seven reported accidents over that period). A similar pattern was observed in previous years as well (Table 1).

Fig. 1 A schematic illustration of an HRGC

Table 1 Number of train–vehicle accidents, injuries, and fatalities in the USA between 2008 and 2017

HRGC accidents have received a lot of attention from the research community and state Departments of Transportation (DOTs) over the last decades [5,6,7,8,9,10]. Although accidents at HRGCs have been extensively investigated in the state-of-the-art, recent trends have shown that safety at HRGCs remains a major source of concern for the state DOTs. Severe injuries, fatalities, and substantial property damages are reported in each state of the USA every year [4]. Researchers and state DOTs have used various methods to predict accident occurrence and severity at HRGCs. Several studies have investigated different factors that may influence the occurrence of accidents at HRGCs and developed the accident and hazard prediction models for HRGCs [11, 12]. The accident prediction models rely on certain operational and physical HRGC characteristics (e.g., average daily traffic volume, train type, average daily train volume, highway vehicular speed, train speed, number of rail tracks, and type of warning devices used) as well as the accident history in order to estimate the predicted number of accidents for a specific period of time (generally, a 1-year period is considered). Unlike the accident prediction models, the hazard prediction models do not estimate the actual predicted number of accidents. The hazard prediction models assess vulnerability of HRGCs to potential accidents based on certain operational and physical HRGC characteristics as well as the accident history.

Some of the accident and hazard prediction models are nationally recognized and adopted by a number of states. On the other hand, certain states have developed their own accident and hazard prediction models for HRGCs [11, 12]. The models are used to identify the most dangerous HRGCs in each state and to prioritize them for safety improvement projects based on the available budget. Certain studies in the state-of-the-art have used customized statistical models (such as the Poisson model, the negative binomial model, the gamma model, the Conway–Maxwell–Poisson model, the Bernoulli model, and the zero-inflated Poisson model among others) to examine the relationship between accidents at HRGCs and contributing factors [6, 10, 13]. Considering a strong emphasis of public agencies on the safety aspects of transportation networks [14,15,16,17,18], the present study aims to conduct a comprehensive review of the existing accident and hazard prediction formulae. The formulae that have been used by different state DOTs in the USA to estimate the expected number of accidents or safety hazard at HRGCs and rank HRGCs for safety improvements will be of a primary interest. Furthermore, a detailed analysis will be performed to identify the common factors (accident or hazard predictors) that are considered in the existing accident and hazard prediction formulae. Last but not least, this study will discuss the reported performance of the accident and hazard prediction models as well as the implementation challenges of these models.

The rest of this paper is structured as follows: The next section provides a concise overview of the state-of-the-art and the state-of-the-practice with a primary emphasis on the state DOT studies that evaluated and/or developed certain accident and hazard prediction formulae. Section 3 categorizes the identified models into two major groups (i.e., accident and hazard prediction formulae). Section 4 analyzes the common factors (i.e., predictors) that have been considered in the existing accident and hazard prediction formulae. Section 5 focuses on the reported performance of the identified accident and hazard prediction models. Moreover, the major implementation challenges of the models are discussed in the fifth section as well. The final section provides the necessary conclusions and outlines possible future research directions.

2 Literature review

This section provides a detailed review of the research efforts undertaken by the state DOTs to predict accident occurrence and improve safety at HRGCs. It also presents a concise review of the relevant state-of-the-art studies that evaluated various accident and hazard prediction models and/or proposed statistical models for identifying the factors that may cause accidents at HRGCs.

2.1 Previous research efforts by the state DOTs

Different state DOTs have made a number of research attempts to predict the occurrence of accidents at HRGCs. Throughout the literature search, ten state DOT reports were identified. The reports were prepared by the States of Virginia (1986), Alabama (1994), Illinois (2000), Missouri (2003), Tennessee (2012), Texas (2013), Iowa (2015), Nevada (2017), Ohio (2017), and Florida (2020). This section summarizes findings from the previous research efforts, undertaken by the state DOTs.

Faghri and Demetsky [19] conducted a study for the Virginia Highway & Transportation Research Council to evaluate the existing accident and hazard prediction models based on their ability to forecast accident occurrence and estimate hazard indexes for HRGCs. A survey was conducted among 45 state DOTs to determine the strategies used for identifying hazardous HRGCs. Throughout the study, a total of 13 accident and hazard prediction models were discovered. The analysis of the survey results indicated that 32% of the states developed their own models, and 30% of the states used the US DOT Accident Prediction Formula. A set of statistical analyses was performed in the study to evaluate the five most frequently used formulae based on their ability to rank the most hazardous HRGCs in the State of Virginia. The results confirmed that the US DOT Accident Prediction Formula performed better than the other four formulae [19]. Bowman [20] performed a study to enhance the rail–highway safety program in the State of Alabama. The survey conducted in the study found that the states had been primarily using the US DOT Accident Prediction Formula and custom accident and hazard prediction formulae. Additionally, the survey responses revealed that four states were not using any accident or hazard prediction formulae. These states established a priority schedule using public complaints, feedback from railroad companies, accident history, and field inspections of HRGCs.

Elzohairy and Benekohal [11] performed a survey among 49 states in the USA to identify the techniques used by the state DOTs to rank HRGCs for safety improvement projects. The following criteria were identified as important for ranking HRGCs [11]: (i) higher hazard indexes and higher number of predicted accidents; (ii) site review of vehicle types (mass transit, school buses); (iii) benefit–cost analysis; (iv) public concerns and complaints; (v) engineering judgment and HRGC geometry; (vi) sight distance; and (vii) service condition. A new model (referred to as the Illinois Hazard Index Formula) was developed for predicting the potential hazard of the HRGCs in the State of Illinois. The Missouri DOT conducted a study jointly with the University of Missouri–Columbia/Rolla [21]. A number of different accident and hazard prediction models were evaluated for the HRGCs in the State of Missouri. The Illinois Hazard Index Formula was found to be the most accurate for active HRGCs. On the other hand, the California Hazard Rating Formula outperformed the other models for passive HRGCs. The Tennessee DOT and the University of Memphis conducted a study to improve the HRGC safety in the state [22]. Two optimization models were designed as a part of the study to assist with efficiently allocating monetary resources for safety improvement projects based on the US DOT resource allocation procedure [23].

Weissmann et al. [24] proposed a Revised Texas Priority Index for ranking public HRGCs in the State of Texas for safety improvement projects. The canonical Texas Priority Index Formula and the Revised Texas Priority Index Formula were evaluated for 9108 HRGCs, located in the State of Texas, using the 2011 accident data. The results revealed that the Revised Texas Priority Index Formula was able to identify more hazardous HRGCs as compared to the Texas Priority Index Formula. The Iowa State University and the Iowa DOT conducted a study to design a methodology for ranking the HRGCs in the State of Iowa and to effectively conduct resource allocation [25, 26]. The benefit–cost analysis has been used in the State of Iowa for allocation of the available monetary resources among the existing HRGCs. Hans et al. [26] suggested a weighted index method and a supplementary Microsoft Excel spreadsheet-based tool for ranking the HRGCs, which can serve as an addition to the existing procedures in the State of Iowa. Ryan and Mielke [27] discussed the conventional methods that are frequently used to prioritize HRGCs and proposed a new model for ranking the HRGCs in Nevada. The formula accounted for the daily train volume, average daily highway traffic, near misses within the past 3 years, number of accidents within the past 5 years, existing protection, train speed, and highway speed.

Sperry et al. [28] reported that 110 accidents were recorded on average every year between 2005 and 2010 at the HRGCs in the State of Ohio. The analysis of the accident data showed that the HRGCs where accidents occurred had higher train speeds, higher train volumes, and more tracks. The study recommended that the State of Ohio should keep using the US DOT Accident Prediction Formula for ranking the HRGCs for safety improvement projects. The North Carolina Investigative Index and the Missouri Exposure Index were recommended for ranking passive HRGCs upon completion of the initial prioritization. The Florida DOT conducted a study in 2020, aiming to improve the HRGC safety in the State of Florida [12]. The Modified Texas Priority Index Formula, which unlike the canonical Texas Priority Index Formula accounted for the warning device upgrades in the accident history estimations, was found to be the most accurate in ranking the state HRGCs. Furthermore, two optimization models were developed as a part of the study to minimize the overall HRGC hazard and to minimize the overall HRGC hazard severity. A set of heuristic-based algorithms were developed to solve the optimization models. Moreover, a standalone decision support tool was designed to assist the FDOT personnel by assessing the overall hazard of the HRGCs in the State of Florida as well as resource allocation among the state HRGCs. A set of illustrative examples were presented to demonstrate applicability of the developed decision support tool.

A summary of the state DOT efforts in terms of the conducted activities is presented in Table 2. It can be observed that most of the previous state DOT studies conducted interviews or surveys among the appropriate representatives regarding the safety aspects at HRGCs, provided a detailed review of some of the existing accident and hazard prediction models, as well as evaluated some of the existing accident and hazard prediction models for the HRGCs in a particular state. However, only several studies developed new models for accident and hazard prediction at HRGCs as well as designed decision support tools that could be used to improve safety at HRGCs.

Table 2 Summary of the state DOT efforts

2.2 Other relevant studies

Similar to the state DOT efforts, some state-of-the-art studies have also evaluated various accident and hazard prediction models for HRGCs (including the US DOT Accident Prediction Formula, the Peabody–Dimmick Formula, the Coleman–Stewart Model, the NCHRP Report 50 Accident Prediction Formula, and others). Austin and Carson [5] underlined that some of the accident and hazard prediction models, including the Peabody–Dimmick Formula, the NCHRP Report 50 Accident Prediction Formula, and the New Hampshire Hazard Index Formula, lack descriptive capabilities because they consider a limited number of explanatory variables. Furthermore, the authors stated that the accuracy of the US DOT Accident Prediction Formula in predicting accidents might significantly reduce over time due to the model complexity. The study developed an alternative HRGC accident prediction model that was based on the negative binomial regression.

Oh et al. [13] used various statistical models to examine some of the factors that cause accidents at HRGCs. A total of four statistical models, including the gamma probability model, the Poisson model, the negative binomial model, and the zero-inflated Poisson model, were considered. Also, four nationally recognized HRGC accident and hazard prediction models were evaluated, namely the Peabody–Dimmick Formula, the New Hampshire Hazard Index Formula, the NCHRP Report 50 Accident Prediction Formula, and the US DOT Accident Prediction Formula. The HRGC accident data, obtained for 162 HRGCs in Korea, were used to evaluate the models. The results from the analysis of the models revealed that the gamma probability model fit the data best. However, the authors pointed out that the US DOT Accident Prediction Formula was more comprehensive and descriptive in terms of the number of predictors considered as compared to the other accident and hazard prediction models. Chadwick et al. [8] performed a comprehensive review of the literature to identify the challenges of shared high-speed passenger and freight railroads. A number of accident and hazard prediction models were described, including the Peabody–Dimmick Formula, the New Hampshire Hazard Index Formula, the NCHRP Report 50 Accident Prediction Formula, the US DOT Accident Prediction Formula, and the Transport Canada Accident Model. Other accident and hazard prediction models were mentioned in the study as well, including the Poisson regression model, the negative binomial regression model, the gamma probability model, and the Bayesian model. The US DOT Accident Prediction Formula was reported to be the most commonly used model by the state DOTs to rank HRGCs.

Some studies specifically focused on the development of custom statistical models for safety improvement at HRGCs. For example, Hu et al. [6] proposed a generalized logit model with stepwise variable selection to analyze accident severity at the HRGCs in Taiwan. Highway separation, number of daily trains, obstacle detection device, number of daily trucks, and presence of HRGC markings were found to be statistically significant factors that influenced accident severity. Yan et al. [7] applied a hierarchical tree-based regression model for predicting the train-vehicle accidents at passive HRGCs. The FRA accident history database was used in the analysis for a time period between 1980 and 2006. The study results highlighted that stop signs could be considered as an effective countermeasure for passive HRGCs. Hao and Daniel [29] developed the ordered probit model to study the driver injury severity at the HRGCs with different types of warning devices (active vs. passive) in the USA. It was found that visibility, train speed, motor vehicle speed, driver’s age, traffic volume, area type, and highway pavement could significantly affect the injury severity for both active and passive HRGC types in the USA. A similar study was conducted by Hao et al. [30], which aimed to investigate the driver injury severity levels for the truck and non-truck involved accidents at the HRGCs in the USA. The study concluded that speed control for truck drivers could substantially decrease the injury severity of truck drivers.

Lu and Tolliver [10] evaluated a wide range of statistical models for accident prediction at HRGCs, including the Poisson model, the negative binomial model, the Conway–Maxwell–Poisson model, the gamma model, the Bernoulli model, the Poisson–Hurdle model, and the zero-inflated Poisson model. The analysis, conducted for the HRGCs in North Dakota, demonstrated that the Poisson model was the most promising one. Hao et al. [31] investigated the effects of foggy conditions on injury levels for the HRGCs in the USA. It was found that older drivers typically suffered severe injuries in foggy conditions primarily due to longer reaction times and a decline in vision. Furthermore, the study highlighted that some drivers were still driving at fairly high speeds when approaching the HRGCs despite foggy conditions.

Khan et al. [32] conducted a study for the HRGCs in North Dakota. A binary logistic regression model was developed to determine the factors that influence accident occurrence at the HRGCs. The study results showed that the accident likelihood was substantially affected by the number of daily trains, maximum speed of trains, number of through railroad tracks, and number of highway lanes. Moreover, the presence of pavement markings and the presence of populations within five miles of the HRGCs also could influence accident occurrence at the considered HRGCs. Some other studies focused on improving safety of roadway travelers at HRGCs in other countries as well, including Canada [33, 34], Great Britain [35, 36], Hungary [37], Finland [38], France [39, 40], and Australia [41,42,43].

2.3 Literature summary and contribution

A detailed review of the literature revealed that safety issues at HRGCs have received significant attention from the research community over the last decades. A number of state DOTs conducted various research projects, which aimed to reduce accident occurrence at HRGCs. Several studies have developed a variety of formulae to predict the number of accidents or estimate a hazard index that was further used in ranking HRGCs for safety improvement projects. It was found that many state DOTs have been using the following accident and hazard prediction formulae [11, 12, 20]: (a) the US DOT Accident Prediction Formula; (b) the New Hampshire Hazard Index Formula; (c) the NCHRP Report 50 Accident Prediction Formula; and (d) the Peabody–Dimmick Formula. Certain state DOTs developed accident and hazard prediction models that are customized for their state needs and compared the performance of those models with the existing models that are nationally recognized.

Furthermore, some scientific studies developed alternative statistical models (such as the Poisson model, the negative binomial model, the Conway–Maxwell–Poisson model, the gamma model, the Bernoulli model, and the zero-inflated Poisson model among others) to evaluate the relationship between accidents at HRGCs and contributing factors [6, 10, 13, 29,30,31,32]. Typically, these statistical models are based on the same predictors as the ones that are adopted for the accident and hazard prediction formulae that have been used by the state DOTs. However, such statistical models are not generally used by the state DOTs for accident and hazard prediction at HRGCs due to their complexity. The main contributions of this study to the state-of-the-art and the state-of-the-practice include the following:

  • Conduct a detailed up-to-date review of the nationally recognized accident and hazard prediction formulae that have been used by the state DOTs to rank the HRGCs in the USA for safety improvement projects.

  • Conduct a detailed up-to-date review of the state-specific accident and hazard prediction formulae that have been used by the state DOTs to rank the HRGCs in the USA for safety improvement projects.

  • Perform a comprehensive analysis of the major factors that are being considered in the existing accident and hazard prediction formulae.

  • Discuss the reported performance of the identified accident and hazard prediction formulae as well as the reported implementation challenges.

  • Suggest future research directions based on the existing challenges that are associated with implementation of the existing accident and hazard prediction formulae.

3 Categorization of the identified models

A detailed review of the relevant studies identified a total of 21 accident and hazard prediction formulae. The discovered formulae were classified into the following two major groups: (i) accident prediction formulae that can be used to calculate the expected number of accidents at HRGCs for a particular time period and (ii) hazard prediction formulae that can be used to assess hazard or a safety index value for a given HRGC, which can be further considered throughout the safety improvement project selection. A distribution of the identified accident and hazard prediction formulae is presented in Fig. 2. Approximately 71% of the formulae (15 formulae out of 21 formulae) are the hazard prediction formulae, while the remaining 29% of the formulae (6 formulae out of 21 formulae) can be classified as the accident prediction formulae. A higher percentage of the hazard prediction formulae among the identified formulae can be explained by the fact that it is more difficult to forecast the expected number of accidents at HRGCs with a high degree of accuracy due to a large variety of factors (including human behavior that is difficult to emulate). Thus, the majority of the identified formulae aim to calculate the hazard index for HRGCs instead of the predicted number of accidents.

Fig. 2 A distribution of the identified accident and hazard prediction formulae

The following accident prediction formulae were discovered after a detailed state-of-the-practice review:

  1. The Coleman–Stewart Model

  2. The Iowa Accident Prediction Formula

  3. The Jaqua Formula (developed by the State of Oregon)

  4. The NCHRP Report 50 Accident Prediction Formula

  5. The Peabody–Dimmick Formula

  6. The US DOT Accident Prediction Formula

The following hazard prediction formulae were discovered after a detailed state-of-the-practice review:

  1. The Arkansas Hazard Rating Formula

  2. The California Hazard Rating Formula

  3. The Connecticut Hazard Rating Formula

  4. The Florida Accident Prediction and Safety Index Formula

  5. The Illinois Hazard Index Formula

  6. The Kansas Design Hazard Rating Formula

  7. The Michigan Hazard Index Formula

  8. The Missouri Exposure Index Formula

  9. The Nevada Hazard Index Formula

  10. The New Hampshire Hazard Index Formula

  11. The New Mexico Hazard Index Formula

  12. The North Carolina Investigative Index Formula

  13. The Revised Texas Priority Index Formula

  14. The South Dakota Hazard Index Formula

  15. The Texas Priority Index Formula

Certain states, including Alaska, North Dakota, and Washington, have been mostly utilizing customized spreadsheets without application of any particular accident or hazard prediction formulae to rank HRGCs for safety improvement projects. A number of hazard prediction formulae, such as the Connecticut Hazard Rating Formula, the Texas Priority Index Formula, and the New Mexico Hazard Index Formula, are inspired by the nationally recognized New Hampshire Hazard Index Formula. Certain state DOTs modified the canonical New Hampshire Hazard Index Formula and included some additional factors in the hazard prediction to improve the formula accuracy (such as train speed, type of train, sight distance, vehicle speed, highway–rail grade crossing angle, number of school buses, presence of the nearby intersection, and number of accidents among others). Moreover, the US DOT Accident Prediction Formula has been used by many states over the years (e.g., Alabama, Idaho, Indiana, Maine, Maryland, Ohio, South Carolina, Utah, Virginia, and Wisconsin).

A detailed description and formulation of the identified nationally recognized accident and hazard prediction models is presented in the following sections of this paper. The nationally recognized accident and hazard prediction models include the Coleman–Stewart Model, the NCHRP Report 50 Accident Prediction Formula, the New Hampshire Hazard Index Formula, the Peabody–Dimmick Formula, and the US DOT Accident Prediction Formula [8, 27]. Moreover, the other accident and hazard prediction models and resource allocation procedures used by the state DOTs are comprehensively described in “Appendix” of this paper. Note that the description of the accident and hazard prediction formulae was prepared using the available HRGC safety literature and the reviewed studies [11, 12, 19,20,21,22,23,24,25,26,27,28, 44,45,46].

3.1 Coleman–Stewart Model

The Coleman–Stewart Accident Prediction Model assesses the relationship between the observed accident rates and some factors that may cause accidents at HRGCs (e.g., daily train movements, daily vehicular movements, and existing protection). Moreover, the model assumes that HRGCs are identical if they have similar characteristics, including the warning device type, location, number of tracks, and highway and traffic volumes. The model can be expressed as follows [11, 45]:

$${\text{lg}}\, A = B_{0} + B_{1} {\text{lg}}\, C + B_{2} {\text{lg}}\, T_{\text{cs}} + B_{3} \left( {\text{lg}}\, T_{\text{cs}} \right)^{2} ,$$
(1)

where \(A\) is the average number of accidents per HRGC per year; \(C\) is the average number of daily vehicular movements (if \(C\) = 0, use 0.5 instead); \(T_{\text{cs}}\) is the average number of daily train movements (if \(T_{\text{cs}}\) = 0, use 0.5 instead); \(B_{0}\), \(B_{1}\), \(B_{2}\), and \(B_{3}\) are the coefficients of the accident prediction equation (Table 3).
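As an illustration, Eq. (1) can be evaluated with a few lines of Python. This is a minimal sketch that assumes lg denotes the base-10 logarithm. The function name `coleman_stewart_accidents` and the coefficient values in the example are hypothetical placeholders and are not taken from Table 3.

```python
import math


def coleman_stewart_accidents(c, t_cs, b0, b1, b2, b3):
    """Average accidents per HRGC per year from Eq. (1).

    c      -- average daily vehicular movements
    t_cs   -- average daily train movements
    b0..b3 -- coefficients for the crossing category (Table 3)
    """
    # A zero movement count is replaced by 0.5 so that the logarithm is defined.
    c = 0.5 if c == 0 else c
    t_cs = 0.5 if t_cs == 0 else t_cs
    lg_c = math.log10(c)  # assumption: lg is the base-10 logarithm
    lg_t = math.log10(t_cs)
    lg_a = b0 + b1 * lg_c + b2 * lg_t + b3 * lg_t ** 2
    return 10 ** lg_a


# Placeholder coefficients for illustration only; NOT the Table 3 values.
example = coleman_stewart_accidents(c=1000, t_cs=10, b0=-3.0, b1=0.5, b2=0.25, b3=-0.25)
print(f"{example:.4f} accidents per HRGC per year")
```

Because the model is linear in the logarithms, the prediction is recovered by raising 10 to the right-hand side of Eq. (1).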

Table 3 Coleman–Stewart Model coefficients

Although the variation among a collection of HRGCs in a given location may affect accident prediction at individual HRGCs, the Coleman–Stewart Model does not account for such a variation.

3.2 NCHRP Report 50 Accident Prediction Formula

The National Cooperative Highway Research Program (NCHRP) Report 50 proposed a formula for predicting accidents at HRGCs. The formula considers a number of factors, including the number of trains per day, existing warning devices, number of highway vehicles per day, and location (urban/rural). The NCHRP Report 50 Accident Prediction Formula can be expressed as follows [8, 11, 23, 27, 45]:

$$N = A \times B \times T_{\text{nc}},$$
(2)

where \(N\) is the number of accidents per year; \(A\) is the factor based on the number of highway vehicles per day; \(B\) is the factor based on the existing warning devices and urban/rural classification; \(T_{\text{nc}}\) is the current train volume per day.

The values of A and B can be found in Table 4 and Fig. 3 [23, 45].
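Once the two factors have been looked up, Eq. (2) reduces to a single product. The sketch below is only illustrative: the function name `nchrp50_accidents` is hypothetical, and the factor values in the example are placeholders rather than values read from Table 4 or Fig. 3.

```python
def nchrp50_accidents(a_factor, b_factor, trains_per_day):
    """Expected number of accidents per year from Eq. (2): N = A x B x T_nc."""
    if min(a_factor, b_factor, trains_per_day) < 0:
        raise ValueError("factors and train volume must be non-negative")
    return a_factor * b_factor * trains_per_day


# Placeholder inputs for illustration only. In practice, A is read from Table 4
# (based on AADT) and B from Fig. 3 (based on the warning device and the
# urban/rural classification).
n = nchrp50_accidents(a_factor=0.002, b_factor=1.5, trains_per_day=10)
print(f"{n:.3f} accidents per year")
```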

Table 4 A values based on 10-year annual average daily traffic (AADT)

Fig. 3 B values based on the existing protection and location classification

3.3 New Hampshire Hazard Index Formula

The New Hampshire Hazard Index Formula presents a fairly simple approach for estimating the hazard indexes at HRGCs. The hazard indexes are further used to rank HRGCs based on the probability of accidents. The formula states that the hazard index is dependent on the average daily volume of trains, average daily volume of vehicles, and type of warning devices installed at a given HRGC. The New Hampshire Hazard Index Formula can be expressed as follows [8, 27, 45]:

$$NHHI = V \times T_{\text{nh}} \times PF,$$
(3)

where \(NHHI\) denotes the New Hampshire Hazard Index; \(V\) is the annual average daily traffic; \(T_{\text{nh}}\) is the average daily volume of trains; \(PF\) is the protection factor (1.00 for stop signs, 0.60 for flashing lights, and 0.10 for gates).
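The ranking procedure implied by Eq. (3) can be sketched as follows. The protection factors are the ones listed above, while the dictionary keys, the function name `nh_hazard_index`, and the three example crossings are hypothetical and serve only to illustrate the calculation.

```python
# Protection factors as listed for Eq. (3); the key names are illustrative labels.
PROTECTION_FACTOR = {"stop_signs": 1.00, "flashing_lights": 0.60, "gates": 0.10}


def nh_hazard_index(aadt, trains_per_day, warning_device):
    """New Hampshire Hazard Index from Eq. (3): NHHI = V x T_nh x PF."""
    return aadt * trains_per_day * PROTECTION_FACTOR[warning_device]


# Hypothetical crossings: (identifier, AADT, trains per day, warning device).
crossings = [
    ("X1", 5000, 10, "gates"),
    ("X2", 1200, 6, "stop_signs"),
    ("X3", 3000, 3, "flashing_lights"),
]

# Rank the crossings from the highest to the lowest hazard index.
ranked = sorted(
    ((cid, nh_hazard_index(v, t, dev)) for cid, v, t, dev in crossings),
    key=lambda item: item[1],
    reverse=True,
)
for cid, index in ranked:
    print(cid, round(index, 1))
```

In this example, the gated crossing ranks last despite having the highest traffic and train volumes, which shows how strongly the protection factor drives the index.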

Although a number of state DOTs have used the canonical New Hampshire Hazard Index Formula, certain states have customized the formula with the introduction of some variables in order to improve the accuracy of the formula in predicting safety hazard at HRGCs. Some of the additional variables include the train speed, sight distance, vehicle speed, HRGC width, HRGC angle, type of train, population, surface type, number of school buses, number of buses, surface condition, number of tracks, presence of a nearby intersection, vertical alignment, functional class of highway, horizontal alignment, number of hazardous material trucks, number of accidents, number of passengers, etc. [23].

3.4 Peabody–Dimmick Formula

The Peabody–Dimmick Formula was developed by the US Bureau of Public Roads using the data obtained from 3563 rural HRGCs in 29 states. The formula computes the expected number of accidents for 5 years based on some factors, including the average daily train traffic, annual average daily traffic, and protection coefficient (dependent on the warning device type) using the following equation [8, 23, 27, 45]:

$$A_{5} = K + \frac{{1.28V^{0.170} \times T^{0.151}_{\text{pd}} }}{{P^{0.171} }},$$
(4)

where \(A_{5}\) is the expected number of accidents in 5 years; \(V\) is the annual average daily traffic factor; \(T_{\text{pd}}\) is the average daily train traffic factor; \(P\) is the protection coefficient; \(K\) is the additional parameter.

In order to estimate the additional parameter \(K\), an unbalanced accident factor (\(l_{\mathrm{u}}\)) is required that can be calculated as follows [23]:

$$l_{\mathrm{u}} = 1.28\frac{{V^{\mathrm{a}} \times T^{\mathrm{b}} }}{{P^{\mathrm{c}} }},$$
(5)

where \(V^{\mathrm{a}}\), \(T^{\mathrm{b}}\) and \(P^{\mathrm{c}}\) are the accident factors (i.e., the annual average daily traffic factor, the average daily train traffic factor, and the protection coefficient).
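A minimal numerical sketch of Eq. (4) is given below. It assumes that the additional parameter \(K\) has already been read from the corresponding chart and is supplied by the user. The function names and the input values in the example are hypothetical.

```python
def peabody_dimmick_unbalanced(aadt, trains_per_day, protection_coeff):
    """Second term of Eq. (4): 1.28 * V^0.170 * T_pd^0.151 / P^0.171."""
    if protection_coeff <= 0:
        raise ValueError("protection coefficient must be positive")
    return 1.28 * aadt ** 0.170 * trains_per_day ** 0.151 / protection_coeff ** 0.171


def peabody_dimmick_a5(aadt, trains_per_day, protection_coeff, k):
    """Expected number of accidents in 5 years from Eq. (4).

    k is the additional parameter K; it is an input here because it has to be
    read from a chart rather than computed in closed form.
    """
    return k + peabody_dimmick_unbalanced(aadt, trains_per_day, protection_coeff)


# Placeholder inputs for illustration only (not taken from the charts).
a5 = peabody_dimmick_a5(aadt=2000, trains_per_day=8, protection_coeff=50, k=0.3)
print(f"{a5:.2f} expected accidents in 5 years")
```

The small exponents mean that the prediction grows slowly with traffic: doubling the highway traffic multiplies the second term by only about 2^0.170, i.e., roughly 1.12.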

The expected number of accidents in 5 years (\(A_{5}\)) can be identified using a set of charts, which are presented in Figs. 4, 5, 6 and 7.

Fig. 4 Relationship between highway traffic and accident factor \(V^{\mathrm{a}}\)

Fig. 5 Relationship between railroad traffic and accident factor \(T^{\mathrm{b}}\)

Fig. 6 Relationship between warning device type and accident factor \(P^{\mathrm{c}}\)

Fig. 7 Relationship between K-factor and unbalanced accident factor \(l_{\mathrm{u}}\)

3.5 US DOT procedure for accident prediction and resource allocation

The expected number of accidents at an HRGC over a defined time period can be forecasted using an accident prediction model. Generally, accident prediction models take into account operational and physical characteristics of HRGCs to forecast accident occurrence. The US DOT developed an accident prediction model that would be able to assist the states with satisfying certain requirements under the Federal-Aid Policy Guidelines (FAPG) [44]. The US DOT Model uses a total of three equations to predict accidents at HRGCs. The first equation, which is referred to as the initial accident prediction formula, predicts the accident frequency per year at HRGCs using their operational and physical characteristics. The initial accident prediction formula is presented as follows [8, 21, 23, 27, 45]:

$$a = K \times EI \times MT \times DT \times HP \times MS \times HT \times HL,$$
(6)

where \(a\) denotes the initial accident prediction, accidents per year at an HRGC; \(K\) is the formula constant; \(EI\) is the factor for exposure index based on the product of highway and train traffic; \(MT\) is the factor for the number of main tracks; \(DT\) is the factor for the number of through trains per day during daylight; \(HP\) is the factor for highway paved (yes or no); \(MS\) is the factor for maximum timetable speed; \(HT\) is the factor for highway type; \(HL\) is the factor for the number of highway lanes.
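The multiplicative structure of Eq. (6) can be sketched as follows. The class name, field names, and numerical values are illustrative assumptions; in practice, each factor value would be taken from Table 5 for the HRGC under consideration.

```python
from dataclasses import astuple, dataclass
from math import prod


@dataclass(frozen=True)
class InitialFactors:
    """Factors of the initial accident prediction formula, Eq. (6)."""
    k: float   # formula constant
    ei: float  # exposure index factor
    mt: float  # main tracks factor
    dt: float  # daylight through trains factor
    hp: float  # highway paved factor
    ms: float  # maximum timetable speed factor
    ht: float  # highway type factor
    hl: float  # highway lanes factor


def initial_accident_prediction(factors: InitialFactors) -> float:
    """Return a = K * EI * MT * DT * HP * MS * HT * HL (accidents per year)."""
    return prod(astuple(factors))


# Hypothetical factor values (not taken from Table 5).
example = InitialFactors(k=0.002, ei=50.0, mt=1.2, dt=1.0,
                         hp=1.0, ms=1.0, ht=1.0, hl=1.0)
print(f"a = {initial_accident_prediction(example):.3f} accidents per year")
```

Because the formula is a pure product, a factor equal to 1.0 leaves the prediction unchanged.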

The values for the factors of the initial accident prediction formula can be found in Table 5. The second accident prediction formula captures the accident history recorded for a given HRGC over a defined time period. A major assumption in the formula is that future accidents will occur at the same rate as past accidents. The formula is expected to give the most accurate results when up to 5 years of accident history are considered. Accident data collected over a period longer than 5 years may produce misleading results, as substantial changes in operational and physical HRGC characteristics may occur over such a period. The second accident prediction can be estimated as follows [8, 21, 23, 27, 45]:

$$B = \frac{{T_{0} }}{{T_{0} + T_{\text{y}}}}(a) + \frac{{T_{\text{y}} }}{{T_{0} + T_{\text{y}}}}\left( {\frac{N}{T_{\text{y}}}} \right),$$
(7)

where \(B\) denotes the second accident prediction (measured in accidents per year at an HRGC); \(a\) is the initial accident prediction (measured in accidents per year at an HRGC); \(\frac{N}{T_{\text{y}}}\) is the accident history prediction (measured in accidents per year at an HRGC), where \(N\) is the number of accidents that were observed in \(T_{\text{y}}\) years at an HRGC; \(T_{0}\) is the formula weighting factor, \(T_{0} = \frac{1}{0.05 + a}\).
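The second accident prediction is a weighted average of the initial prediction \(a\) and the observed accident rate \(N/T_{\text{y}}\), with weights \(T_{0}/(T_{0} + T_{\text{y}})\) and \(T_{\text{y}}/(T_{0} + T_{\text{y}})\) that sum to one. The sketch below illustrates this under that formulation; the function names and input values are illustrative assumptions.

```python
def weighting_factor(a: float) -> float:
    """Return T0 = 1 / (0.05 + a)."""
    return 1.0 / (0.05 + a)


def second_accident_prediction(a: float, n_accidents: int, t_years: float) -> float:
    """Blend the initial prediction with the accident history.

    a           -- initial accident prediction (accidents per year)
    n_accidents -- accidents observed in the history window (N)
    t_years     -- length of the history window in years (T_y)
    """
    if t_years <= 0:
        raise ValueError("the history window must be positive")
    t0 = weighting_factor(a)
    weight_model = t0 / (t0 + t_years)         # weight on the initial prediction
    weight_history = t_years / (t0 + t_years)  # weight on the observed rate
    return weight_model * a + weight_history * (n_accidents / t_years)


# Hypothetical crossing: a = 0.05 accidents/year, 5 years of history.
print(second_accident_prediction(a=0.05, n_accidents=1, t_years=5))
print(second_accident_prediction(a=0.05, n_accidents=0, t_years=5))
```

With \(a = 0.05\), the weighting factor is \(T_{0} = 10\), so one accident in 5 years gives \(B \approx 0.10\), whereas no accidents in 5 years gives \(B \approx 0.033\). A longer history window shifts more weight onto the observed accident rate.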

Table 5 Highway–rail grade crossing characteristic factors for the Initial US DOT Accident Prediction Formula

The final accident prediction is then obtained by applying a normalizing constant (\(A\)) to the second accident prediction for each HRGC category. The HRGC categories are differentiated by protection type into the following groups: (a) passive HRGCs, (b) HRGCs with flashing lights, and (c) HRGCs with gates. Applying the normalizing constants recalibrates the predictions from past accident trends to current accident trends. Thus, the normalizing constants have to be updated regularly. For each HRGC category, the normalizing constant can be estimated by requiring that the sum of the predicted accidents, multiplied by the normalizing constant, equal the most recent number of accidents recorded for that category over a specific time period [23, 46]. The final accident prediction can afterward be used to rank HRGCs for resource allocation. The periodic updates used for accident prediction as well as the normalizing constants for the resource allocation procedure are presented in Table 6.
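The normalization and ranking steps described above can be sketched as follows. This is a simplified illustration rather than the official US DOT procedure: the class and function names, the category labels, and all numerical values are hypothetical, and the observed accident counts are assumed to be expressed per year so that they match the units of the predictions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Crossing:
    crossing_id: str
    category: str             # "passive", "flashing_lights", or "gates"
    second_prediction: float  # B, accidents per year


def normalizing_constants(crossings, observed_per_year):
    """Per category, solve A * sum(B) = observed accidents for A."""
    predicted = defaultdict(float)
    for c in crossings:
        predicted[c.category] += c.second_prediction
    return {cat: observed_per_year[cat] / total
            for cat, total in predicted.items()}


def rank_by_final_prediction(crossings, constants):
    """Return (crossing_id, A * B) pairs, highest final prediction first."""
    final = [(c.crossing_id, constants[c.category] * c.second_prediction)
             for c in crossings]
    return sorted(final, key=lambda item: item[1], reverse=True)


# Hypothetical inventory and hypothetical observed accident totals.
inventory = [
    Crossing("X1", "passive", 0.10),
    Crossing("X2", "passive", 0.30),
    Crossing("X3", "gates", 0.20),
]
observed = {"passive": 0.32, "gates": 0.10}

constants = normalizing_constants(inventory, observed)
for crossing_id, value in rank_by_final_prediction(inventory, constants):
    print(crossing_id, round(value, 3))
```

In this example, the passive category is scaled by about 0.8 and the gated category by 0.5, so a gated HRGC with a higher second prediction (X3) can rank below a passive HRGC (X2) after normalization.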

Table 6 Values of the normalizing constants for the accident prediction and resource allocation procedure

4 Key factors considered in the identified models

Table 7 presents the list of the factors (or predictors) considered by the identified accident and hazard prediction formulae. A detailed review of the 21 accident and hazard prediction formulae, which were identified in the literature, revealed a total of 20 unique predictors that may directly influence the expected number of accidents at HRGCs. A distribution of the factors considered by the identified accident and hazard prediction formulae is presented in Fig. 8. The analysis results indicate that all the identified accident and hazard prediction formulae directly account for the number of vehicles per day and the number of trains per day. Such a finding can be supported by the fact that the number of vehicles per day and the number of trains per day are the key factors that determine exposure of a given HRGC to highway and railway traffic. A total of 19 accident and hazard prediction formulae consider the protection type at HRGCs (i.e., type of warning devices used). Some other predictors that are commonly used by the identified accident and hazard prediction formulae include: (i) accident history—captured in 12 formulae; (ii) train speed—captured in 11 formulae; (iii) number of tracks—captured in 10 formulae; (iv) sight distance—captured in 8 formulae; and (v) number of traffic lanes—captured in 6 formulae.

Table 7 Factors considered by the identified accident and hazard prediction formulae
Fig. 8 A distribution of the factors considered by the identified accident and hazard prediction formulae

Certain predictors have been captured by only a very limited number of accident and hazard prediction formulae. For instance, only the Jaqua Formula, which has been deployed by the Oregon DOT, directly accounts for the number of cars in a train, approach gradient, and different roadway geometric features (e.g., existence of entrances and exits to the streets as well as the street intersections near an HRGC, and curvature of the roadway). Furthermore, the Florida Accident Prediction and Safety Index Formula and the North Carolina Investigative Index Formula are the only hazard index formulae that capture the number of school buses that traverse a given HRGC on a daily basis. The Revised Texas Priority Index Formula is the only hazard index formula that explicitly considers the presence of a nearby highway intersection when estimating a hazard index for a given HRGC. Consideration of the additional predictors (i.e., number of cars in a train, approach gradient, different roadway geometric features, number of school buses, presence of a nearby highway intersection) is expected to improve the performance of the accident and hazard prediction formulae. However, the existing HRGC inventory databases may not contain any information regarding the additional predictors, and supplementary field reviews would have to be conducted in order to collect the required data.

5 Performance of the identified models and implementation challenges reported

Many of the reviewed studies investigated the performance of the existing accident and hazard prediction models. A review of some accident and hazard prediction models, which was conducted by Chadwick et al. [8], revealed that the US DOT Accident Prediction Formula is the most commonly used model by the state DOTs for prioritizing HRGCs for safety improvements. Moreover, the research efforts that were conducted by the States of Alabama, Virginia, and Ohio concluded that the US DOT Accident Prediction Formula outperformed the alternative accident and hazard prediction formulae for the HRGCs in the respective states and returned more accurate ranking of the HRGCs [19, 20, 28]. The Illinois Hazard Index Formula was found to be the most accurate formula for ranking the HRGCs in the State of Illinois [11]. Moreover, the Illinois Hazard Index Formula was found to be the most accurate for active HRGCs in the State of Missouri, while the California Hazard Rating Formula outperformed the other models for passive HRGCs [21]. The original Texas Priority Index Formula was outperformed by the Revised Texas Priority Index Formula when ranking the HRGCs in the State of Texas [24].

Austin and Carson [5] highlighted that some accident and hazard prediction formulae (including the New Hampshire Hazard Index Formula, the NCHRP Report 50 Accident Prediction Formula, and the Peabody–Dimmick Formula) lack descriptive capabilities because they consider only a limited number of explanatory variables. For instance, the Texas Priority Index, which is an extension of the New Hampshire Hazard Index Formula, does not capture certain important predictors, such as transport of hazardous materials, school bus usage, train speeds, and urban/rural distinction [27]. Furthermore, the Texas Priority Index cannot differentiate between two HRGCs that have the same operational and physical features but different accident histories (e.g., one accident versus no accidents over the past 5 years). Both HRGCs will have exactly the same value of the Texas Priority Index despite the substantial difference in their accident histories [27].

On the other hand, some other formulae (e.g., the US DOT Accident Prediction Formula) have a significant number of explanatory variables. However, there is a need to update the parameters in the US DOT Accident Prediction Formula to prevent a decline in accuracy of the model [5]. The normalizing constant values that are used in the US DOT Accident Prediction Formula significantly affect its accuracy and have to be adjusted over time. Moreover, the survey conducted by Bowman [20], which involved the state highway–rail program coordinators across the USA, revealed that the US DOT Accident Prediction Formula still fails to account for some important factors (such as roadway approach characteristics and quadrant sight distance) and is significantly dependent on the accident history. Several studies pointed out that the data availability is one of the major challenges that affect a successful implementation of accident and hazard prediction formulae [11, 20, 21, 28]. Certain predictors may not be available in the existing FRA and state HRGC databases (e.g., daily average train movements by type of train, number of school bus passengers, number of blind quadrants, and speed of each type of train). Field reviews will be required in order to collect the necessary data for implementation of certain accident and hazard prediction formulae. Moreover, the state DOT representatives may have to contact railroad companies directly to obtain specific data.

Some accident and hazard prediction formulae (e.g., the US DOT Accident Prediction Formula, the California Hazard Rating Formula, and the New Hampshire Hazard Index Formula) generally give a higher priority value to the HRGCs with high exposure and may not accurately capture the vulnerability of the HRGCs with low exposure [24, 27]. To address these challenges, Ryan and Mielke [27] proposed a new hazard index formula (named “the Nevada Hazard Index Formula”) that accounted for several factors, including the daily train volume, average daily highway traffic, number of accidents within the past 5 years, protection factor, highway speed factor, number of near misses within the past 3 years, rail speed factor, HRGC angle factor, and track configuration factor. The Nevada Hazard Index Formula uses the square root of exposure to prevent the assignment of “too much weight” to the train and roadway traffic volumes. Another important issue pointed out by some of the studies is that the annual average daily traffic (AADT) and train counts may not be updated regularly by certain state DOTs [20, 26, 28]. Outdated counts can lead to errors in accident and hazard prediction for HRGCs as well as to misleading resource allocation.
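The effect of taking the square root of exposure can be illustrated with a short sketch. It does not reproduce the Nevada Hazard Index Formula itself; exposure is assumed here to be the product of the average daily highway traffic and the daily train volume, and both crossings are hypothetical.

```python
from math import sqrt


def exposure(aadt: float, trains_per_day: float) -> float:
    """Assumed exposure measure: highway traffic times train traffic."""
    return aadt * trains_per_day


# Two hypothetical crossings with the same train volume.
busy = exposure(aadt=10_000, trains_per_day=20)
quiet = exposure(aadt=100, trains_per_day=20)

raw_ratio = busy / quiet               # weight ratio with the base exposure
sqrt_ratio = sqrt(busy) / sqrt(quiet)  # weight ratio with the square root

print(f"base exposure ratio: {raw_ratio:.0f}")
print(f"square-root exposure ratio: {sqrt_ratio:.0f}")
```

Under these assumptions, a 100-fold difference in base exposure shrinks to a 10-fold difference after the square root is applied, which leaves more room for the remaining factors to influence the ranking of low-exposure HRGCs.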

Table 8 summarizes the implementation challenges that were identified during a detailed review of the state DOT efforts and proposes potential solutions to address these challenges. The issue of insufficient descriptive capabilities can be addressed by introducing additional factors in the accident and hazard prediction formulae, including train speed, sight distance, vehicle speed, HRGC width, HRGC angle, type of train, surface type, and number of school buses [23, 27]. A periodic calibration of the model parameters is required to prevent a decline in model accuracy that may occur over time [5]. The overreliance of certain models on the accident history can be addressed by considering the number of near misses in the models along with the number of past accidents [27]. Furthermore, the number of warrants met should be accounted for when ranking passive HRGCs [24]. Periodic field reviews are required in order to resolve the issue of data unavailability. In certain cases, the state DOT representatives may have to contact railroad companies or other relevant stakeholders directly to obtain specific data. The use of the square root of exposure rather than the base exposure value can effectively resolve the issue of assigning “too much weight” to the train and roadway traffic volumes [27]. Periodic reviews of the HRGC databases based on the feedback from the appropriate stakeholders are required to resolve the data inconsistency issues. Furthermore, the information regarding operational and physical HRGC features should be updated in the HRGC databases frequently enough to ensure accurate accident and hazard prediction.

Table 8 Implementation challenges identified and potential solutions proposed

6 Concluding remarks and future research needs

Highway–rail grade crossings (HRGCs) are considered to be among the most dangerous segments of the transportation network. At every HRGC, there is a risk of an accident between a highway vehicle and a train. Other types of accidents, such as rear-end accidents between highway vehicles and collisions of vehicles with warning devices, may also occur at HRGCs. Although there has been a decline in the number of accidents at HRGCs over the past years, thousands of accidents are still recorded annually in the USA, which result in fatalities, injuries, and substantial property damage. Numerous studies have investigated the factors that may influence accident occurrence and developed various models to estimate the expected number of accidents at HRGCs as well as to assess the HRGC hazard. This paper presented a comprehensive review of the existing accident and hazard prediction formulae that have been used by the state Departments of Transportation (DOTs) over the years to prioritize the HRGCs for safety improvement projects in the USA. From an extensive literature review, a total of 21 accident and hazard prediction formulae were identified, five of which are nationally recognized accident and hazard prediction formulae. Furthermore, a number of customized accident and hazard prediction models, developed by certain state DOTs, were identified and evaluated as well.

The identified accident and hazard prediction formulae were divided into two categories based on the measure used in ranking HRGCs, which include: (i) accident prediction formulae (6 out of 21 formulae) that can be used to calculate the expected number of accidents at HRGCs for a particular time period; and (ii) hazard prediction formulae (15 out of 21 formulae) that can be used to assess hazard or a safety index value for a given HRGC, which can be further considered throughout the safety improvement project selection. A higher percentage of the hazard prediction formulae among the identified formulae can be explained by the fact that it is more difficult to forecast the expected number of accidents at HRGCs with a high degree of accuracy. A detailed analysis of the identified accident and hazard prediction formulae indicated that all the formulae directly accounted for the number of vehicles per day and the number of trains per day that determine exposure of a given HRGC to highway and railway traffic. A total of 19 accident and hazard prediction formulae considered the protection type at HRGCs (i.e., type of warning devices used).

Furthermore, this study discussed the reported performance and implementation issues associated with the identified accident and hazard prediction formulae. The US DOT Accident Prediction Formula was found to be the most commonly used by the state DOTs for prioritizing HRGCs for safety improvements. The key issues associated with implementation of the identified accident and hazard prediction formulae include: (a) limited number of explanatory variables in certain formulae (e.g., the NCHRP Report 50 Accident Prediction Formula, the New Hampshire Hazard Index Formula, and the Peabody–Dimmick Formula) causing a lack of descriptive capabilities; (b) decline in accuracy of the model constants (e.g., the US DOT Accident Prediction Formula normalizing constants have to be adjusted over time); (c) overreliance of certain accident and hazard prediction formulae on the accident history; (d) unavailability of the data required to apply the models (such as sight distance, number of blind quadrants, angle of the intersection between the roadway and the track); (e) assignment of “too much weight” to the train and roadway traffic volumes (exposure value), which results in higher priority values for the HRGCs with higher traffic volumes as compared to the lower-volume HRGCs; and (f) inconsistencies and outdated information in the HRGC databases maintained by some state DOTs.

Future research can be extended in the following directions: (i) evaluating the effects of technological improvements on the performance of the existing accident and hazard prediction models; (ii) investigating the effects of driver behavior on accident occurrence at HRGCs; (iii) incorporating additional driver and roadway geometric characteristics into the existing accident and hazard prediction models (for instance, driver fatigue, level of frustration, time of accident, age of driver, and population distribution may directly influence accident occurrence at HRGCs); (iv) developing new procedures for creating HRGC databases that address the existing data availability and inconsistency issues; (v) conducting a survey among the state DOT representatives to identify any changes in the accident and hazard prediction procedures that have occurred over the years; and (vi) comparing the identified accident and hazard prediction formulae against customized statistical models (e.g., the Poisson model, the negative binomial model, the gamma model) for a group of selected HRGCs in the USA and other countries.