Link travel time inference using entry/exit information of trips on a network
Introduction
Travel time is one of the most important factors when a traveler plans a route from an origin to a destination, and is also critical to transportation planners and operators as a performance measure. Accurate travel time estimation on a transportation network is therefore becoming an essential task and is made possible now by widely available traffic data.
Travel time data on a network is regularly obtained from traffic tracking, for example through probing phones (Bar-Gera, 2007; Ygnace et al., 2000), global positioning system (GPS) devices (Bertini and Tantiyanugulchai, 2004), and vehicle ID readers (through Bluetooth or vehicle plate identification and matching, e.g., Haghani et al., 2010; Chang et al., 2004). The associated methods to estimate roadway travel time range from regression models (Chan et al., 2009) and machine learning approaches (Zheng and Zuylen, 2013) to analytical models dealing with traffic conditions (Hellinga et al., 2008). However, the many parameters these methods require limit their applicability in practice, and a general modeling approach to the network-wide travel time estimation problem is lacking.
Valid statistical analysis becomes increasingly important as the data becomes widely available (Fan et al., 2014). In what follows, we will briefly review two categories of statistical approaches in relevant literature: the traditional maximum-likelihood method and the Bayesian approach.
Among the scant literature that focuses directly on this topic, Hunter et al. (2009) formulate a maximum-likelihood problem to estimate link travel time distributions on an arterial network. Their model handles observations with unknown trajectories. They present an Expectation–Maximization (EM) algorithm to simultaneously learn the likely paths of probe vehicles as well as the travel time distributions on the network. They assume that travel times on different links are independent and briefly conduct numerical tests using San Francisco taxi data. Rather than assuming independent link travel times, Jenelius and Koutsopoulos (2013) present a statistical model for travel time estimation on an urban road network that considers the correlation between travel times on different links. They capture the correlation using a moving average specification for link travel times. Specific information on link attributes (such as speed limit and roadway functional class) and trip conditions (such as day-of-week, time-of-day, and weather condition) is incorporated as explanatory variables. The model is estimated using the maximum-likelihood method, and is applied to a particular route on the Stockholm network in Sweden.
In contrast to the traditional maximum-likelihood method, some studies apply the Bayesian approach to travel time distribution prediction. Hofleitner et al. (2012a, 2012b) propose a dynamic Bayesian network for unobserved traffic conditions and model link travel time distributions conditional on traffic state. Their method takes a conventional traffic flow perspective, and is applied to a San Francisco road network to predict travel times using taxi data. Westgate et al. (2013) also propose a Bayesian model to estimate the distribution of ambulance travel times on road segments in Toronto. They apply a multinomial logit model to formulate the path choices for ambulance trips, and perform the path inference and travel time estimation simultaneously using a Bayesian approach. They also assume that link travel times are independent and log-normally distributed. The parameters are estimated using Markov Chain Monte Carlo (MCMC) methods. Instead of modeling travel times at the link level as in the previous work, Westgate et al. (2013) model ambulance travel times at the trip level. They propose a regression approach for estimating the ambulance travel time distribution on an arbitrary route, and use a Bayesian formulation to estimate the model parameters. The advantage of the Bayesian approach is that it utilizes expert knowledge as prior information, and tackles many complicated problems that traditional statistical approaches find difficult to analyze. However, its implementation relies on computationally expensive methods such as MCMC.
This research aims to develop inference methods for link travel time estimation on a steady state network, assuming that each link is associated with a random travel time due to differences among traveling vehicles and time-of-day variation. We believe the statistical characteristics of the link travel time as well as the distribution can be measured approximately. We estimate network-wide link travel times using only the start and end locations and times of vehicle trips, referred to as traveler entry/exit time stamps in this paper. This type of data is representative of the data available when discrete points of a trip are recorded. Sparse vehicle trajectories reported by GPS-equipped probe vehicles or smartphones (Wang et al., 2014) can also be regarded as a particular case of traveler entry/exit trip information on a network. By trip we mean sequential records that denote the time stamps for start and end nodes on a roadway network. For example, two consecutive GPS records are considered as a trip in this study, though they essentially represent a trip segment. Specifically, this research is motivated by a practical application on a toll road network, in which traveler entry/exit time stamps are recorded at tollbooths and the toll road authority has a practical need to use the travel time inference results to evaluate the toll systems. Other potential applications include using public transit data for network performance analysis when passenger entry/exit information is recorded at fare boxes (Ma et al., 2013).
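As an illustration of the trip definition above, the following minimal Python sketch pairs consecutive time-stamped records of one vehicle into entry/exit trips. The `Trip` type, the `records_to_trips` function, and the node labels are hypothetical names introduced here for illustration only; they do not come from the paper.

```python
from typing import List, NamedTuple, Tuple


class Trip(NamedTuple):
    """One entry/exit observation: where and when a vehicle was seen twice."""
    entry_node: str
    exit_node: str
    entry_time: float  # e.g., seconds since midnight (assumed unit)
    exit_time: float

    @property
    def travel_time(self) -> float:
        # Observed trip time is the difference of the two time stamps.
        return self.exit_time - self.entry_time


def records_to_trips(records: List[Tuple[str, float]]) -> List[Trip]:
    """Turn (node, timestamp) records of a single vehicle into trips.

    Records are sorted by time, and every pair of consecutive records
    is treated as one trip (strictly speaking, a trip segment).
    """
    ordered = sorted(records, key=lambda r: r[1])
    return [Trip(a[0], b[0], a[1], b[1]) for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    for trip in records_to_trips([("n1", 0.0), ("n3", 95.0), ("n2", 40.0)]):
        print(trip.entry_node, trip.exit_node, trip.travel_time)
```

Note that such a record carries no path information: when several routes connect the entry and exit nodes, the traversed links remain unobserved.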
We start with the assumption of independent and normally distributed link travel times, and present the EM algorithm to address the trips with unknown paths, as in Hunter et al. (2009) and Siripirote et al. (2013). However, our study differs from this earlier work by focusing on the analytical properties of the proposed methods. We examine the impact of errors in trip variance estimates on mean link travel time estimates, and investigate the uniqueness of solutions in the algorithm. We also provide confidence intervals for mean link times.
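To make the role of the unknown paths concrete, the sketch below shows one way an expectation step could weigh candidate paths for a single trip. Under independent Gaussian link times, the trip time on a path is Gaussian with mean and variance equal to the sums of the link means and variances. This is only an illustrative sketch and not the algorithm of the paper: the function names, the candidate path set, and the equal prior path weights are all assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def path_posteriors(trip_time: float,
                    candidate_paths: List[Sequence[str]],
                    link_mean: Dict[str, float],
                    link_var: Dict[str, float],
                    prior: Optional[List[float]] = None) -> List[float]:
    """Posterior probability of each candidate path given one observed trip time."""
    if prior is None:
        # Assumption: all candidate paths are equally likely a priori.
        prior = [1.0 / len(candidate_paths)] * len(candidate_paths)
    weights = []
    for path, p in zip(candidate_paths, prior):
        # Sum of independent Gaussian link times: means and variances add.
        mean = sum(link_mean[link] for link in path)
        var = sum(link_var[link] for link in path)
        weights.append(p * normal_pdf(trip_time, mean, var))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("trip time is numerically impossible on every candidate path")
    return [w / total for w in weights]


if __name__ == "__main__":
    means = {"a": 5.0, "b": 5.0, "c": 12.0}
    variances = {"a": 1.0, "b": 1.0, "c": 2.0}
    print(path_posteriors(10.0, [("a", "b"), ("c",)], means, variances))
```

In a full EM iteration, weights of this kind would be used in the maximization step to update the link parameters, and the two steps would be repeated until the estimates stabilize.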
Furthermore, we propose a statistical method of trip splitting approximation, referred to as Method II in this paper, mainly to address the technical situation in which the sum of random link travel times for a path does not have a closed-form probability distribution. We assume that, for trips on the same path under similar traffic conditions (e.g., different vehicles measured within a time interval of 30 min or so), each traversed link accounts for an approximately constant proportion of the trip time. A similar idea of decomposing trip travel time has already been proposed in practical applications (Hellinga et al., 2008), but without appropriate justification and investigation. The proposed trip splitting approximation method may be applied to arbitrary distributions, and is statistically justified for the network estimation problem. Our numerical tests indicate that this method is computationally cheaper, though it may lead to relatively larger errors in the link time standard deviation estimates. Its potential application would be more promising if more traffic information were available.
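The idea of splitting a trip time among traversed links can be sketched as follows. The sketch assumes that each link's share is proportional to its current mean estimate and that link means are updated by simple averaging of the allocated shares; both choices, along with the function names and the fixed iteration count, are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_trip(trip_time: float, path: Sequence[str],
               link_mean: Dict[str, float]) -> Dict[str, float]:
    """Allocate one trip time to the links on its path.

    Each link receives a share proportional to its current mean estimate,
    so the shares always add up to the observed trip time.
    """
    total = sum(link_mean[link] for link in path)
    return {link: trip_time * link_mean[link] / total for link in path}


def estimate_link_means(trips: List[Tuple[Sequence[str], float]],
                        initial_mean: Dict[str, float],
                        n_iter: int = 50) -> Dict[str, float]:
    """Alternate between splitting trips and re-averaging the link shares."""
    means = dict(initial_mean)
    for _ in range(n_iter):
        shares = defaultdict(list)
        for path, trip_time in trips:
            for link, share in split_trip(trip_time, path, means).items():
                shares[link].append(share)
        # Update only after all trips were split with the old estimates.
        for link, values in shares.items():
            means[link] = sum(values) / len(values)
    return means


if __name__ == "__main__":
    observed = [(("a",), 10.0), (("a", "b"), 30.0), (("b",), 20.0)]
    print(estimate_link_means(observed, {"a": 1.0, "b": 1.0}))
```

Because only sums and averages are involved, a scheme of this kind does not require the trip time to follow any particular distribution, which is consistent with the motivation given above.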
The remainder of this paper is organized as follows. Section 2 proposes the first method, which assumes that the trip time has a closed-form distribution, using the Gaussian distribution for link travel time as an example. Two cases are discussed in detail: one with known paths for all trips and the other with unknown paths for some trips. Section 3 develops a statistical framework for the trip splitting method. Section 4 tests the proposed methods with simulated data on a simple nine-link test network and on the Sioux Falls network. Section 5 discusses the advantages and disadvantages of both methods. Section 6 concludes the paper.
Section snippets
Method I: Estimation using trip time distributions
In this study, a roadway network is represented by a graph where links (edges) stand for roadway segments and nodes represent connections of links (e.g., intersections or junctions). Each link is associated with a random travel time that follows a certain type of distribution. A path is defined as an alternating sequence of links and nodes from an origin to a destination node (known as an OD pair). Each trip consists of a path, the entry (starting) time at the origin, and the exit (ending) time
Method II: Trip splitting approximation
In our earlier models, the Gaussian distribution of link travel times gives rise to a trip time that follows a closed-form distribution, which makes modeling technically tractable. However, the link time may follow probability distributions other than the Gaussian, such as the log-normal or a mixed distribution due to recurrent traffic congestion, in which case no closed-form distribution for the trip time is available. We propose to split the trip time among traversed links. Different approaches
Experimental results
We numerically test the proposed models and procedures on networks, with the results organized in three subsections. First, we test the EM algorithm of Method I to assess the efficiency of the algorithm on its own, followed by a test of the trip splitting approximation method for log-normally distributed link travel times. We also compare the estimates from Method I and Method II with link times following Gaussian distributions. Two networks are used for the tests: a simple network with nine
Discussion of the two methods
The two proposed methods are based on the additive property of link time distributions, i.e., whether the summation of link travel times has a closed-form distribution. Starting from the likelihood principle, both methods, whenever necessary, decompose the link travel time inference into structural steps that share the same spirit of the EM machinery. The key strategy is the introduction of the augmented data (or complete data), namely augmenting the observed data with hidden (unobserved)
Conclusions
Link travel time estimation on a roadway network is essential for performance assessment in order to improve traffic mobility and network efficiency. It is made possible now by the widely available traffic data. This paper develops two model frameworks based on statistical inference methods for link travel time estimation using entry/exit information of trips on a network.
First, we propose a method using independent Gaussian link travel times. We particularly analyze the property of mean
Acknowledgements
The authors are grateful to the two referees for their insightful comments and suggestions. The authors gratefully acknowledge the kind support from the National Center for Freight and Infrastructure Research and Education (CFIRE) at the University of Wisconsin-Madison and the Southwest Region University Transportation Center (SWUTC) at Texas A&M University. Dr. Teresa Qu at Texas A&M University Transportation Institute initially brought a practical problem for discussion that motivated
References (35)
Evaluation of a cellular phone-based system for measurements of traffic speeds and travel times: a case study from Israel. Transportation Research Part C: Emerging Technologies (2007)
Decomposing travel times measured by probe-based traffic monitoring systems to individual road segments. Transportation Research Part C: Emerging Technologies (2008)
Arterial travel time forecast with streaming data: a hybrid approach of flow modeling and machine learning. Transportation Research Part B: Methodological (2012)
Travel time estimation for urban road networks using low frequency probe vehicle data. Transportation Research Part B: Methodological (2013)
Mining smart card data for transit riders’ travel patterns. Transportation Research Part C: Emerging Technologies (2013)
Benefit distribution and equity in road network design. Transportation Research Part B: Methodological (2002)
Distribution-free travel time reliability assessment with probability inequalities. Transportation Research Part B: Methodological (2011)
Estimation of origin–destination matrices from link counts and sporadic routing data. Transportation Research Part B: Methodological (2012)
Bayesian inference for day-to-day dynamic traffic models. Transportation Research Part B: Methodological (2013)
Designing heterogeneous sensor networks for estimating and predicting path travel time dynamics: an information-theoretic modeling approach. Transportation Research Part B: Methodological (2013)
Urban link travel time estimation based on sparse probe vehicle data. Transportation Research Part C: Emerging Technologies
Transit buses as traffic probes: empirical evaluation using geo-location data. Transportation Research Record: Journal of the Transportation Research Board
Mathematical Statistics: Basic Ideas and Selected Topics
Pattern Recognition and Machine Learning
Real-time estimation of arterial travel times with spatial travel time covariance relationships. Transportation Research Record: Journal of the Transportation Research Board
Automatic license plate recognition. IEEE Transactions on Intelligent Transportation Systems
Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological)