Analysing the impact of travel information for minimising the regret of route choice

https://doi.org/10.1016/j.trc.2017.11.011

Highlights

  • Mobile navigation app that provides drivers with travel information on their routes.

  • Action regret definition as a linear combination of local and global information.

  • Method for agents to estimate action regret using their experience and app information.

  • Reinforcement learning algorithm using action regret as reinforcement signal.

  • Experiments showing that regret is minimised and the system converges to the User Equilibrium.

Abstract

In the route choice problem, self-interested drivers aim to choose routes that minimise travel costs between their origins and destinations. We model this problem as a multiagent reinforcement learning scenario. Here, since agents must adapt to each other’s decisions, the minimisation goal is seen as a moving target. Regret is a well-known performance measure in such settings, which captures how much worse an agent performs compared to the best fixed action in hindsight. In general, regret cannot be computed (and used) by agents because its calculation requires observing the costs of all available routes (including non-taken ones). In contrast to previous works, here we show how agents can compute regret by building upon their experience and via information provided by a mobile (Waze-like) navigation app. Specifically, we compute the regret of each action as a linear combination of local (experience-based) and global (app-based) information. We refer to such a measure as the action regret, which can be used by the agents as a reinforcement signal. Under these conditions, agents are able to minimise their external regret even when the cost of routes is not known in advance. Based on experimental evaluation in several abstract road networks, we show that the system converges to approximate User Equilibria.

Introduction

The route choice problem concerns how rational drivers behave when choosing routes between their origins and destinations to minimise their travel costs. In order to accomplish this goal, drivers must adapt their choices to account for changing traffic conditions. Such scenarios are naturally modelled as multiagent systems. Multiagent reinforcement learning (RL) captures the idea of self-interested agents interacting in a shared environment to improve their outcomes. In the basic, single-agent RL setting, the agent must learn by trial and error how to behave in an environment in order to maximise its utility. However, in multiagent RL settings, multiple agents share a common environment, and thus must adapt their behaviour to each other. In other words, the learning objective of each agent becomes a moving target.

An interesting class of multiagent RL techniques comprises the regret minimisation approaches. Different notions of regret are considered in the literature (Blum and Mansour, 2007). The most common one is external regret, which measures how much worse an agent performs, on average, in comparison to the best fixed action in hindsight. In this sense, regret minimisation can be seen as a natural characterisation of how rational agents behave over time.
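
For concreteness, the external regret of an agent after T episodes is typically formalised as

R^{ext}_T = \max_{a \in A} \frac{1}{T} \sum_{t=1}^{T} r_t(a) \;-\; \frac{1}{T} \sum_{t=1}^{T} r_t(a_t),

where A is the agent's set of actions (routes), r_t(a) is the reward the agent would have obtained by taking action a at episode t, and a_t is the action it actually took; the notation here is illustrative and may differ in detail from the one adopted later in the paper. Keeping R^{ext}_T low means the agent performs, on average, nearly as well as the best fixed route in hindsight.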

The use of regret in the context of route choice and related problems has been widely explored in the literature (Cesa-Bianchi and Lugosi, 2006). In the transportation literature, regret has been employed to develop discrete choice models that predict travellers’ behaviour (Chorus et al., 2008, Chorus, 2012). However, as opposed to our approach, such models focus on the traffic manager’s (centralised) perspective, assuming full knowledge and ignoring agents’ adaptation. Within RL, regret has mainly been employed as a performance measure (Bowling, 2005, Banerjee and Peng, 2005, Prabuchandran et al., 2016). In contrast to such approaches, here we use regret to guide the learning process. A few existing techniques indeed use regret as a reinforcement signal (Zinkevich et al., 2008, Waugh et al., 2015), but they assume the agents know their regret a priori. We highlight, however, that calculating regret requires complete knowledge about the environment (i.e., the reward associated with every possible action over time). On the one hand, such knowledge may be obtained using on-line services (Vasserman et al., 2015, Hasan et al., 2016), which provide travel information to end-users through mobile navigation apps. On the other hand, computing regret in the absence of global information is more challenging (Stone and Veloso, 2000), especially in highly competitive scenarios like traffic. We combine these two fronts and investigate how agents can estimate their regret using both local (experience-based) and global (app-based) information. We previously investigated a similar direction (Ramos and Bazzan, 2016) and provided formal performance guarantees (Ramos et al., 2017); in those works, however, agents’ regret was computed using only local information.

In this paper, we develop a regret minimisation algorithm for handling the route choice problem. We propose a method for agents to estimate their regret using both local information (an internal history of observed rewards) and global information (travel times provided by a mobile navigation app). Apart from the agent’s (external) regret, we also consider the action regret, which measures the average amount lost by an agent up to a given time for taking a specific action rather than the best one. The action regret can thus be used for updating agents’ policies. The expected outcome of the system corresponds to the so-called User Equilibrium (UE) (Wardrop, 1952), i.e., an equilibrium point in the space of policies in which no driver benefits from unilaterally deviating from its policy. To the best of our knowledge, this is the first approach in which agents compute their regret by combining local and global information and without assuming that the cost of all possible actions, in all situations, is known in advance.
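
As an illustration (the formal definition is given in Section 3), the action regret of a route a after T episodes can be written as

R_T(a) = \max_{a' \in A} \frac{1}{T} \sum_{t=1}^{T} \hat{r}_t(a') \;-\; \frac{1}{T} \sum_{t=1}^{T} \hat{r}_t(a),

where \hat{r}_t(a) denotes the agent's estimate of the reward (negative travel cost) of route a at episode t, built from its own experience and from the information provided by the app; how \hat{r}_t is obtained is precisely what Section 3 develops. Along the same lines, an approximate UE can be characterised as a point at which no driver can reduce its expected travel cost by more than a small \epsilon by unilaterally switching policy, i.e., c_i(\pi_i, \pi_{-i}) \le c_i(\pi'_i, \pi_{-i}) + \epsilon for every driver i and every alternative policy \pi'_i. The notation in both expressions is ours and purely illustrative.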

The main hypotheses of our work are that: (i) learning to minimise regret improves drivers’ performance, (ii) using app-based information reduces regret, and (iii) the system converges to an approximate UE. In the present setting, learning means finding the best route to take. Recall that this objective is a moving target because there are many agents interacting within the same environment. In this respect, convergence means reaching a point at which agents exploit their knowledge most of the time and the system is reasonably stable (i.e., agents only observe small fluctuations in their costs). Our key contribution is to show that, when agents use our approach, such a stable point is close to the UE. In particular, the contributions of this work are:

  • We define a (mobile) navigation entity (app, henceforth) that provides travel information to the agents. Information here is simply the average travel times of the routes used by the agents. Such information is useful for the agents to estimate their regret.

  • We propose an action-based measure of regret, the action regret, which can be used as reinforcement signal in the RL process.

  • We introduce a method for agents to estimate their action regret using a linear combination of their experience (rewards received in previous episodes) and information provided by the app (see the sketch after this list). We show that such estimates can be used to improve the learning process.

  • We develop an RL algorithmic solution that updates an agent’s policy using action regret as reinforcement signal. Consequently, the agents learn to choose actions that minimise their external regret.
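
The following minimal Python sketch illustrates the idea behind the last two items. It is only a sketch: the function and parameter names (estimate_action_regret, alpha, app_times) are hypothetical, and the exact formulation used in this work is the one defined in Section 3.

# Illustrative sketch: estimating the action regret of a route by mixing
# local (experience-based) and global (app-based) travel-time information.
# All names and the 50/50 default weighting are hypothetical.
def estimate_action_regret(local_times, app_times, route, alpha=0.5):
    # local_times: route -> list of travel times the agent itself observed
    # app_times:   route -> average travel time reported by the app
    # alpha:       weight of the local estimate in the linear combination
    def estimated_cost(r):
        # fall back to the app's estimate when the agent has no experience on route r
        local = sum(local_times[r]) / len(local_times[r]) if local_times.get(r) else app_times[r]
        return alpha * local + (1.0 - alpha) * app_times[r]

    best_cost = min(estimated_cost(r) for r in app_times)  # best (estimated) route in hindsight
    return estimated_cost(route) - best_cost               # regret estimate for the chosen route

For instance, with local_times = {'r1': [12.0, 14.0], 'r2': [20.0]} and app_times = {'r1': 13.0, 'r2': 18.0, 'r3': 15.0}, the estimated regret of 'r2' is 6.0, since its blended cost (19.0) exceeds that of the best route 'r1' (13.0).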

This paper is organised as follows. Section 2 provides a background on the topics related to this work. Section 3 presents the proposed methods. Our approach is experimentally evaluated in Section 4. Concluding remarks are presented in Section 5.

Section snippets

Background

In this section, we review the literature on route choice (Section 2.1), reinforcement learning (Section 2.2), regret minimisation (Section 2.3) and routing with non-local information (Section 2.4).

Learning with action regret

In this section, we discuss how agents can estimate the regret of their actions (Section 3.2) and present an algorithmic solution for them to learn using such estimates (Section 3.3). Moreover, we show how these estimates can be improved using recommendations from a navigation app (Section 3.1).

In our approach, every driver is represented by a Q-learning agent that learns its route choice policy using the action regret as reinforcement signal.
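
A minimal sketch of such an agent is given below, assuming a stateless (bandit-like) Q-learning formulation with one Q-value per route and the negative action regret as the reinforcement signal. The class and parameter names are hypothetical; the actual algorithm is the one presented in Section 3.3.

import random

# Illustrative stateless Q-learning driver: one Q-value per route,
# epsilon-greedy route choice, and the negative action regret as reward.
class RegretQLearner:
    def __init__(self, routes, learning_rate=0.5, epsilon=0.1):
        self.q = {r: 0.0 for r in routes}   # one Q-value per route (stateless setting)
        self.learning_rate = learning_rate
        self.epsilon = epsilon               # exploration probability

    def choose_route(self):
        # epsilon-greedy: explore occasionally, otherwise exploit the best Q-value
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, route, action_regret):
        # a higher regret means a worse choice, so the reinforcement is its negative
        reward = -action_regret
        self.q[route] += self.learning_rate * (reward - self.q[route])

At each episode, a driver would call choose_route(), observe its travel time, estimate the action regret of the chosen route (e.g., as sketched in the Introduction), and then call update() with that estimate.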

Experimental evaluation

In this section, we empirically analyse the performance of our method. The main hypotheses of our work are that: (i) learning to minimise regret improves the drivers’ performance, (ii) using app-based information reduces regret, and (iii) the system converges to an approximate UE.

Before going into the experiments and results, we need to clarify the meaning of learning and convergence. Recall that learning means finding the best route to take, which becomes a moving target when the environment changes as a consequence of the other agents’ choices.

Conclusion

Reinforcement Learning (RL) is a challenging problem, especially in highly competitive scenarios. This paper addressed the route choice problem, in which each driver must choose a route that minimises its travel time. We refer to these settings as multiagent RL because agents must adapt to each other’s decisions, thus making the learning objective a moving target.

In this paper, we proposed a regret-minimising method to address the route choice problem. Here, the agents learn their route choice policies using the action regret, estimated from local (experience-based) and global (app-based) information, as reinforcement signal.

Acknowledgments

We are very grateful to the anonymous reviewers for their thorough analysis and valuable suggestions. We also thank Prof. Yafeng Yin, who has managed the reviewing process of this particular contribution on behalf of the editors of the special issue. The authors are partially supported by CNPq and CAPES grants.

References (71)

  • L.J. LeBlanc et al. An efficient approach to solving the road network equilibrium traffic assignment problem. Transport. Res. (1975)
  • M. Li et al. A regret theory-based route choice model. Transport. A: Transport Sci. (2017)
  • T.H. Rashidi et al. Transport. Res. Part C: Emerg. Technol. (2017)
  • T. Roughgarden. On the severity of Braess’s paradox: designing networks for selfish users is hard. J. Comput. Syst. Sci. (2006)
  • Y. Shoham et al. If multi-agent learning is the answer, what is the question? Artif. Intell. (2007)
  • G. Wang et al. A combined framework for modeling the evolution of traveler route choice under risk. Transport. Res. Part C: Emerg. Technol. (2013)
  • J.D. Abernethy et al. Interior-point methods for full-information and bandit online learning. IEEE Trans. Inform. Theory (2012)
  • A. Agarwal et al. Optimal algorithms for online convex optimization with multi-point bandit... (2010)
  • R. Arora et al. Online bandit learning against an adaptive adversary: from regret to policy... (2012)
  • P. Auer et al. The nonstochastic multiarmed bandit problem. SIAM J. Comput. (2002)
  • B. Awerbuch et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches
  • B. Banerjee et al. Efficient no-regret multiagent learning
  • H. Bar-Gera. Transportation Networks for Research.... (2016)
  • A.L.C. Bazzan et al. Case studies on the Braess paradox: simulating route recommendation and learning in abstract and microscopic models. Transport. Res. C (2005)
  • A.L.C. Bazzan et al. Introduction to intelligent systems in traffic and transportation (2013)
  • D.E. Bell. Regret in decision making under uncertainty. Oper. Res. (1982)
  • E. Ben-Elia et al. “If only I had taken the other road…”: regret, risk and reinforced learning in informed route-choice. Transportation (2013)
  • A. Blum et al. Routing without regret: on convergence to Nash equilibria of regret-minimizing algorithms in routing games. Theory Comput. (2010)
  • A. Blum et al. Learning, regret minimization, and equilibria
  • M. Bowling. Convergence and no-regret in multiagent learning
  • D. Braess. Über ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung (1968)
  • Bureau of Public Roads. Traffic Assignment Manual. US Department of... (1964)
  • L. Buşoniu et al. A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. (2008)
  • N. Cesa-Bianchi et al. Prediction, Learning, and Games (2006)
  • H. Chan et al. Congestion games with polytopal strategy spaces

This article belongs to the Virtual Special Issue on “Agents in Traffic and Transportation”.
