ABSTRACT
Minimal delay routing is a fundamental task in networks. Since delays depend on the (potentially unpredictable) traffic distribution, online delay optimization can be quite challenging. While uncertainty about the current network delays may make the current routing choices sub-optimal, the algorithm can nevertheless try to learn the traffic patterns and keep adapting its choice of routing paths so as to perform nearly as well as the best static path. This online shortest path problem is a special case of online linear optimization, a problem in which an online algorithm must choose, in each round, a strategy from some compact set S ⊆ R^d so as to try to minimize a linear cost function which is only revealed at the end of the round. Kalai and Vempala [4] gave an algorithm for such problems in the transparent feedback model, where the entire cost function is revealed at the end of the round. Here we present an algorithm for online linear optimization in the more challenging opaque feedback model, in which only the cost of the chosen strategy is revealed at the end of the round. In the special case of shortest paths, opaque feedback corresponds to the notion that in each round the algorithm learns only the end-to-end cost of the chosen path, not the cost of every edge in the network.
We also present a second algorithm for online shortest paths, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph. This has several advantages over the online linear optimization approach. First, it is effective against an adaptive adversary, whereas our linear optimization algorithm assumes an oblivious adversary. Second, even in the case of an oblivious adversary, the second algorithm performs better than the first, as measured by their additive regret.
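The two feedback models and the notion of additive regret can be illustrated with a minimal sketch. It is not either of the algorithms described in the abstract; it only sets up the problem they address. The four-edge network, the path names, and every function name are hypothetical and chosen for illustration. Each path is a 0/1 edge-incidence vector, so its cost is linear in the per-round edge-cost vector.

```python
# Hypothetical network with 4 edges and two source-to-sink paths.
# Each path is a 0/1 edge-incidence vector, i.e. a strategy in R^4.
PATHS = {
    "upper": (1, 1, 0, 0),  # traverses edges 0 and 1
    "lower": (0, 0, 1, 1),  # traverses edges 2 and 3
}


def path_cost(path, edge_costs):
    """Linear cost: dot product of the incidence vector and the edge costs."""
    return sum(x * c for x, c in zip(path, edge_costs))


def transparent_feedback(path, edge_costs):
    """Transparent model: the whole cost vector is revealed after the round."""
    return tuple(edge_costs)


def opaque_feedback(path, edge_costs):
    """Opaque model: only the end-to-end cost of the chosen path is revealed."""
    return path_cost(path, edge_costs)


def additive_regret(choices, cost_sequence):
    """Total cost incurred minus the cost of the best static path in hindsight."""
    incurred = sum(
        path_cost(PATHS[name], costs) for name, costs in zip(choices, cost_sequence)
    )
    best_static = min(
        sum(path_cost(path, costs) for costs in cost_sequence)
        for path in PATHS.values()
    )
    return incurred - best_static


if __name__ == "__main__":
    # An oblivious adversary fixes the whole cost sequence in advance.
    cost_sequence = [(1, 1, 0, 0), (0, 0, 1, 1)]
    choices = ["upper", "lower"]  # an unlucky learner picks the costly path twice
    for name, costs in zip(choices, cost_sequence):
        print(name, "opaque feedback:", opaque_feedback(PATHS[name], costs))
    print("additive regret:", additive_regret(choices, cost_sequence))
```

Under opaque feedback the learner sees a single number per round, so it cannot tell which edges on its path were expensive, nor what any other path would have cost.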
- Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pages 322--331. IEEE Computer Society Press, Los Alamitos, CA, 1995.
- Baruch Awerbuch and Yishay Mansour. Online learning of reliable network paths. In PODC, 2003. To appear.
- Avrim Blum, Geoff Gordon, and Brendan McMahan. Bandit version of the shortest paths problem. Unpublished manuscript, July 2003.
- Adam Kalai and Santosh Vempala. Geometric algorithms for online optimization, 2003. Unpublished manuscript.
- N. Littlestone and M. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212--260, 1994.
- Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. In IEEE Symposium on Foundations of Computer Science, pages 256--261, 1989.
- Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108:212--261, 1994. A preliminary version appeared in FOCS 1989.
- Eiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative updates. In COLT Proceedings, 2002.
Index Terms
- Adaptive routing with end-to-end feedback: distributed learning and geometric approaches