Automatica

Volume 43, Issue 5, May 2007, Pages 817-830

Simulation-based optimal sensor scheduling with application to observer trajectory planning

https://doi.org/10.1016/j.automatica.2006.11.019

Abstract

The sensor scheduling problem can be formulated as a controlled hidden Markov model and this paper solves the problem when the state, observation and action spaces are continuous. This general case is important as it is the natural framework for many applications. The aim is to minimise the variance of the estimation error of the hidden state w.r.t. the action sequence. We present a novel simulation-based method that uses a stochastic gradient algorithm to find optimal actions.

Introduction

Consider the following continuous state hidden Markov model (HMM):
$$X_{n+1} = f(X_n, A_{n+1}, W_n), \qquad Y_n = g(X_n, A_n, V_n), \qquad (1)$$
where $X_n \in \mathbb{R}^{d_x}$ is the hidden system state, $Y_n \in \mathbb{R}^{d_y}$ the observation of the state, and $W_n$ and $V_n$ are i.i.d. noise terms. Unlike the classical HMM, the evolution of the state and observation processes depends on an input parameter $A_n \in \mathbb{R}^{d_a}$, which is the control or action. In HMMs, one is primarily concerned with the problem of estimating the hidden state, which is achieved by propagating the posterior distribution (or filtering density) $\pi_n(x)\,\mathrm{d}x = P(X_n \in \mathrm{d}x \,|\, A_{1:n}, Y_{1:n})$. By a judicious choice of the action sequence $\{A_n\}$, the evolution of the state and observation processes can be 'steered' in order to yield filtering densities that give more accurate estimates of the state process. This problem is also known in the literature as the sensor scheduling problem.
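
To make the setup concrete, here is a minimal simulation sketch of model (1). It is illustrative only: the linear dynamics, Gaussian noise laws and all function names are our own placeholders, not the paper's model.

```python
import numpy as np

def simulate_controlled_hmm(f, g, sample_x0, actions, rng):
    """Simulate X_n = f(X_{n-1}, A_n, W_{n-1}) and Y_n = g(X_n, A_n, V_n)
    for a fixed action sequence A_{1:N} (illustrative Gaussian noise)."""
    x = sample_x0(rng)                        # X_0 ~ pi_0
    xs, ys = [], []
    for a in actions:                         # a plays the role of A_n
        w = rng.standard_normal(x.shape)      # W_{n-1}, assumed Gaussian
        x = f(x, a, w)                        # state transition
        v = rng.standard_normal()             # V_n, assumed Gaussian
        ys.append(g(x, a, v))                 # observe the new state
        xs.append(x)
    return np.array(xs), np.array(ys)

# Example with placeholder linear dynamics and sensor model:
rng = np.random.default_rng(0)
f = lambda x, a, w: 0.9 * x + 0.1 * a + 0.5 * w
g = lambda x, a, v: x + 0.2 * v
xs, ys = simulate_controlled_hmm(f, g, lambda r: r.standard_normal(1),
                                 np.ones((10, 1)), rng)
```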

Sensor scheduling has been a topic of interest to the target tracking community for some years now (Hernandez et al., 2004, Kershaw and Evans, 1994, Logothetis et al., 1997, Meier et al., 1967, Singh et al., 2005, Tremois and Le Cadre, 1999). The classical setting is the problem of tracking a manoeuvring target over $N$ epochs. Here $X_n$ denotes the state of the target at epoch $n$, $Y_n$ the observation provided by the sensor and $A_n$ some parameter of the sensor that may be adjusted to improve the 'quality' of the observation. For example, consider a non-moving platform with a finite number of sensors, each with different characteristics. In this case $A_n$ denotes the choice of sensor to be used at epoch $n$ (Lee et al., 2001, Singh et al., 2004). Alternatively, there may be only one sensor and $A_n$ could denote some tunable parameter of the sensor, as in the waveform selection problem (Kershaw & Evans, 1994), or in the case of directing an electronically scanned aperture (ESA) radar (Blackman & Popoli, 1999). In contrast, consider the scenario in which a moving platform (or observer) is to be adaptively manoeuvred to optimise the tracking performance of a manoeuvring target. In this setting, $A_n$ denotes the position of the observer at epoch $n$ and the problem is termed the optimal observer trajectory planning (OTP) problem (Hernandez et al., 2004, Logothetis et al., 1997, Tremois and Le Cadre, 1999). In all these sensor scheduling problems, a measure of tracking performance is the mean squared tracking error over the $N$ epochs,
$$\mathbb{E}\Bigg\{\sum_{n=1}^{N} \big(\psi(X_n) - \langle \pi_n, \psi \rangle\big)^2\Bigg\}, \qquad (2)$$
where $\psi : \mathbb{R}^{d_x} \to \mathbb{R}$ is a suitable test function that emphasises the component (or components) of interest of the state vector. The aim is to minimise (2) w.r.t. the choice of actions $\{A_1, \ldots, A_N\}$.
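
If each filtering density $\pi_n$ is available only as a set of weighted particles $\{(x_n^{(i)}, w_n^{(i)})\}_{i=1}^{L}$, as in the simulation-based approach adopted below, the sum inside (2) can be estimated along a realised trajectory as in the following sketch; the function names are ours and $\psi$ is assumed to be vectorised.

```python
import numpy as np

def tracking_error(true_states, particles, weights, psi):
    """Estimate sum_n (psi(X_n) - <pi_n, psi>)^2, approximating
    <pi_n, psi> by the weighted particle average at each epoch n."""
    err = 0.0
    for x_n, xp_n, w_n in zip(true_states, particles, weights):
        post_mean = np.sum(w_n * psi(xp_n))  # <pi_n, psi> ~ sum_i w_i psi(x_i)
        err += (psi(x_n) - post_mean) ** 2
    return err
```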

When the dynamics of the state and the observation processes are both linear and Gaussian, the optimal solution to the sensor scheduling problem (2) (when $\psi$ gives a quadratic cost) can be computed offline (Meier et al., 1967); this is not surprising given that the Kalman filter covariance is also independent of the actual realisation of the observations. In the general setting studied in this paper, the dynamics can be both non-linear and non-Gaussian, which means that the filtering density $\pi_n$, and integration w.r.t. it, cannot be evaluated in closed form. Hence, the tracking error performance criterion itself does not admit a closed-form expression. To further complicate matters, the actions sought are continuous valued, i.e., vectors in $\mathbb{R}^{d_a}$.
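
To see why, consider the standard linear Gaussian model $X_{n+1} = F X_n + W_n$, $Y_n = H(A_n) X_n + V_n$ with $W_n \sim \mathcal{N}(0,Q)$ and $V_n \sim \mathcal{N}(0,R)$, an illustrative special case (our notation) in which only the observation matrix depends on the action. The Kalman covariance recursion is
$$P_{n|n-1} = F P_{n-1|n-1} F^{\mathrm{T}} + Q, \qquad P_{n|n} = P_{n|n-1} - P_{n|n-1} H_n^{\mathrm{T}} \big(H_n P_{n|n-1} H_n^{\mathrm{T}} + R\big)^{-1} H_n P_{n|n-1},$$
with $H_n = H(A_n)$. The recursion involves the actions but not the realised observations $Y_{1:n}$, so a quadratic tracking cost can be evaluated, and hence minimised, offline.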

To address the complications arising from the non-linear and non-Gaussian dynamics, one could linearise the state and observation model (as in Logothetis et al. (1997)), i.e., use the extended Kalman filter to propagate the filtering density $\pi_n$. However, dealing with the tracking error performance criterion directly is the exception rather than the rule. The majority of works (Hernandez et al., 2004, Paris and Le Cadre, 2002, Tremois and Le Cadre, 1999, and references therein), while aiming to minimise the mean squared tracking error, do so indirectly by defining a lower bound on the tracking error criterion and minimising the lower bound instead. The bound in question is the posterior Cramér–Rao lower bound (PCRLB), which is the inverse of the Fisher information matrix (FIM). This approach hinges on the ability to propagate the FIM recursively in closed form by a Riccati-type equation for the non-linear and non-Gaussian filtering problem. Unfortunately, the recursion for the FIM involves evaluating the expectation of certain derivatives of the transition probability density of the state dynamics, as well as the expectation of certain derivatives of the observation likelihood (see (3) and (4) below). As these quantities cannot be evaluated in general except in the linear and Gaussian case, this assumption is either invoked or the authors resort to simulation-based approximations.
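
For reference, the Riccati-type FIM recursion in question, in the form commonly attributed to Tichavský, Muravchik and Nehorai (our notation, with the action dependence suppressed), propagates $J_n$ as
$$J_{n+1} = D_n^{22} - D_n^{21}\big(J_n + D_n^{11}\big)^{-1} D_n^{12},$$
where
$$D_n^{11} = \mathbb{E}\{-\Delta_{X_n}^{X_n} \log p(X_{n+1}|X_n)\}, \qquad D_n^{12} = \mathbb{E}\{-\Delta_{X_n}^{X_{n+1}} \log p(X_{n+1}|X_n)\} = (D_n^{21})^{\mathrm{T}},$$
$$D_n^{22} = \mathbb{E}\{-\Delta_{X_{n+1}}^{X_{n+1}} \log p(X_{n+1}|X_n)\} + \mathbb{E}\{-\Delta_{X_{n+1}}^{X_{n+1}} \log q(Y_{n+1}|X_{n+1})\},$$
and $\Delta_a^b$ denotes the matrix of second derivatives $\nabla_b \nabla_a^{\mathrm{T}}$. The expectations of these log-density derivatives are exactly the terms that lack closed forms outside the linear Gaussian case.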

As for the complications due to continuous valued actions, the approach in the literature is to discretise $\mathbb{R}^{d_a}$ to a finite grid. There have also been studies where the continuous state HMM (1) is approximated by a discrete state HMM, and the latter solved using dynamic programming (Tremois & Le Cadre, 1999).

The aim of this paper is to solve the sensor scheduling problem with a continuous action space directly, and not a surrogate problem defined through the PCRLB or otherwise. We make no assumptions of linearity or Gaussianity for analytic convenience, nor do we discretise the state, observation or action space. We avoid these restrictive modelling assumptions on the continuous state HMM by recourse to methods based on computer simulation (simulation for short). As the action policy derived will be a function of the filtering density, we will employ simulation to approximate the posterior density by a finite sum of weighted point-mass distributions (Doucet, De Freitas, & Gordon, 2001). The main advantage of simulation over other numerical integration methods is that it is typically very easy to implement. Furthermore, the approximation obeys a strong law of large numbers (Del Moral, 2004), and there is a substantial literature on its efficient implementation for approximating posterior densities (Doucet et al., 2001).
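
A minimal bootstrap-filter sketch of this weighted point-mass approximation follows; it is a generic sequential Monte Carlo step, not the paper's exact implementation, and the transition sampler and likelihood function are assumed inputs.

```python
import numpy as np

def particle_filter_step(particles, weights, y, a, sample_transition,
                         likelihood, rng):
    """One bootstrap particle filter update for the controlled HMM:
    resample, propagate through p(.|x, a), and reweight by q(y|x, a)."""
    L = len(particles)
    idx = rng.choice(L, size=L, p=weights)                 # multinomial resampling
    particles = sample_transition(particles[idx], a, rng)  # x_i ~ p(.|x_i, a)
    weights = likelihood(y, particles, a)                  # unnormalised q(y|x_i, a)
    weights = weights / weights.sum()                      # normalise
    return particles, weights   # {(x_i, w_i)} approximates pi_n
```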

In order to solve for the optimal sequence of continuous valued actions, we will use an iterative stochastic gradient algorithm. We derive the gradient of the performance criterion w.r.t. the action trajectory and demonstrate how low-variance estimates of it may be obtained using control variate (CV) techniques (Glynn & Szechtman, 2002). One major advantage of a stochastic gradient-based method is that theoretical guarantees are easily obtained. Under suitable regularity assumptions, one can guarantee convergence to a local optimum of the performance criterion, whereas it is difficult to make similar assertions about the quality of the solutions obtained by other approximate methods proposed in the literature for sensor scheduling.
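
To indicate the flavour of such an estimator, the following sketch shows a generic score-function (likelihood-ratio) gradient estimate for a scalar action component, with a variance-minimising scalar control variate. This is a textbook construction under our own naming, not the paper's estimator.

```python
import numpy as np

def lr_gradient_with_cv(costs, scores):
    """Score-function gradient: grad J = E[cost * score], where score is the
    derivative of the log-density of a trajectory w.r.t. the action and has
    zero mean; subtracting b * score is a control variate that leaves the
    estimate unbiased while reducing variance."""
    costs = np.asarray(costs, dtype=float)    # cost of each simulated trajectory
    scores = np.asarray(scores, dtype=float)  # score of each trajectory
    # Variance-minimising baseline b = E[cost*score^2] / E[score^2],
    # estimated from the same samples (a common, slightly biased shortcut).
    b = np.sum(costs * scores**2) / max(np.sum(scores**2), 1e-12)
    return np.mean((costs - b) * scores)
```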

As an instance of the sensor scheduling problem, we study the OTP problem for a bearings-only application. We state theoretical results concerning the convergence of the observer trajectory identified by our simulation-based algorithm. Handling multiple observers simultaneously is straightforward in our proposed framework, and numerical results are presented for cooperating observers tracking a manoeuvring target.

The organisation of this paper is as follows. In Section 2, we formulate the optimal sensor scheduling problem. We also summarise some key points concerning several methods in the literature that may be used to solve this problem. In Section 3, we derive the gradient of the performance criterion being optimised, and detail the use of simulation and variance reduction techniques for its estimation. In Section 4.2, we present the main algorithm of the paper, which is a two time-scale stochastic gradient algorithm for solving the sensor scheduling problem. General convergence results for this algorithm are presented in the Appendix. In Section 5, we formulate the OTP problem as an instance of the sensor scheduling problem and apply the convergence results to this application. Numerical examples appear in Section 6, and concluding remarks in Section 7. All proofs appear in the technical report (Singh et al., 2005), which is available online or may be obtained by e-mailing any of the authors.

Notation

The notation used in this paper is now outlined. The norm of a scalar, vector or matrix is denoted by $|\cdot|$. For a vector $b$, $|b|$ denotes the vector 2-norm $\sqrt{\sum_i |b(i)|^2}$. For a matrix $A$, $|A|$ denotes the matrix 2-norm $\max_{b : |b| \neq 0} |Ab|/|b|$. For convenience, we also denote a vector $b \in \mathbb{R}^n$ by $b = [b(i)]_{i=1,\ldots,n}$, or the $i$th component of a vector by $[b]_i$. For scalars $a_{j,i}$, $j=1,\ldots,m$, $i=1,\ldots,n$, let $[[a_{j,i}]_{j=1,\ldots,m}]_{i=1,\ldots,n}$ denote the stacked vector $[a_{1,1},\ldots,a_{m,1},\ldots,a_{1,n},\ldots,a_{m,n}]^{\mathrm{T}}$. For a vector $b$, let $\mathrm{diag}(b)$ denote the diagonal matrix formed from $b$. For a function $f : \mathbb{R}^n \to \mathbb{R}$ with argument $z \in \mathbb{R}^n$, we denote $(\partial f/\partial z(i))(z)$ by $\nabla_{z(i)} f(z)$ and $\nabla f(z) = [\nabla_{z(1)} f(z), \ldots, \nabla_{z(n)} f(z)]^{\mathrm{T}}$. For the vector-valued function $F = [F_1, \ldots, F_n]^{\mathrm{T}} : \mathbb{R}^n \to \mathbb{R}^n$, let $\nabla F$ denote the matrix $[\nabla F_1, \ldots, \nabla F_n]$. For real-valued integrable functions $f$ and $g$, let $\langle f, g \rangle$ denote $\int f(x) g(x)\,\mathrm{d}x$.

Section snippets

Problem formulation

At time $n$, let $X_n$ and $Y_n$ be random vectors that model the $d_x$-dimensional state and its $d_y$-dimensional observation, respectively. Suppose that an action $A_n \in \mathbb{R}^{d_a}$ is applied at time $n$. The state $\{X_n\}_{n \geq 0}$ is an unobserved Markov process with initial distribution and transition law given by
$$X_0 \sim \pi_0, \qquad X_{n+1} \sim p(\cdot\,|\,X_n, A_{n+1}).$$
(The symbol '$\sim$' means distributed according to.) The observation process $\{Y_n\}_{n \geq 1}$ is generated according to the state- and action-dependent probability density
$$Y_n \sim q(\cdot\,|\,X_n, A_n).$$
Given the sequence …

The cost gradient and its simulation-based approximation

In this section, we derive the gradient of the cost function (6) w.r.t. $A_{1:N}$. We then propose a suitable simulation-based approximation for optimising with stochastic approximation (SA). Because problem (6) is solved for a fixed initial state distribution $\pi_0$, henceforth we omit reference to $\pi_0$ in the notation $\mathbb{E}_{(\pi_0, A_{1:N})}$ and denote the probability w.r.t. which this expectation is taken by $P_{A_{1:N}}$.

Keeping in mind that $(\psi(X_n) - \langle \pi_n, \psi \rangle)^2$ is a function of the form $h(X_{1:n}, A_{1:n}, Y_{1:n})$, (7) implies $\mathbb{E}_{A_{1:N}}\{(\psi(X_n) - \langle \pi_n, \psi \rangle)^2\} = \mathbb{E}_{A_{1:n}}\{(\psi(X$ …

A verifiably convergent particle implementation

Implementing the algorithm detailed in Section 3 with the gradient estimate (16) is straightforward. However, to prove its convergence we would not be able to use standard SA results. Even though (16) is a noisy estimate of $\nabla_{A_l} J(A_{1:N})$, the noise is not zero-mean due to the bias of the simulation-based approximations to $\pi_n$ and $\pi_{0:n}$. To assert convergence of (10) to a minimum of $J$, we would have to gradually increase the number of samples $L$ to remove the bias. (Similar conditions are required for …
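
Schematically, the idea looks as follows. This is a simplified single-loop caricature (the paper's actual algorithm is a two time-scale scheme), the step-size and particle-growth schedules are our own illustrative choices, and grad_estimate stands for the biased particle-based gradient estimate.

```python
import numpy as np

def stochastic_approximation(A0, grad_estimate, n_iters, rng):
    """SA iteration A <- A - gamma_k * g_hat(A), with a decreasing step
    size and a particle count L_k that grows so the gradient bias vanishes."""
    A = np.array(A0, dtype=float)
    for k in range(1, n_iters + 1):
        gamma = 1.0 / k                  # assumed step-size schedule
        L = 100 + 10 * k                 # assumed growing particle count
        g = grad_estimate(A, L, rng)     # biased, noisy gradient estimate
        A -= gamma * g
    return A
```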

Application to Observer Trajectory Planning (OTP)

In OTP, we wish to track a manoeuvring target for $N$ epochs. At epoch $n$, $X_n$ denotes the state of the target, $A_n$ the position of the observer and $Y_n$ the partial observation of the target state, i.e., $Y_n = g(X_n, A_n, V_n)$, where $V_n$ is measurement noise. Typically, the observer has its own motion model and we let $X_n^o$ denote the state of the observer. The observer state descriptor usually includes its position, and therefore $A_n$ corresponds to certain components of $X_n^o$. The aim of OTP is to adaptively manoeuvre …
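
For the bearings-only application studied later, the observation is a noisy angle from the observer to the target; a minimal sketch under our own coordinate conventions, not taken verbatim from the paper:

```python
import numpy as np

def bearing_observation(target_xy, observer_xy, noise_std, rng):
    """Noisy bearing from the observer (position A_n) to the target:
    Y_n = atan2(dy, dx) + V_n with V_n ~ N(0, noise_std^2)."""
    dx, dy = np.asarray(target_xy) - np.asarray(observer_xy)
    return np.arctan2(dy, dx) + noise_std * rng.standard_normal()
```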

Numerical example

The aim of this section is to demonstrate the utility of the proposed simulation-based algorithm for the OTP problem. We demonstrate various convergence aspects of the algorithm and solve for the optimal open-loop observer trajectory under a variety of tracking scenarios, namely, with a single observer and with cooperating observers. Open-loop feedback control (OLFC) is implemented for the cooperating observers.

All examples below for OLFC concern a manoeuvring target where the target follows a linear Gaussian model between …

Conclusion

In this paper we proposed a novel simulation-based method to solve the sensor scheduling problem for the case in which the state, observation and action spaces are continuous valued vectors. This general continuous state-space case is important as it is the natural framework for many applications, such as OTP. We avoided restrictive modelling assumptions on the continuous state HMM, such as a linear and/or Gaussian system, by recourse to simulation-based methods. This paper solved the sensor …

References (21)

  • V.R. Konda et al., Linear stochastic approximation driven by slowly varying Markov chains, Systems and Control Letters (2003)
  • H.W.J. Lee et al., Sensor scheduling in continuous time, Automatica (2001)
  • D.P. Bertsekas, Dynamic programming and optimal control (1995)
  • D.P. Bertsekas et al., Gradient convergence in gradient methods with errors, SIAM Journal on Optimization (2000)
  • S. Blackman et al., Modern tracking systems (1999)
  • P. Del Moral, Feynman–Kac formulae: Genealogical and interacting particle systems with applications (2004)
  • A. Doucet et al., Sequential Monte Carlo methods in practice (2001)
  • A. Doucet et al., Particle filters for state estimation of jump Markov linear systems, IEEE Transactions on Signal Processing (2001)
  • P.W. Glynn et al., Some new perspectives on the method of control variates (2002)
  • M. Hauskrecht, Value-function approximations for partially observable Markov decision processes, Journal of Artificial Intelligence Research (2000)

Sumeetpal S. Singh received the B.E. (with first-class honours) and Ph.D. degrees from the University of Melbourne, Melbourne, Australia, in 1997 and 2002, respectively. From 1998 to 1999, he was a Communications Engineer in industry. From 2002 to 2004, he was a Research Fellow in the Department of Electrical and Electronic Engineering, University of Melbourne. He joined the Engineering Department of Cambridge University in 2004 as a Research Associate and is currently a University Lecturer in Engineering statistics. His research interests include Monte Carlo methods for estimation and control.

Nikolaos Kantas was born in Athens, Greece, in 1981. He completed his undergraduate education in Cambridge University Engineering Department, where he is currently working towards a Ph.D. degree on Monte Carlo methods for estimation and control.

Ba-Ngu Vo was born in 1970 in Saigon, Vietnam. He received his Bachelor degrees jointly in Science and Electrical Engineering with first-class honours from the University of Western Australia in 1994, and a Ph.D. from Curtin University of Technology in 1997. Since 2000, he has been with the Department of Electrical and Electronic Engineering at the University of Melbourne, where he is currently an Associate Professor. Dr. Vo's research interests include optimisation, signal processing and stochastic geometry.

Arnaud Doucet was born in France in 1970. He received the Ph.D. in Information Engineering from the University of Paris XI (Orsay) in 1997. From 1998 to 2000, he was a Research Associate in Cambridge University Engineering Department. From 2001 to 2002, he was Senior Lecturer in the Department of Electrical Engineering of Melbourne University. From 2002 to 2005, he was University Lecturer in Cambridge University Engineering Department. Since 2005, he has been Associate Professor and Canada Research Chair in the Department of Statistics and the Department of Computer Science of the University of British Columbia. His main research interests are Bayesian statistics and Monte Carlo methods.

Robin Evans was born in Melbourne, Australia, in 1947. After completing a B.E. degree in Electrical Engineering at the University of Melbourne in 1969, he worked as a Radar Systems Engineering Officer with the Royal Australian Air Force. He completed a Ph.D. in 1975 at the University of Newcastle, followed by postdoctoral studies at the Laboratory for Information and Decision Systems, MIT, USA, and the Control and Management Department, Cambridge University, UK. He has held various academic positions, including Head of the Department of Electrical and Computer Engineering at the University of Newcastle, Head of the Department of Electrical and Electronic Engineering at the University of Melbourne, Research Leader for the Cooperative Centre for Sensor Signal and Information Processing, and Director of the Centre for Networked Decision Systems. He is currently Director of the Victoria Research Laboratory of National ICT Australia. His research has ranged across many areas, including theory and applications in industrial control, radar systems, signal processing and telecommunications. He is a Fellow of the Australian Academy of Science, a Fellow of the Australian Academy of Technological Sciences and Engineering, and a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), USA.

This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor Derong Liu under the direction of Editor Miroslav Krstic.
