Article

An Iterative Nonlinear Filter Using Variational Bayesian Optimization

1 School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
2 Key Laboratory of Information Fusion Technology, Ministry of Education, Xi’an 710072, China
3 School of Engineering, RMIT University, Melbourne, VIC 3000, Australia
4 Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, VIC 3010, Australia
* Author to whom correspondence should be addressed.
Sensors 2018, 18(12), 4222; https://doi.org/10.3390/s18124222
Submission received: 11 October 2018 / Revised: 23 November 2018 / Accepted: 28 November 2018 / Published: 1 December 2018
(This article belongs to the Section Physical Sensors)

Abstract: We propose an iterative nonlinear estimator based on the technique of variational Bayesian optimization. The posterior distribution of the underlying system state is approximated by a solvable variational distribution approached iteratively using evidence lower bound optimization subject to a minimal weighted Kullback-Leibler divergence, where a penalty factor is considered to adjust the step size of the iteration. Based on linearization, the iterative nonlinear filter is derived in closed form. The performance of the proposed algorithm is compared with several nonlinear filters in the literature using simulated target tracking examples.

1. Introduction

Bayesian estimation is widely applied across many areas of engineering, such as target tracking, aerial surveillance, intelligent vehicles, and machine learning [1]. In linear Gaussian systems, state estimation can be achieved optimally using the Kalman filter, which provides a closed-form solution. However, many real-world estimation problems are nonlinear, resulting in an analytically intractable posterior probability density function (PDF) for the state. In consequence, suboptimal approximation methods are sought to solve nonlinear estimation problems [2].
Many suboptimal techniques have been developed to solve nonlinear estimation problems, and they may be divided into three categories. The first category, which includes the extended Kalman filter (EKF) [3], the iterated extended Kalman filter (IEKF) [4,5], and their variants [6,7], solves the state estimation problem by replacing the nonlinear functions with their linear approximations obtained via Taylor expansion. The second category involves stochastic sampling methods: in the filtering process, a set of randomly sampled, weighted points is used to approximate the PDF of the underlying state. For example, the particle filter (PF) [8,9,10] is a sequential Monte Carlo (SMC) stochastic sampling method which approximates the PDF by particles sampled from a proposal distribution; the PF can be applied to nonlinear non-Gaussian systems. Markov Chain Monte Carlo (MCMC) is another popular stochastic method, as it can achieve arbitrarily high accuracy using a large number of samples, although sometimes at prohibitive computational expense. The techniques in the third category use deterministic sampling methods: nonlinear state PDFs are approximated by a set of fixed points and weights that represent the location and spread of the distribution. This category includes the unscented Kalman filter (UKF) [9,11,12], the cubature Kalman filter (CKF) [13,14], and the central difference Kalman filter (CDKF) [15]. The UKF and the CDKF approximate the nonlinear state transition function and the measurement function by the unscented transform (UT) and the Stirling interpolation, respectively; the state and the corresponding error covariance are then calculated from the sampling points and weights. The CKF uses the third-order spherical-radial cubature rule to approximate integrals numerically.
Where nonlinear filters involve optimization, quasi-Newton approximations have been used, along with the Kullback-Leibler (KL) divergence and the α-divergence as objective functions. For example, the EKF update step may use a Hessian correction term, resulting in improved performance [7,16]. Another approach to estimation, which in some ways can be regarded as a fourth category, uses the variational Bayes (VB) technique. VB has a unified and principled architecture and can be used, as with the other methods mentioned earlier, for problems where analytic solutions cannot be found. The objective of the VB approximation is to find the variational distribution best able, in the sense of the KL divergence, to approximate the true PDF [17,18,19]. Various variational inference methods for parametric distributions are discussed in [20], where the true PDF of the hidden variables is assumed to be given by a parametric model. Since VB usually runs faster than MCMC [20,21,22], it is widely applied in areas such as statistics, finite element analysis, and machine learning. An important concern with VB inference is the accuracy of the approximation [23]. Tseng [24] and Chrétien and Hero [25] introduced a proximal point algorithm in the framework of expectation-maximization (EM) and studied its convergence. Khan et al. [18,26] adopted the KL divergence as a measure of divergence to improve the accuracy of VB inference by considering the geometry of the true posterior PDF, and applied their algorithm to parameter estimation, data set classification and regression.
In this paper, we adopt the KL divergence as a metric of posterior PDF approximation and derive an alternative nonlinear estimation algorithm based on the VB technique. The posterior PDF of the underlying system state is approximated by a parameterized variational distribution, and the difference between the two distributions is minimized iteratively using a weighted KL divergence criterion. This is carried out through optimization of the evidence lower bound (ELBO); the intractable integration of the posterior PDF is thereby converted into a solvable optimization problem. A penalty factor is applied so that the filter achieves a good trade-off between accuracy and computational overhead. Numerical simulations of two typical target tracking scenarios and a benchmark nonlinear filtering problem are presented. The simulation results show that the approximation of the nonlinear stochastic system state by the proposed algorithm is tighter than those of the EKF and UKF. The computational cost and the effect of different penalty factors on estimation accuracy are also analyzed.
The rest of the paper is organized as follows. In Section 2, we introduce the general nonlinear Bayesian filtering problem. The formulation of the ELBO is described in Section 3. In Section 4, we propose the proximal iterative nonlinear filter. In Section 5, we present simulations of two target tracking scenarios, where the performance of the proposed algorithm is compared with those of EKF and UKF in terms of estimation accuracy and computational overhead. Lastly, conclusions are drawn in Section 6.

2. Problem Formulation

Consider a general dynamic system with measurements as follows:
x_k = f_k(x_{k-1}) + ω_k,  (1)
z_k = h_k(x_k) + v_k,  (2)
where f_k(·) denotes the state transition function and h_k(·) denotes the mapping from the system state to the measurement; ω_k and v_k are the process noise and the measurement noise, respectively. We assume that ω_k and v_k are Gaussian and mutually independent, with ω_k ~ N(0, Q_k) and v_k ~ N(0, R_k).
Assuming that the posterior PDF p(x_{k-1} | z_{k-1}) at time k-1 is available, the PDF of the predicted state is obtained by the Chapman-Kolmogorov equation:
p(x_k | z_{k-1}) = ∫ p(x_k | x_{k-1}) p(x_{k-1} | z_{k-1}) dx_{k-1}.  (3)
Then, at time k, the posterior PDF is obtained from the measurement z_k by an application of the Bayes rule:
p(x_k | z_k) = p(z_k | x_k) p(x_k | z_{k-1}) / ∫ p(z_k | x_k) p(x_k | z_{k-1}) dx_k.  (4)
For linear Gaussian systems, it is well known that the optimal state estimate x_{k|k} and the corresponding error covariance P_{k|k} under the criterion of minimum variance estimation are given by
x_{k|k} = ∫ x_k p(x_k | z_k) dx_k,  (5)
P_{k|k} = ∫ (x_k - x_{k|k})(x_k - x_{k|k})^T p(x_k | z_k) dx_k.  (6)
For nonlinear systems, the integral in Equation (4) is often intractable. Suboptimal approximations for the posterior PDF are needed. Most existing suboptimal algorithms adopt linearization or sampling techniques to approximate the posterior PDF p x k | z k . We consider an iterative VB approach, in which the true PDF is approximated by a variational distribution and is approached by iterative optimization of the ELBO. The proposed algorithm converts the nontrivial integration to a closed-form optimization and therefore improves estimation accuracy.
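For reference, in the linear Gaussian special case the recursion of Equations (3)-(6) reduces exactly to the Kalman filter. The following minimal NumPy sketch (an illustration added here, not part of the original text) shows the closed-form predict and update steps that the suboptimal nonlinear methods discussed below generalize.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Chapman-Kolmogorov prediction, Eq. (3), for a linear Gaussian model."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x_pred, P_pred, z, H, R):
    """Bayes update, Eqs. (4)-(6), in closed form."""
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)     # posterior mean, Eq. (5)
    P = P_pred - K @ H @ P_pred           # posterior covariance, Eq. (6)
    return x, P
```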

3. Evidence Lower Bound Maximization

The above nonlinear estimation problem can be solved using a VB framework. Express the marginal PDF p(z_k) using a variational distribution q(x_k | ψ_k) as follows:
log p(z_k) = log ∫ q(x_k | ψ_k) [ p(x_k, z_k) / q(x_k | ψ_k) ] dx_k = L(ψ_k) + D_KL( p(x_k | z_k) ‖ q(x_k | ψ_k) ),  (7)
where D_KL( p(x_k | z_k) ‖ q(x_k | ψ_k) ) is the KL divergence between the true posterior PDF p(x_k | z_k) and the variational distribution q(x_k | ψ_k); that is,
D_KL( p(x_k | z_k) ‖ q(x_k | ψ_k) ) = -∫ q(x_k | ψ_k) log [ p(x_k | z_k) / q(x_k | ψ_k) ] dx_k.  (8)
L(ψ_k) is the variational ELBO:
L(ψ_k) = ∫ q(x_k | ψ_k) log [ p(x_k, z_k) / q(x_k | ψ_k) ] dx_k = E_q[ log p(z_k | x_k) ] - D_KL( p(x_k) ‖ q(x_k | ψ_k) ).  (9)
The variational distribution q(x_k | ψ_k) is assumed to be Gaussian with unknown parameters ψ_k = {x_{k|k}, P_{k|k}} (to be estimated), where x_{k|k} is the mean and P_{k|k} is the covariance.
Please note that the posterior PDF p(x_k | z_k) needs to be closely approximated by a known distribution in nonlinear filtering. From Equation (7), evidently, the variational distribution q(x_k | ψ_k) would be equal to the true posterior PDF p(x_k | z_k) if the KL divergence were zero. Minimizing the KL divergence, and thereby approximating the posterior PDF by the variational distribution, is equivalent to maximizing the ELBO, i.e.,
ψ_k* = argmax_{ψ_k} L(ψ_k).  (10)
According to Equations (9) and (10), the nontrivial integration in Equation (7) is converted into the problem of maximizing the ELBO, which can be solved by the VB method.
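The decomposition of Equation (7) can be verified numerically. The sketch below (an added illustration; it assumes a scalar linear Gaussian model x ~ N(0, 1), z | x ~ N(x, 1), for which p(z) = N(0, 2) and the exact posterior is N(z/2, 1/2)) estimates the ELBO by Monte Carlo for two choices of the variational parameters: the ELBO never exceeds log p(z_k) and attains it when q equals the true posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
z = 1.3                                                  # an observed measurement
log_pz = -0.5 * (np.log(2 * np.pi * 2.0) + z**2 / 2.0)   # log N(z; 0, 2)

def elbo(m, s, n=200_000):
    """Monte Carlo estimate of L(psi) for q(x | psi) = N(m, s^2)."""
    xs = rng.normal(m, s, n)
    log_lik = -0.5 * (np.log(2 * np.pi) + (z - xs) ** 2)            # log p(z | x)
    log_prior = -0.5 * (np.log(2 * np.pi) + xs ** 2)                # log p(x)
    log_q = -0.5 * (np.log(2 * np.pi * s**2) + ((xs - m) / s) ** 2)
    return np.mean(log_lik + log_prior - log_q)

print(log_pz, elbo(0.0, 1.0), elbo(z / 2, np.sqrt(0.5)))
# The last call uses the exact posterior N(z/2, 1/2); there the ELBO meets log p(z).
```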

4. Proximal Iterative Nonlinear Filter

In this section, we derive a closed-form iterative procedure that maximizes the ELBO, and thereby minimizes the KL divergence between the true posterior PDF p(x_k | z_k) and the variational distribution q(x_k | ψ_k).

4.1. Penalty Function Based on KL Divergence

Notice that the KL divergence D_KL( q(x_k | ψ_k) ‖ q(x_k | ψ_k^i) ) is nonnegative for all q(x_k | ψ_k). Following [24], we adopt the proximal point algorithm to generate a sequence {ψ_k^{i+1}} via the following iterative scheme:
ψ_k^{i+1} = argmax_{ψ_k} { L(ψ_k) - (1/β^i) D_KL( q(x_k | ψ_k) ‖ q(x_k | ψ_k^i) ) },  (11)
where i denotes the iteration index and the penalty factor β^i is used to adjust the optimization step length. Roughly speaking, the ELBO is maximized when the KL divergence between the two variational distributions q(x_k | ψ_k) and q(x_k | ψ_k^i) approaches zero. Equivalently, Equation (11) can be written as
ψ_k^{i+1} = argmax_{ψ_k} { L(ψ_k) - (1/β^i) D_KL( q(x_k | ψ_k) ‖ q(x_k | ψ_k^i) ) }.  (12)
Please note that one iteration of this proximal method is equivalent to moving a step in the direction of the natural gradient [18]. The influence of the KL divergence on ψ_k^{i+1} can be adjusted by β^i: the larger β^i is, the weaker the influence of the KL divergence on ψ_k^{i+1}, and vice versa. In [18], it is assumed that β^i = 1.
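The effect of the penalty factor can be seen in isolation with a scalar caricature of Equation (11), in which the ELBO and the proximal term are both replaced by squared distances (an added toy illustration only; the actual filter uses the KL divergence). Each proximal step then moves the fraction β^i/(1 + β^i) of the way toward the maximizer, so a larger β^i yields larger steps.

```python
def proximal_steps(psi0, psi_star, beta, n_iter):
    """Iterate psi <- argmax { -(psi - psi_star)^2 - (1/beta)(psi - psi_i)^2 },
    whose closed form is psi = (beta * psi_star + psi_i) / (1 + beta)."""
    psi = psi0
    for _ in range(n_iter):
        psi = (beta * psi_star + psi) / (1.0 + beta)
    return psi

for beta in (0.1, 1.0, 10.0):
    print(beta, proximal_steps(0.0, 1.0, beta, 5))
# A weaker penalty (larger beta) moves the iterate toward the optimum faster.
```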

4.2. The Proximal Iterative Nonlinear Filter

The proximal iterative method is implemented via an iterative minimization of the KL divergence, where the initial state is assigned an estimate from a core filter, e.g., a Bayesian filter. Here, the EKF is adopted as the core filter to predict and update the system state before the iterative optimization process. We propose a proximal iterative nonlinear filter combined with VB, called PEKF-VB, which is described and derived in the following.
Firstly, by substituting Equation (9) into Equation (12), we can rewrite the iterative optimization as
ψ_k^{i+1} = argmax_{ψ_k} { E_q[ log p(z_k | x_k) ] - D_KL( p(x_k) ‖ q(x_k | ψ_k) ) - (1/β^i) D_KL( q(x_k | ψ_k) ‖ q(x_k | ψ_k^i) ) }.  (13)
Under the Gaussian assumptions for the process noise and the measurement noise, the variational distribution is of the form q(x_k | ψ_k) = N(x_k; x_{k|k}, P_{k|k}). Given the prior state estimate x_{k-1|k-1} at time k-1, we can assume p(x_k) = N(x_k; x_{k|k-1}, P_{k|k-1}). Accordingly, ψ_k^{i+1} is
ψ_k^{i+1} = argmax_{ψ_k} { E_q[ log p(z_k | x_k) ] - D_KL( N(x_k; x_{k|k-1}, P_{k|k-1}) ‖ N(x_k; x_{k|k}, P_{k|k}) ) - (1/β^i) D_KL( q(x_k | ψ_k) ‖ q(x_k | ψ_k^i) ) }.  (14)
For the first term in Equation (14), the expectation E_q[ log p(z_k | x_k) ], which depends on x_{k|k} and P_{k|k}, can be approximated linearly using its gradients with respect to (w.r.t.) x_{k|k} and P_{k|k}. Defining g(x_{k|k}, P_{k|k}) ≜ E_q[ log p(z_k | x_k) ], the gradients of g w.r.t. x_{k|k} and P_{k|k} are
∇_x g(x_{k|k}, P_{k|k}) = α_k,  (15)
∇_P g(x_{k|k}, P_{k|k}) = γ_k.  (16)
The expectation E_q[ log p(z_k | x_k) ] at time k is then approximated, up to an additive constant, linearly in the variables x_{k|k} and P_{k|k}; that is,
E_q[ log p(z_k | x_k) ] ≈ α_k^T x_{k|k} + (1/2) tr( γ_k P_{k|k} ),  (17)
where α_k^i and γ_k^i at iteration i are given by
α_k^i = (H_k^i)^T R_k^{-1} ( z_k - h_k(x_{k|k}^i) ),  (18)
γ_k^i = -(1/2) (H_k^i)^T R_k^{-1} H_k^i,  (19)
and H_k^i is the Jacobian matrix
H_k^i = ∂h_k(x) / ∂x^T |_{x = x_{k|k}^i}.  (20)
In other words, the coefficients α_k^i and γ_k^i are updated through H_k^i, evaluated at x_{k|k}^i, in each iterative step. The detailed calculations of α_k^i and γ_k^i are given in Appendix A.
For Gaussian distributions N(ξ_1; μ_1, C_1) and N(ξ_2; μ_2, C_2) of the same dimension d, the KL divergence is
D_KL( N(ξ_1; μ_1, C_1) ‖ N(ξ_2; μ_2, C_2) ) = (1/2) [ tr( C_1 C_2^{-1} ) + (μ_1 - μ_2)^T C_2^{-1} (μ_1 - μ_2) - d - log( |C_1| |C_2|^{-1} ) ],  (21)
where the operators tr(·) and |·| denote the trace and the determinant of a matrix, respectively. By Equations (17)-(21), Equation (13) can be rewritten as
ψ_k^{i+1} = argmax_{ψ_k} { (α_k^i)^T x_{k|k} + (1/2) tr( γ_k^i P_{k|k} ) - (1/2) [ log( |P_{k|k-1}| |P_{k|k}|^{-1} ) + tr( P_{k|k-1}^{-1} P_{k|k} ) + (x_{k|k-1} - x_{k|k})^T P_{k|k-1}^{-1} (x_{k|k-1} - x_{k|k}) - d ] - (1/(2β^i)) [ log( |P_{k|k}^i| |P_{k|k}|^{-1} ) + tr( P_{k|k} (P_{k|k}^i)^{-1} ) + (x_{k|k} - x_{k|k}^i)^T (P_{k|k}^i)^{-1} (x_{k|k} - x_{k|k}^i) - d ] }.  (22)
The maximizer ψ_k^{i+1} = {x_{k|k}^{i+1}, P_{k|k}^{i+1}} can be calculated explicitly. To find it, denote the objective function in Equation (22) by J(x_{k|k}, P_{k|k}) and set its partial derivatives w.r.t. {x_{k|k}, P_{k|k}} to zero, i.e.,
∂J / ∂x_{k|k} |_{x_{k|k} = x_{k|k}^{i+1}} = 0,   ∂J / ∂P_{k|k} |_{P_{k|k} = P_{k|k}^{i+1}} = 0.  (23)
Accordingly, x_{k|k}^{i+1} and P_{k|k}^{i+1} are found to be
x_{k|k}^{i+1} = [ (β^i/(1+β^i)) P_{k|k-1}^{-1} + (1/(1+β^i)) (P_{k|k}^i)^{-1} ]^{-1} [ (β^i/(1+β^i)) ( P_{k|k-1}^{-1} x_{k|k-1} + α_k^i ) + (1/(1+β^i)) (P_{k|k}^i)^{-1} x_{k|k}^i ]
 = [ (1-b_i) P_{k|k-1}^{-1} + b_i (P_{k|k}^i)^{-1} ]^{-1} [ (1-b_i) ( P_{k|k-1}^{-1} x_{k|k-1} + α_k^i ) + b_i (P_{k|k}^i)^{-1} x_{k|k}^i ],  (24)
P_{k|k}^{i+1} = [ (1/(1+β^i)) (P_{k|k}^i)^{-1} + (β^i/(1+β^i)) ( P_{k|k-1}^{-1} - γ_k^i ) ]^{-1} = [ b_i (P_{k|k}^i)^{-1} + (1-b_i) ( P_{k|k-1}^{-1} - γ_k^i ) ]^{-1},  (25)
where b_i = 1/(1+β^i). Equations (24) and (25) show that the state estimate and the associated covariance are updated in each iteration through α_k^i and γ_k^i, respectively. As shown in Figure 1, the complete iteration procedure consists of Equations (18), (19), (24) and (25).
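A direct NumPy transcription of one iteration of Equations (18), (19), (24) and (25) might look as follows. This is a sketch rather than the authors' code: the measurement function h and its Jacobian jac_h are assumed to be supplied by the caller, and the sign convention for γ_k^i follows Equation (19).

```python
import numpy as np

def vb_iteration(x_i, P_i, x_pred, P_pred, z, h, jac_h, R, beta):
    """One proximal VB step: Eqs. (18), (19), (24) and (25)."""
    H = jac_h(x_i)                          # Jacobian at the current iterate, Eq. (20)
    R_inv = np.linalg.inv(R)
    alpha = H.T @ R_inv @ (z - h(x_i))      # Eq. (18)
    gamma = -0.5 * H.T @ R_inv @ H          # Eq. (19)
    b = 1.0 / (1.0 + beta)
    P_pred_inv = np.linalg.inv(P_pred)
    P_i_inv = np.linalg.inv(P_i)
    # Eq. (24): mean update fusing the prediction, measurement and proximal terms.
    A = np.linalg.inv((1 - b) * P_pred_inv + b * P_i_inv)
    x_next = A @ ((1 - b) * (P_pred_inv @ x_pred + alpha) + b * (P_i_inv @ x_i))
    # Eq. (25): covariance update.
    P_next = np.linalg.inv(b * P_i_inv + (1 - b) * (P_pred_inv - gamma))
    return x_next, P_next
```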
We note that, in principle, the computational cost of Equations (24) and (25) can be slightly reduced by using the Matrix Inversion Lemma [27]. Equations (24) and (25) are then rewritten as Equations (26) and (27), respectively:
x_{k|k}^{i+1} = { (1/b_i) P_{k|k}^i - (1/b_i) P_{k|k}^i [ (1/(1-b_i)) P_{k|k-1} + (1/b_i) P_{k|k}^i ]^{-1} (1/b_i) P_{k|k}^i } [ (1-b_i) ( P_{k|k-1}^{-1} x_{k|k-1} + α_k^i ) + b_i (P_{k|k}^i)^{-1} x_{k|k}^i ],  (26)
P_{k|k}^{i+1} = (1/(1-b_i)) K_{k1} - (b_i/(1-b_i)) K_{k1} (P_{k|k}^i)^{-1} [ I + (b_i/(1-b_i)) K_{k1} (P_{k|k}^i)^{-1} ]^{-1} (1/(1-b_i)) K_{k1},  (27)
where K_{k1} = ( P_{k|k-1}^{-1} - γ_k^i )^{-1} = P_{k|k-1} + P_{k|k-1} γ_k^i ( I - P_{k|k-1} γ_k^i )^{-1} P_{k|k-1}, and I denotes the identity matrix of the state dimension.
For the special case β^i = 1, Equations (24) and (25) can be rewritten as
x_{k|k}^{i+1} = c_1 x_{k|k-1} + c_2 x_{k|k}^i + c_g α_k^i,  (28)
P_{k|k}^{i+1} = 2 ( c_g^{-1} - γ_k^i )^{-1},  (29)
where c_g = [ P_{k|k-1}^{-1} + (P_{k|k}^i)^{-1} ]^{-1}, c_1 = c_g P_{k|k-1}^{-1} and c_2 = c_g (P_{k|k}^i)^{-1}.
The flow diagram of the proposed PEKF-VB algorithm is shown in Figure 1 and the detailed implementation of PEKF-VB is given in Algorithm 1.
Algorithm 1 The implementation of the PEKF-VB algorithm
1: Initialization (k = 0): set the initial state estimate x_0, the associated error covariance P_0, and the number of iterations.
2: Compute the predicted state x_{k|k-1} and the associated error covariance P_{k|k-1}:
x_{k|k-1} = f_k(x_{k-1|k-1}),   P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_{k-1},
where F_k = ∂f_k(x)/∂x |_{x = x_{k-1|k-1}}.
3: Compute the filter gain K_k:
S_k = H_k P_{k|k-1} H_k^T + R_k,   K_k = P_{k|k-1} H_k^T S_k^{-1},
where H_k = ∂h_k(x)/∂x |_{x = x_{k|k-1}}.
4: Update the state estimate x_{k|k} and the associated error covariance P_{k|k}:
x*_{k|k} = x_{k|k-1} + K_k ( z_k - h_k(x_{k|k-1}) ),   P*_{k|k} = P_{k|k-1} - K_k H_k P_{k|k-1}.
5: Let x_{k|k}^1 = x*_{k|k}, P_{k|k}^1 = P*_{k|k}, and i = 1.
6: while not converged do
7:   Compute the parameters α_k^i and γ_k^i by Equations (18) and (19).
8:   Compute the iterated state estimate x_{k|k}^{i+1} and its error covariance P_{k|k}^{i+1} by Equations (26) and (27).
9:   Let i = i + 1.
10: end while
11: Let k = k + 1 and go back to Step 2.
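Putting Algorithm 1 together, one time step of PEKF-VB could be sketched as follows. This is again an added illustration under the same assumptions as the vb_iteration sketch in Section 4.2, which it reuses; for simplicity, a fixed number of iterations replaces the convergence test of Step 6.

```python
import numpy as np

def pekf_vb_step(x_prev, P_prev, z, f, jac_f, h, jac_h, Q, R, beta=1.0, n_iter=5):
    """One time step of Algorithm 1: EKF core filter plus proximal VB iterations."""
    # Steps 2-4: EKF prediction and update.
    F = jac_f(x_prev)
    x_pred = f(x_prev)
    P_pred = F @ P_prev @ F.T + Q
    H = jac_h(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)      # filter gain
    x = x_pred + K @ (z - h(x_pred))         # x_{k|k}^1 in Step 5
    P = P_pred - K @ H @ P_pred
    # Steps 6-10: refine the EKF estimate by proximal VB iterations.
    for _ in range(n_iter):
        x, P = vb_iteration(x, P, x_pred, P_pred, z, h, jac_h, R, beta)
    return x, P
```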

4.3. Remarks

  • The VB method approximates the true posterior PDF by choosing from a parameterized family of variational distributions. In each iteration of the PEKF-VB, the ELBO (9) increases, so the ELBO is a suitable criterion for monitoring the performance of the variational optimization (a small evaluation sketch is given after this list). The ELBO of the proposed nonlinear filter is
L(ψ_k) = -(1/2) log[ (2π)^{D_z} |R_k| |P_{k|k-1}| |P_{k|k}|^{-1} ] - (1/2) tr[ P_{k|k-1}^{-1} ( x_{k|k} - x_{k|k-1} )( x_{k|k} - x_{k|k-1} )^T ] - (1/2) tr[ P_{k|k-1}^{-1} P_{k|k} ] - (1/2) tr[ R_k^{-1} ( ( z_k - H_k x_{k|k} )( z_k - H_k x_{k|k} )^T + H_k P_{k|k} H_k^T ) ] + D_x/2,  (30)
where D_x and D_z denote the dimension of the state and the dimension of the measurement, respectively. The derivation of the ELBO is given in Appendix B.
  • Apart from the KL divergence, we can use Calvo and Oller’s distance (COD) as the penalty function in Equation (13); the corresponding filter is denoted CODEKF. The COD between two Gaussian distributions f(ξ_1) = p(ξ_1 | μ_1, C_1) and f(ξ_2) = p(ξ_2 | μ_2, C_2) is [28]
d(ξ_1, ξ_2) = [ (1/2) Σ_{i=1}^{n+1} ln^2 λ_i ]^{1/2},  (31)
where n is the dimension of ξ_i, i ∈ {1, 2}, and the λ_i are the eigenvalues of f(Δμ̄, D), with Δμ̄ = T^T C_1^{-1/2} Δμ, D = T^T C_1^{-1/2} C_2 C_1^{-1/2} T diagonal, Δμ = μ_2 - μ_1, and T^T T = I. We replace the KL divergence in Equation (13) with Equation (31) as follows:
ψ_k^{i+1} = argmax_{ψ_k} { (α_k^i)^T x_{k|k} + (1/2) tr( γ_k^i P_{k|k} ) - D_KL( p(x_k) ‖ q(x_k | ψ_k) ) - (1/β^i) d(ξ_1, ξ_2) }.  (32)
  • Since both PEKF-VB and CODEKF involve iterations within the VB framework to minimize the divergence between the posterior PDF and the variational distribution, their complexity is increased by the calculation of the Jacobian in each iteration.
  • In PEKF-VB, we use KL divergence to measure the similarity between two distributions. Under Gaussian assumptions for the distributions, a closed-form solution of the variational distribution has been derived. However, the VB framework with the KL divergence can also apply to non-Gaussian distributions. If no closed-form exists, a Monte Carlo method can be used to approximate the divergence. Other measures of dissimilarity between probability distributions, such as the alpha-divergence, the Rényi-divergence and the alpha-beta divergence, can also be used in the VB framework. See [29] and references therein. Unfortunately, in general, no computationally tractable form of the variational distribution can be derived and a Monte Carlo method has to be employed.
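As mentioned in the first remark above, the ELBO of Equation (30) can be evaluated directly to monitor the convergence of the iterations. A possible NumPy sketch (added here; it assumes the linearized measurement matrix H_k is available) is:

```python
import numpy as np

def elbo_eq30(x_post, P_post, x_pred, P_pred, z, H, R):
    """Evaluate the ELBO of Eq. (30) for a Gaussian variational iterate."""
    Dz, Dx = len(z), len(x_post)
    dx = x_post - x_pred                       # x_{k|k} - x_{k|k-1}
    r = z - H @ x_post                         # linearized measurement residual
    P_pred_inv = np.linalg.inv(P_pred)
    R_inv = np.linalg.inv(R)
    logdet = np.linalg.slogdet(P_pred @ np.linalg.inv(P_post))[1]
    term_log = -0.5 * (Dz * np.log(2 * np.pi) + np.linalg.slogdet(R)[1] + logdet)
    term_prior = -0.5 * (dx @ P_pred_inv @ dx + np.trace(P_pred_inv @ P_post))
    term_lik = -0.5 * (r @ R_inv @ r + np.trace(R_inv @ H @ P_post @ H.T))
    return term_log + term_prior + term_lik + Dx / 2.0
```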

5. Numerical Simulations

In this section, we present two nonlinear estimation examples of 2D target tracking and a benchmark nonlinear filtering problem to illustrate the performance of PEKF-VB and CODEKF, comparing them with the EKF and UKF. The performance is measured by the root-mean-squared error (RMSE) of the estimates and by the computational overhead.
Example 1: Range-bearing tracking. In this scenario, the underlying target motion is described by a constant turn (CT) model, with the state vector consisting of 2D position and velocity components. As shown in Figure 2a, the target moves with initial state x_0 = (565.7, 29.99, 1166, 0.62)^T along a circular trajectory and is observed by a range-bearing sensor. The state transition matrix F_k in Equation (1) and the measurement function h_k(x_k) in Equation (2) are
F_k =
[ 1   sin(θ̇T)/θ̇        0   (cos(θ̇T) - 1)/θ̇ ]
[ 0   cos(θ̇T)           0   -sin(θ̇T)         ]
[ 0   (1 - cos(θ̇T))/θ̇  1   sin(θ̇T)/θ̇       ]
[ 0   sin(θ̇T)           0   cos(θ̇T)          ],  (33)
h_k(x_k) = [ √(x² + y²),  arctan(y/x) ]^T,  (34)
where the turn rate is θ̇ = 0.0333 rad/s. The covariances of the zero-mean Gaussian white noises ω_k and v_k are Q_k = 1.5 G q G^T and R_k = diag(r², φ²), respectively, where q = I_{2×2}, G = [ T²/2, T, 0, 0; 0, 0, T²/2, T ]^T, r = 35 and φ = 0.5π/180. At each run, the track is initialized using the two-point method [1] with initial error covariance P_0 = diag([600, 100, 600, 100]). We let β^i = 1 and T = 1 s. The target moves for 80 scans (periods), and 1000 Monte Carlo runs are carried out.
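For concreteness, the model pieces of Example 1 can be coded as below (a sketch assuming the state ordering (x, ẋ, y, ẏ); arctan2 is used in place of the arctan of Equation (34) so that the bearing lies in the correct quadrant).

```python
import numpy as np

def ct_matrix(w, T):
    """Constant turn transition matrix of Eq. (33), turn rate w, period T."""
    s, c = np.sin(w * T), np.cos(w * T)
    return np.array([[1, s / w,       0, (c - 1) / w],
                     [0, c,           0, -s         ],
                     [0, (1 - c) / w, 1, s / w      ],
                     [0, s,           0, c          ]])

def h_range_bearing(x):
    """Range-bearing measurement of Eq. (34)."""
    px, py = x[0], x[2]
    return np.array([np.hypot(px, py), np.arctan2(py, px)])

def jac_range_bearing(x):
    """Jacobian of Eq. (34) w.r.t. the state (x, vx, y, vy)."""
    px, py = x[0], x[2]
    r2 = px**2 + py**2
    r = np.sqrt(r2)
    return np.array([[ px / r,  0, py / r,  0],
                     [-py / r2, 0, px / r2, 0]])
```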
The RMSE plots of the EKF, UKF, CODEKF and PEKF-VB are shown in Figure 2b. The black, red, green and blue curves are obtained by PEKF-VB, CODEKF, UKF and EKF, respectively. It is seen that, in terms of RMSE performance, PEKF-VB is slightly better than CODEKF, and both PEKF-VB and CODEKF are better than the EKF and UKF. Table 1 provides a quantitative comparison of the RMSE and execution time of the EKF, UKF, CODEKF and PEKF-VB. Figure 3b gives the relationship between the iteration index and the running time of PEKF-VB.
In Figure 3a we compare the RMSE performance of PEKF-VB for varying numbers of iterations. Notice that the RMSE decreases as the number of iterations increases. Figure 3b gives the computational overhead of PEKF-VB w.r.t. the number of iterations. Both results agree with intuition.
To illustrate the convergence of PEKF-VB, we present the ELBO for different numbers of iterations in Figure 4. Figure 4a illustrates the ELBO at the second scan, and Figure 4b shows the ELBO for all scans. Clearly, the ELBO increases as the number of iterations increases, showing that the iteration procedure in PEKF-VB converges.
Example 2: Bearing-only tracking. In this scenario, tracking of a single target using measurements from a single bearing-only sensor is considered. Since the sensor (ownship) measurement model is nonlinear in the target state, the sensor has to maneuver relative to the target in order to observe it [30,31]. Let x_k = (x_k, y_k, ẋ_k, ẏ_k)^T be the state of the target at time k, where (x_k, y_k) and (ẋ_k, ẏ_k) are the position and velocity, respectively. The target, with an initial range of 5 km (relative to the sensor) and an initial course of 220° measured clockwise (with 0° at the positive y-axis), is modeled by the constant velocity model in Equation (35):
F_k =
[ 1  0  T  0 ]
[ 0  1  0  T ]
[ 0  0  1  0 ]
[ 0  0  0  1 ],  (35)
where T = 1 min. The speed of the target is 0.1235 km/min. The sensor starts moving with a fixed speed of 0.1543 km/min and an initial course of 140° measured clockwise (with 0° at the positive y-axis). Please note that, for the bearing-only tracking problem, the sensor has to maneuver to be able to estimate the range of the target. Here we assume the sensor maneuvers from scan 14 to scan 17 and then moves with constant velocity from scan 18 to scan 40. The motion model of the sensor is given by Equation (36), where the turn rate is θ̇ = 30°/min. Both the target and the sensor move for 40 scans, and their trajectories are shown in Figure 5a.
S_k =
[ 1  0  T  0 ]
[ 0  1  0  T ]
[ 0  0  1  0 ]
[ 0  0  0  1 ],   0 < k < 14 or 18 ≤ k ≤ 40,

S_k =
[ 1  0  sin(θ̇T)/θ̇        (cos(θ̇T) - 1)/θ̇ ]
[ 0  1  (1 - cos(θ̇T))/θ̇  sin(θ̇T)/θ̇       ]
[ 0  0  cos(θ̇T)           -sin(θ̇T)         ]
[ 0  0  sin(θ̇T)           cos(θ̇T)          ],   14 ≤ k < 18.  (36)
The measurement function h_k(x_k) of the bearing-only sensor is
h_k(x_k) = arctan[ (x_k - x_s) / (y_k - y_s) ],  (37)
where x_k and y_k are the position coordinates of the target in the Cartesian coordinate system, and x_s and y_s are those of the sensor. The standard deviation of the measurement noise is 1°. The initial position of the target is randomly sampled at range r = 13 km with covariance
P_0 = A B A^T,  (38)
where
A =
[ cos θ  -sin θ  0  0 ]
[ sin θ   cos θ  0  0 ]
[ 0       0      1  0 ]
[ 0       0      0  1 ],
B = diag( r² σ_m², Δr², Δv², Δv² ),  (39)
and σ_m = π/180 rad, Δr = 2 km and Δv = 61.7 m/min. We let β^i = 1. Figure 5b shows the estimated target trajectories obtained by the EKF, UKF, CODEKF and PEKF-VB for a single run.
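The corresponding pieces for Example 2 might be sketched as follows (an added illustration; the state is ordered (x, y, ẋ, ẏ) to match Equation (35), the bearing is measured clockwise from the positive y-axis as in Equation (37), and θ in the initialization of Equation (39) is taken to be the bearing used to rotate the covariance).

```python
import numpy as np

def h_bearing(x, sensor_xy):
    """Bearing-only measurement of Eq. (37), clockwise from the +y axis."""
    return np.arctan2(x[0] - sensor_xy[0], x[1] - sensor_xy[1])

def jac_bearing(x, sensor_xy):
    """Jacobian of Eq. (37) w.r.t. the state (x, y, vx, vy)."""
    dx, dy = x[0] - sensor_xy[0], x[1] - sensor_xy[1]
    d2 = dx**2 + dy**2
    return np.array([[dy / d2, -dx / d2, 0.0, 0.0]])

def initial_covariance(r, theta, sigma_m, dr, dv):
    """Initial covariance P_0 = A B A^T of Eqs. (38)-(39)."""
    c, s = np.cos(theta), np.sin(theta)
    A = np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
    B = np.diag([r**2 * sigma_m**2, dr**2, dv**2, dv**2])
    return A @ B @ A.T
```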
Based on 1000 Monte Carlo simulations, the RMSE comparison of EKF, UKF, CODEKF and PEKF-VB is illustrated in Figure 6, where the number of iterations for PEKF-VB is 5.
  • As expected, with a fixed sensor trajectory, PEKF-VB extracts the most information about the target from the sensor measurements and achieves a better RMSE performance than the EKF and UKF. Under the VB framework, the variational distribution approaches the true posterior PDF through the iterations of the proximal filter.
  • The RMSE performance of CODEKF is also better than those of the EKF and UKF because, for CODEKF, the Jacobian matrix of Equation (37) is updated in each iteration to minimize the COD. However, the RMSE performance of CODEKF is slightly worse than that of PEKF-VB.
  • In the first few scans, the performances of the four filters are comparable. This is because, in this bearing-only tracking problem, the accumulated measurements in these scans do not provide enough information to the filters; the performance of CODEKF and of PEKF-VB suffers when measurement data are very limited. As more measurements accumulate, both CODEKF and PEKF-VB extract more information via the iteration process, resulting in superior performance.
Figure 7 shows the metric values versus the number of iterations for PEKF-VB and CODEKF. The KL divergence curve is close to zero after the fifth iteration. The enlarged plots, marked from the sixth to the tenth iteration, show that PEKF-VB converges faster than CODEKF.
Example 3: A strongly nonlinear filtering problem. To further verify the proposed method, we run the EKF, UKF, CODEKF and PEKF-VB on the benchmark nonlinear problem in [32,33,34]:
x_k = x_{k-1}/2 + 25 x_{k-1}/(1 + x_{k-1}²) + 8 cos(1.2k) + ν_{k-1},  (40)
z_k = x_k²/20 + ω_k,  (41)
where ν_{k-1} and ω_k are zero-mean Gaussian noises with variances Q_{k-1} and R_k, respectively. We let Q_k = 0.0001, R_k = 1 and the scan period T = 1. The simulation results are given in Figure 8, from which we can see that CODEKF and PEKF-VB produce very similar results and outperform the EKF and UKF. This is because the local linearization adopted by EKF-based filters is not a sufficient description of the nonlinear nature of this example [34], whereas the VB iteration makes use of the measurement information as much as possible.
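The benchmark of Equations (40) and (41) is straightforward to reproduce; a simulation sketch (added here, with an assumed initial state x_0 = 0.1 and a fixed random seed) follows.

```python
import numpy as np

def simulate_ungm(n_steps, Q=1e-4, R=1.0, x0=0.1, seed=0):
    """Simulate one trajectory of the benchmark model, Eqs. (40)-(41)."""
    rng = np.random.default_rng(seed)
    xs, zs = np.empty(n_steps), np.empty(n_steps)
    x = x0
    for k in range(1, n_steps + 1):
        x = (x / 2.0 + 25.0 * x / (1.0 + x**2)
             + 8.0 * np.cos(1.2 * k) + rng.normal(0.0, np.sqrt(Q)))
        z = x**2 / 20.0 + rng.normal(0.0, np.sqrt(R))
        xs[k - 1], zs[k - 1] = x, z
    return xs, zs
```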

6. Conclusions

We have developed a proximal iterative nonlinear filter in which the posterior PDF is approximated by a parameterized variational distribution that is iteratively optimized in the VB framework. A weighted KL divergence is adopted as a proximal term in the iteration to ensure that convergence is achieved with a tight bound. The simulation results show that the proposed algorithm outperforms several existing algorithms in terms of estimation accuracy, at the cost of an increased computational burden.

Author Contributions

Y.H. performed this research and Q.P. is the advisor. Conceptualization, H.L.; methodology, Y.M., X.W., H.L.; software, Y.H. and X.W.; writing-original draft preparation, Y.H.; writing-review and editing, Z.W., B.M.; supervision, H.L., Z.W., Q.P.; funding acquisition, Q.P.

Funding

This work was supported by Excellent Chinese and Foreign Youth Exchange Programme of China Association for Science and Technology (No. 2017CASTQNJL046) and National Natural Science Foundation of China (No. 61790552, 61501378, 61503305, 61873211).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivations of α_k^i and γ_k^i

Since the system is Gaussian, the likelihood function p(z_k | x_{k|k}^i) is
p(z_k | x_{k|k}^i) = ( (2π)^{D_z} |R_k| )^{-1/2} exp[ -(1/2) ( z_k - h_k(x_{k|k}^i) )^T R_k^{-1} ( z_k - h_k(x_{k|k}^i) ) ].  (A1)
The parameters α_k^i in Equation (18) and γ_k^i in Equation (19) are derived according to Equations (A2) and (A3), respectively:
α_k^i = ∇_x E_q[ log p(z_k | x_{k|k}^i) ] = -(1/2) ∇_x E_q[ ( z_k - h_k(x_{k|k}^i) )^T R_k^{-1} ( z_k - h_k(x_{k|k}^i) ) ],  (A2)
γ_k^i = ∇_P E_q[ log p(z_k | x_{k|k}^i) ] = -(1/2) ∇_P E_q[ ( z_k - h_k(x_{k|k}^i) )^T R_k^{-1} ( z_k - h_k(x_{k|k}^i) ) ],  (A3)
since the normalization term of Equation (A1) does not depend on x_{k|k} or P_{k|k}.
By Bonnet’s theorem [35], the gradient of the expectation of f(ξ) under a Gaussian distribution N(ξ | μ, C) w.r.t. the mean μ is the expectation of the gradient of f(ξ):
∇_μ E_{N(ξ|μ,C)}[ f(ξ) ] = E_{N(ξ|μ,C)}[ ∇_ξ f(ξ) ].  (A4)
It follows that α_k^i can be written as
α_k^i = -(1/2) E_q[ ∇_x ( z_k - h_k(x_{k|k}^i) )^T R_k^{-1} ( z_k - h_k(x_{k|k}^i) ) ] = (H_k^i)^T R_k^{-1} ( z_k - h_k(x_{k|k}^i) ),  (A5)
where the Jacobian matrix H_k^i = ∂h_k(x)/∂x^T |_{x = x_{k|k}^i}.
According to Price’s theorem [36], the gradient of the expectation of f(ξ) under a Gaussian distribution N(ξ | μ, C) w.r.t. the covariance C is half the expectation of the Hessian of f(ξ):
∇_C E_{N(ξ|μ,C)}[ f(ξ) ] = (1/2) E_{N(ξ|μ,C)}[ ∇_ξ ∇_ξ f(ξ) ].  (A6)
Similarly,
γ_k^i = -(1/4) E_q[ ∇_x ∇_x ( z_k - h_k(x_{k|k}^i) )^T R_k^{-1} ( z_k - h_k(x_{k|k}^i) ) ] = (1/2) E_q[ ∇_x ( (H_k^i)^T R_k^{-1} ( z_k - h_k(x_{k|k}^i) ) ) ] = -(1/2) (H_k^i)^T R_k^{-1} H_k^i.  (A7)
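The closed forms (A5) and (A7) can be sanity-checked numerically in the linear Gaussian case h(x) = H x, where E_q[log p(z|x)] is available exactly as g(m, P) = -(1/2)[ (z - Hm)^T R^{-1} (z - Hm) + tr(R^{-1} H P H^T) ] + const. The following sketch (an added check, not part of the original appendix) compares the closed forms against finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(2, 3)); R = np.eye(2); z = rng.normal(size=2)
m = rng.normal(size=3); P = np.eye(3)
R_inv = np.linalg.inv(R)

def g(m, P):
    """Exact E_q[log p(z|x)] up to a constant, for h(x) = H x and q = N(m, P)."""
    r = z - H @ m
    return -0.5 * (r @ R_inv @ r + np.trace(R_inv @ H @ P @ H.T))

eps = 1e-6
fd = np.array([(g(m + eps * np.eye(3)[j], P) - g(m, P)) / eps for j in range(3)])
alpha = H.T @ R_inv @ (z - H @ m)                # closed form of Eq. (A5)
gamma = -0.5 * H.T @ R_inv @ H                   # closed form of Eq. (A7)
print(np.allclose(fd, alpha, atol=1e-4))         # gradient w.r.t. the mean
fd_P = (g(m, P + eps * np.outer(np.eye(3)[0], np.eye(3)[0])) - g(m, P)) / eps
print(np.isclose(fd_P, gamma[0, 0], atol=1e-4))  # one covariance-gradient entry
```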

Appendix B. Derivation of ELBO

From Equations (9) and (8), the ELBO is
L(ψ_k) = E_q[ log p(z_k | x_k) ] + E_q[ log p(x_k) ] - E_q[ log q(x_k | ψ_k) ].  (A8)
Since both the process noise and the measurement noise are Gaussian, the likelihood function p(z_k | x_k), the PDF p(x_k) and the PDF q(x_k | ψ_k) below are Gaussian:
p(z_k | x_k) = ( (2π)^{D_z} |R_k| )^{-1/2} exp[ -(1/2) ( z_k - h_k(x_k) )^T R_k^{-1} ( z_k - h_k(x_k) ) ],  (A9)
p(x_k) = ( (2π)^{D_x} |Q_k| )^{-1/2} exp[ -(1/2) ( x_k - f_k(x_{k-1}) )^T Q_k^{-1} ( x_k - f_k(x_{k-1}) ) ],  (A10)
q(x_k | ψ_k) = ( (2π)^{D_x} |P_{k|k}| )^{-1/2} exp[ -(1/2) ( x_k - x_{k|k} )^T P_{k|k}^{-1} ( x_k - x_{k|k} ) ].  (A11)
Assuming that the state transition matrix and the measurement matrix are obtained by linearization, p(z_k | x_k) and p(x_k) can be written as
p(z_k | x_k) ≈ ( (2π)^{D_z} |R_k| )^{-1/2} exp[ -(1/2) ( z_k - H_k x_k )^T R_k^{-1} ( z_k - H_k x_k ) ],  (A12)
p(x_k) ≈ ( (2π)^{D_x} |Q_k| )^{-1/2} exp[ -(1/2) ( x_k - F_{k-1} x_{k-1} )^T Q_k^{-1} ( x_k - F_{k-1} x_{k-1} ) ],  (A13)
where F_{k-1} = ∂f(x)/∂x |_{x = x_{k-1|k-1}} and H_k = ∂h(x)/∂x |_{x = x_{k|k}}.
Assuming that the state estimate is unbiased, i.e., E_q[x_{k-1}] = x_{k-1|k-1}, Equations (A10) and (A13) can be combined and written approximately as
p(x_k) ≈ ( (2π)^{D_x} |F_{k-1} P_{k-1|k-1} F_{k-1}^T + Q_{k-1}| )^{-1/2} exp[ -(1/2) ( x_k - F_{k-1} x_{k-1|k-1} )^T ( F_{k-1} P_{k-1|k-1} F_{k-1}^T + Q_{k-1} )^{-1} ( x_k - F_{k-1} x_{k-1|k-1} ) ]
 = ( (2π)^{D_x} |P_{k|k-1}| )^{-1/2} exp[ -(1/2) ( x_k - x_{k|k-1} )^T P_{k|k-1}^{-1} ( x_k - x_{k|k-1} ) ].  (A14)
Since the estimate is unbiased, the ground truth x_k at time k can be expressed as
x_k = x_{k|k} + x̃_{k|k},  (A15)
where E_q[x̃_{k|k}] = 0. The first term E_q[ log p(z_k | x_k) ] in Equation (A8) becomes
E_q[ log p(z_k | x_k) ] ≈ -(1/2) log[ (2π)^{D_z} |R_k| ] - (1/2) tr{ R_k^{-1} E_q[ ( z_k - H_k (x_{k|k} + x̃_{k|k}) )( z_k - H_k (x_{k|k} + x̃_{k|k}) )^T ] }
 = -(1/2) log[ (2π)^{D_z} |R_k| ] - (1/2) tr{ R_k^{-1} [ ( z_k - H_k x_{k|k} )( z_k - H_k x_{k|k} )^T + H_k P_{k|k} H_k^T ] }.  (A16)
The second term E_q[ log p(x_k) ] in Equation (A8) becomes
E_q[ log p(x_k) ] ≈ -(1/2) log[ (2π)^{D_x} |P_{k|k-1}| ] - (1/2) tr{ P_{k|k-1}^{-1} E_q[ ( x_k - x_{k|k-1} )( x_k - x_{k|k-1} )^T ] }
 = -(1/2) log[ (2π)^{D_x} |P_{k|k-1}| ] - (1/2) tr{ P_{k|k-1}^{-1} E_q[ ( x_{k|k} + x̃_{k|k} )( x_{k|k} + x̃_{k|k} )^T - ( x_{k|k} + x̃_{k|k} ) x_{k|k-1}^T - x_{k|k-1} ( x_{k|k} + x̃_{k|k} )^T + x_{k|k-1} x_{k|k-1}^T ] }
 = -(1/2) log[ (2π)^{D_x} |P_{k|k-1}| ] - (1/2) tr{ P_{k|k-1}^{-1} P_{k|k} } - (1/2) tr{ P_{k|k-1}^{-1} ( x_{k|k} - x_{k|k-1} )( x_{k|k} - x_{k|k-1} )^T }.  (A17)
The third term E_q[ log q(x_k | ψ_k) ] in Equation (A8) is
E_q[ log q(x_k | ψ_k) ] = -(1/2) log[ (2π)^{D_x} |P_{k|k}| ] - (1/2) tr{ P_{k|k}^{-1} E_q[ ( x_k - x_{k|k} )( x_k - x_{k|k} )^T ] } = -(1/2) log[ (2π)^{D_x} |P_{k|k}| ] - D_x/2.  (A18)
Combining Equations (A16)-(A18), we obtain the ELBO as
L(ψ_k) = -(1/2) log[ (2π)^{D_z} |R_k| |P_{k|k-1}| |P_{k|k}|^{-1} ] - (1/2) tr[ P_{k|k-1}^{-1} ( x_{k|k} - x_{k|k-1} )( x_{k|k} - x_{k|k-1} )^T ] - (1/2) tr[ P_{k|k-1}^{-1} P_{k|k} ] - (1/2) tr[ R_k^{-1} ( ( z_k - H_k x_{k|k} )( z_k - H_k x_{k|k} )^T + H_k P_{k|k} H_k^T ) ] + D_x/2.  (A19)

References

  1. Bar-Shalom, Y.; Li, X.R. Estimation and Tracking: Principles, Techniques, and Software; Artech House: Norwood, MA, USA, 1996. [Google Scholar]
  2. Van Trees, H.L. Detection, Estimation, and Modulation Theory Part 1; John Wiley & Sons, Inc.: New York City, NY, USA, 2003. [Google Scholar]
  3. Farina, A.; Ristic, B.; Benvenuti, D. Tracking a ballistic target: Comparison of several nonlinear filters. IEEE Trans. Aerosp. Electron. Syst. 2002, 38, 854–867. [Google Scholar] [CrossRef]
  4. Hu, J.; Wang, Z.; Gao, H.; Stergioulas, L.K. Extended Kalman filtering with stochastic nonlinearities and multiple missing measurements. Automatica 2012, 48, 2007–2015. [Google Scholar] [CrossRef] [Green Version]
  5. Hu, X.; Bao, M.; Zhang, X.; Guan, L.; Hu, Y. Generalized iterated Kalman filter and its performance evaluation. IEEE Trans. Signal Process. 2015, 63, 3204–3217. [Google Scholar] [CrossRef]
  6. García-Fernández, Á.F.; Svensson, L.; Morelande, M.R.; Särkkä, S. Posterior linearization filter: Principles and implementation using sigma points. IEEE Trans. Signal Process. 2015, 63, 5561–5573. [Google Scholar]
  7. Tronarp, F.; Garcia-Fernandez, F.; Särkkä, S. Iterative filtering and smoothing in non-linear and non-Gaussian systems using conditional moments. IEEE Signal Process. Lett. 2018, 25, 408–412. [Google Scholar] [CrossRef]
  8. Khan, Z.; Balch, T.; Dellaert, F. MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1805–1819. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  9. Merwe, R.V.D.; Doucet, A.; Freitas, N.D.; Wan, E. The unscented particle filter. In Proceedings of the International Conference on Neural Information Processing Systems, Denver, CO, USA, 4–6 December 2000; pp. 563–569. [Google Scholar]
  10. Cappe, O.; Godsill, S.J.; Moulines, E. An overview of existing methods and recent advances in sequential Monte Carlo. Proc. IEEE 2007, 95, 899–924. [Google Scholar] [CrossRef]
  11. Julier, S.; Uhlmann, J.; Durrantwhyte, H.F. A new method for nonlinear transformation of means and covariances in filters and estimates. IEEE Trans. Autom. Control 2000, 45, 477–482. [Google Scholar] [CrossRef]
  12. Gustafsson, F.; Hendeby, G. Some relations between extended and unscented Kalman filters. IEEE Trans. Signal Process. 2012, 60, 545–555. [Google Scholar] [CrossRef]
  13. Arasaratnam, I.; Haykin, S. Cubature Kalman filters. IEEE Trans. Autom. Control 2009, 54, 1254–1269. [Google Scholar] [CrossRef]
  14. Arasaratnam, I.; Haykin, S.; Hurd, T.R. Cubature Kalman filtering for continuous-discrete systems: Theory and simulations. IEEE Trans. Signal Process. 2010, 58, 4977–4993. [Google Scholar] [CrossRef]
  15. Merwe, R.V.D.; Wan, E.A. Sigma-point Kalman filters for integrated navigation. In Proceedings of the 60th Annual Meeting of the Institute of Navigation, Dayton, OH, USA, 7–9 June 2004; pp. 641–654. [Google Scholar]
  16. Gultekin, S.; Paisley, J. Nonlinear Kalman filtering with divergence minimization. IEEE Trans. Signal Process. 2017, 65, 6319–6331. [Google Scholar] [CrossRef]
  17. Beal, M.J. Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, Cambridge University, Cambridge, UK, 2003. [Google Scholar]
  18. Khan, M.E.; Baqué, P.; Fleuret, F.; Fua, P. Kullback-Leibler proximal variational inference. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 3402–3410. [Google Scholar]
  19. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 1–32. [Google Scholar] [CrossRef]
  20. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  21. Salimans, T.; Kingma, D.; Welling, M. Markov chain Monte Carlo and variational inference: Bridging the gap. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1218–1226. [Google Scholar]
  22. Mnih, A.; Rezende, D. Variational inference for Monte Carlo objectives. In Proceedings of the International Conference on Machine Learning, New York City, NY, USA, 19–24 June 2016; pp. 2188–2196. [Google Scholar]
  23. Sain, R.; Mittal, V.; Gupta, V. A comprehensive review on recent advances in variational Bayesian inference. In Proceedings of the International Conference on Advances in Computer Engineering and Applications, Ghaziabad, India, 19–20 March 2015; pp. 488–492. [Google Scholar]
  24. Tseng, P. An analysis of the EM algorithm and entropy-like proximal point methods. Math. Oper. Res. 2004, 29, 27–44. [Google Scholar] [CrossRef]
  25. Chrétien, S.; Hero, A.O. Kullback proximal algorithms for maximum-likelihood estimation. IEEE Trans. Inf. Theory 2000, 46, 1800–1810. [Google Scholar] [CrossRef]
  26. Khan, M.E.; Babanezhad, R.; Lin, W.; Schmidt, M.; Sugiyama, M. Convergence of proximal-gradient stochastic variational inference under non-decreasing step-size sequence. J. Comp. Neurol. 2015, 319, 359–386. [Google Scholar]
  27. Petersen, K.B.; Pedersen, M.S. The Matrix Cookbook; Technical University of Denmark: Copenhagen, Denmark, 2012. [Google Scholar]
  28. Calvo, M.; Oller, J.M. A distance between multivariate normal distributions based in an embedding into the Siegel group. J. Multivar. Anal. 1990, 35, 223–242. [Google Scholar] [CrossRef]
  29. Regli, J.B.; Silva, R. Alpha-Beta Divergence For Variational Inference. arXiv, 2018; arXiv:1805.01045. [Google Scholar]
  30. Arulampalam, M.S.; Ristic, B.; Gordon, N.; Mansell, T. Bearings-only tracking of manoeuvring targets using particle filters. EURASIP J. Adv. Signal Process. 2004, 2004, 1–15. [Google Scholar] [CrossRef]
  31. Wang, X.; Morelande, M.; Moran, B. Target motion analysis using single sensor bearings-only measurements. In Proceedings of the International Congress on Image and Signal Processing, Tianjin, China, 17–19 October 2009; pp. 2094–2099. [Google Scholar]
  32. Gordon, N.J.; Salmond, D.J.; Smith, A.F.M. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F-Radar Signal Process. 1993, 140, 107–113. [Google Scholar] [CrossRef]
  33. Gustafsson, F. Particle filter theory and practice with positioning applications. IEEE Trans. Aerosp. Electron. Syst. Mag. 2010, 25, 53–82. [Google Scholar] [CrossRef] [Green Version]
  34. Arulampalam, S.; Maskell, S.; Gordon, N.; Tim, C. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188. [Google Scholar] [CrossRef] [Green Version]
  35. Rezende, D.J.; Mohamed, S.; Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the Machine Learning Research, Beijing, China, 21–26 June 2014; pp. 1278–1286. [Google Scholar]
  36. Price, R. A useful theorem for nonlinear devices having Gaussian inputs. IEEE Trans. Inf. Theory 1958, 4, 69–72. [Google Scholar] [CrossRef]
Figure 1. The flow diagram of the proposed PEKF-VB algorithm.
Figure 2. Tracking with a CT model. (a) Trajectory, measurements and estimation. (b) RMSE obtained by EKF, UKF, CODEKF and PEKF-VB.
Figure 3. RMSE and running time of PEKF-VB. (a) RMSE with different numbers of iterations. (b) The relationship between the number of iterations and the mean RMSE and running time.
Figure 4. The curve in (a) is the ELBO at the second scan. The curves in (b) are the ELBO of all scans.
Figure 5. Bearing-only tracking. (a) Tracking scenario. (b) Estimation obtained by EKF, UKF, CODEKF and PEKF-VB.
Figure 6. RMSE comparison of EKF, UKF, CODEKF and PEKF-VB in bearing-only tracking. (a) RMSE in position. (b) RMSE in velocity.
Figure 7. Value of metrics versus the number of iterations for PEKF-VB and CODEKF. (a) KL divergence in PEKF-VB. (b) COD in CODEKF.
Figure 8. Comparison of EKF, UKF, CODEKF and PEKF-VB. (a) Filtering results. (b) RMSE.
Table 1. The quantitative comparison of the mean RMSE (from scan 20 to 80) and the running time.

Algorithm     EKF       UKF       CODEKF    PEKF-VB
Time ratio    1         3.64      3.96      6.76
Mean RMSE     17.8129   17.8455   16.7758   16.3958
