Bayes Filtering and State Estimation

7 minute read ·

Published:

In robot state estimation, the Bayes filter is a probabilistic approach that estimates the state from a sequence of controls and measurements by recursively performing prediction with a motion model and correction with a measurement model. In practice, some common instantiations of the Bayes framework are the Kalman filter, information filter, and particle filter.

Mathematical Formulation of Bayes Filter

Given a sequence of measurements z1:t=z1,,zt\mathrm{z}_{1:t}=\mathbf{z}_1,\dots,\mathbf{z}_t and controls u1:t=u1,,ut\mathbf{u}_{1:t}=\mathbf{u}_1,\dots,\mathbf{u}_t, and a measurement model p(ztxt)p(\mathbf{z}_t\mid\mathbf{x}_t) and a motion model p(xtxt1,ut)p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t). The Bayes filter provides a recursive framework for estimating the state xt\mathbf{x}_t in terms of a probability distribution function, known as the belief bel(xt)p(xtz1:t,u1,t)\mathrm{bel}(\mathbf{x}_t)\triangleq p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1,t}), or posterior.

The dynamic Bayes network that characterizes the evolution of controls, states, and measurements. [1]

The filtering algorithm is built upon two assumptions of conditional independence:

  • Motion model is (1st-order) Markov:
p(xtx0:t1,u1:t)=p(xtxt1,ut)p(\mathbf{x}_t\mid\mathbf{x}_{0:t-1},\mathbf{u}_{1:t})=p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)
  • The measurement model is memoryless:
p(ztx0:t,z1:t1,u1:t)=p(ztxt)p(\mathbf{z}_t\mid\mathbf{x}_{0:t},\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})=p(\mathbf{z}_{t}\mid\mathbf{x}_t)

Given these two assumptions, we can propagate the belief over time:

bel(xt)=p(xtz1:t,u1:t)=p(xtzt,z1:t1,u1:t)definition=ηp(ztxt,z1:t1,u1:t)p(xtz1:t1,u1:t)Bayes rule=ηp(ztxt)p(xtz1:t1,u1:t)memoryless\begin{align*} \mathrm{bel}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1:t})=p(\mathbf{x}_t\mid\mathbf{z}_t,\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to\text{definition} \\ &=\eta p(\mathbf{z}_t\mid\mathbf{x}_t,\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to\text{Bayes rule}\\ &=\eta p(\mathbf{z}_t\mid\mathbf{x}_t)p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to \text{memoryless} \end{align*}

We can define the predicted belief of the state at time tt to be the probability distribution before knowing the current measurement, i.e., bel(xt)p(xtz1:t1,u1:t)\overline{\mathrm{bel}}(\mathbf{x}_t)\triangleq p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}). Then, the above equation becomes

bel(xt)=ηp(ztxt)bel(xt)\begin{equation} \mathrm{bel}(\mathrm{x}_t)=\eta p(\mathbf{z}_t\mid\mathbf{x}_t)\overline{\mathrm{bel}}(\mathbf{x}_t) \end{equation}

where η\eta is the normalization coefficient, and

1η=p(ztxt)bel(xt)dxt\frac{1}{\eta}=\int p(\mathbf{z}_t\mid\mathbf{x}_t)\overline{\mathrm{bel}}(\mathbf{x}_t)\mathrm{d}\mathbf{x}_t

which is essentially a correction applied to the belief after observing the current measurement.

What remains to be done, obviously, is to predict the belief bel(xt)\overline{\mathrm{bel}}(\mathbf{x}_t) from the previous state:

bel(xt)=p(xtz1:t1,u1:t)=p(xtxt1,z1:t1,u1:t)p(xt1z1:t1,u1:t)dxt1marginalizext1=p(xtxt1,ut)p(xt1z1:t1,u1:t1)dxt1Markov assumption\begin{align*} \overline{\mathrm{bel}}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) \\ &=\int p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})\mathrm{d}\mathbf{x}_{t-1} && \to \text{marginalize}\,\mathbf{x}_{t-1}\\ &=\int p(\mathbf{x}_t\mid\mathbf{x}_{t-1}, \mathbf{u}_t)p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t-1})\mathrm{d}\mathbf{x}_{t-1} &&\to\text{Markov assumption} \end{align*}

Then, recognizing p(xt1z1:t1,u1:t1)=bel(xt1)p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t-1})=\mathrm{bel}(\mathbf{x}_{t-1}), we have

bel(xt)=p(xtut,xt1)bel(xt1)dxt1\begin{equation} \overline{\mathrm{bel}}(\mathbf{x}_t)=\int p(\mathbf{x}_t\mid\mathbf{u}_t,\mathbf{x}_{t-1})\mathrm{bel}(\mathbf{x}_{t-1})\mathrm{d}\mathbf{x}_{t-1} \end{equation}

In summary, the Bayes filter provides a two-step framework for updating the belief of the current state xt\mathbf{x}_t from the belief of the previous state xt1\mathbf{x}_{t-1}:

  • Prediction using the motion model (Equation 1);
  • Correction using the measurement model (Equation 2).

These two steps can be performed recursively to estimate the state over time.

Kalman Filter

The Kalman filter is an instantiation of the Bayes filter with the assumption of linear Gaussian models. This means the motion model and the measurement model should be

  • linear with respect to their arguments:

    Specifically, suppose xRm\mathbf{x}\in\mathbb{R}^m , uRn\mathbf{u}\in\mathbb{R}^n, and zRp\mathbf{z}\in\mathbb{R}^p. Let FRm×m\mathbf{F}\in \mathbb{R}^{m\times m}, GRm×n\mathbf{G}\in\mathbb{R}^{m\times n}, and HRp×m\mathbf{H}\in\mathbb{R}^{p\times m} be matrices (linear maps).

    • Motion model:
    xt=Ftxt1+Gtut+wt,wtN(0,Qt)\begin{align} \mathbf{x}_t=\mathbf{F}_t\mathbf{x}_{t-1}+\mathbf{G}_t\mathbf{u}_t+\mathbf{w}_t, &&\mathbf{w}_t\sim\mathcal{N}(\mathbf{0},\mathbf{Q}_t) \end{align}
    • Measurement model:
    zt=Htxt+vt,vtN(0,Rt)\begin{align} \mathbf{z}_t=\mathbf{H}_t\mathbf{x}_t+\mathbf{v}_t, &&\mathbf{v}_t \sim\mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{align}

    where wRm\mathbf{w}\in\mathbb{R}^m and vtRp\mathbf{v}_t\in\mathbb{R}^p are white noise with variance QRm×m\mathbf{Q}\in\mathbb{R}^{m\times m} and RRp×p\mathbf{R}\in\mathbb{R}^{p\times p} , respectively.

    • White noise

      A discrete-time random process (a sequence of random vectors), wk\mathbf{w}_k, is called white noise if

      E[wkwj]=Qkδkj,δkj={1k=j0kj\begin{align*} \mathbb{E}[\mathbf{w}_k\mathbf{w}_j^{\top}]=\mathbf{Q}_k\delta_{kj}, &&\delta_{kj}=\left\{\begin{matrix}1 & k=j\\ 0 & k\ne j\end{matrix}\right. \end{align*}
  • following a multivariate Gaussian distribution:

    Specifically,

    • Motion model: p(xtxt1,ut)=N(xt;Ftxt1+Gtut,Qt)p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)=\mathcal{N}(\mathbf{x}_t;\mathbf{F}_t\mathbf{x}_{t-1}+\mathbf{G}_t\mathbf{u}_t, \mathbf{Q}_t);
    • Measurement model: p(ztxt)=N(zt;Htxt,Rt)p(\mathbf{z}_t\mid\mathbf{x}_t)=\mathcal{N}(\mathbf{z}_t;\mathbf{H}_t\mathbf{x}_t,\mathbf{R}_t).

    Note: here we need to interpret xt1\mathbf{x}_{t-1}as a realization rather than a random variable.

    • Multivariate Gaussian PDF

      N(x;μ,Σ)\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{\Sigma}) is the probability distribution function of a random variable xRm\mathbf{x}\in\mathbb{R}^m that follows a Gaussian distribution with mean μRm\boldsymbol{\mu}\in\mathbb{R}^m and covariance matrix ΣRm×m\mathbf{\Sigma}\in\mathbb{R}^{m\times m},

      N(x;μ,Σ)=p(x)=1(2π)mdet(Σ)exp(12(xμ)Σ1(xμ))\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{\Sigma})=p(\mathbf{x})=\frac{1}{\sqrt{(2\pi)^m\det(\mathbf{\Sigma})}}\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^{\top}\mathbf{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right)

Before deriving the algorithm for the Kalman filter, there are two important properties of the multivariate Gaussian distribution that are useful here.

  1. Affine transformation:

    Let x\mathbf{x} be a random vector following a Gaussian distribution xN(μ,Σ)\mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma}), then the random vector y=Ax+b\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{b} follows a Gaussian distribution yN(Aμ+b,AΣA)\mathbf{y}\sim\mathcal{N}(\mathbf{A}\boldsymbol{\mu}+\mathbf{b},\mathbf{A}\mathbf{\Sigma}\mathbf{A}^{\top}).

    • Proof

      Will come later

  1. Conditional distribution:

    Let yN(μ,Σ)\mathbf{y}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma}) be a random vector of dimension m+pm+p. Partition y\mathbf{y} into a vector of dimension mm, xRm\mathbf{x}\in\mathbb{R}^{m} and a vector of dimension pp, zRp\mathbf{z}\in\mathbb{R}^p, and partition μ\boldsymbol{\mu} and Σ\mathbf{\Sigma} respectively,

    [xz]N([μxμz],[ΣxxΣxzΣzxΣzz])\begin{equation*} \left[\begin{matrix} \mathbf{x} \\ \mathbf{z} \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \boldsymbol{\mu}_x\\ \boldsymbol{\mu}_z \end{matrix}\right], \left[ \begin{matrix} \mathbf{\Sigma}_{xx} &\mathbf{\Sigma}_{xz}\\ \mathbf{\Sigma}_{zx} &\mathbf{\Sigma}_{zz} \end{matrix} \right] \right) \end{equation*}

    then, the conditional distribution p(xz=z)=N(x;μxz,Σxz)p(\mathbf{x}\mid\mathbf{z}=z)=\mathcal{N}(\mathbf{x};\boldsymbol{\mu}_{\mathbf{x}\mid\mathbf{z}},\mathbf{\Sigma}_{\mathbf{x}\mid\mathbf{z}}) , where

    μxz=μx+ΣxzΣzz1(zμz),Σxz=ΣxxΣxzΣzz1Σzx\begin{align*} \boldsymbol{\mu}_{\mathbf{x}\mid\mathbf{z}}=\boldsymbol{\mu}_x+\mathbf{\Sigma}_{xz}\boldsymbol{\Sigma}_{zz}^{-1}\left(z-\mathbf{\mu}_z\right), &&\mathbf{\Sigma}_{\mathbf{x}\mid\mathbf{z}}=\mathbf{\Sigma}_{xx}-\mathbf{\Sigma}_{xz}\mathbf{\Sigma}_{zz}^{-1}\mathbf{\Sigma}_{zx} \end{align*}
    • Proof

      Will come later

Now, if we assume the initial belief to be Gaussian, i.e., bel(x0)=p(x0)=N(x0;μ0,Σ0)\mathrm{bel}(\mathbf{x}_0)=p(\mathbf{x}_0)=\mathcal{N}(\mathbf{x}_0;\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0). Then, from equations (1) and (2), we can see that all subsequent beliefs will be Gaussian. Thus,

bel(xt)=p(xtz1:t1,u1:t)=N(xt;μt,Σt)bel(xt)=p(xtz1:t,u1,t)=N(xt;μt,Σt)\begin{align*} \overline{\mathrm{bel}}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})=\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-)\\ \mathrm{bel}(\mathbf{x}_t)&= p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1,t})=\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_t,\mathbf{\Sigma}_t) \end{align*}

Then, we only need to track μt,Σt\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^- in the prediction step and μt,Σt\boldsymbol{\mu}_t,\mathbf{\Sigma}_t in the correction step.

For the prediction step, use the joint distribution p(xt,xt1u1:t,z1:t1)p(\mathbf{x}_t,\mathbf{x}_{t-1}\mid\mathbf{u}_{1:t},\mathbf{z}_{1:t-1}) and realize

[xtxt1]N([Ftμt1+Gtutμt1],[FtΣt1F+QtFtΣt1Σt1FtΣt1])\begin{equation*} \left[\begin{matrix} \mathbf{x}_t \\ \mathbf{x}_{t-1} \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \mathbf{F}_t\boldsymbol{\mu}_{t-1}+\mathbf{G}_t\mathbf{u}_t\\ \boldsymbol{\mu}_{t-1} \end{matrix}\right], \left[ \begin{matrix} \mathbf{F}_{t}\mathbf{\Sigma}_{t-1}\mathbf{F}^{\top}+\mathbf{Q}_t &\mathbf{F}_t\mathbf{\Sigma}_{t-1}\\ \mathbf{\Sigma}_{t-1}\mathbf{F}_t^{\top} &\mathbf{\Sigma}_{t-1} \end{matrix} \right] \right) \end{equation*}
  • Derivation [2]

    From equation (3),

    [xtxt1]yt=[FtI]Atxt1+[Gt0]Btut+[I0]Wtwt\underbrace{\begin{bmatrix}\mathbf{x}_t \\\mathbf{x}_{t-1}\end{bmatrix}}_{\mathbf{y}_t}=\underbrace{\begin{bmatrix}\mathbf{F}_t \\\mathbf{I}\end{bmatrix}}_{\mathbf{A}_t}\mathbf{x}_{t-1}+\underbrace{\begin{bmatrix}\mathbf{G}_t \\\mathbf{0}\end{bmatrix}}_{\mathbf{B}_t}\mathbf{u}_t+\underbrace{\begin{bmatrix}\mathbf{I} \\\mathbf{0}\end{bmatrix}}_{\mathbf{W}_t}\mathbf{w}_t

    Compute the mean and covariance,

    E[yt]=E ⁣[Atxt1+Btut+Wtwt]=Atμt1+Btut=[Ftμt1+Gtutμt1]\mathbb{E}[\mathbf{y}_t] = \mathbb{E}\!\left[\mathbf{A}_t \mathbf{x}_{t-1} + \mathbf{B}_t \mathbf{u}_t + \mathbf{W}_t \mathbf{w}_t\right] = \mathbf{A}_t \boldsymbol{\mu}_{t-1} + \mathbf{B}_t \mathbf{u}_t = \begin{bmatrix} \mathbf{F}_t \boldsymbol{\mu}_{t-1} + \mathbf{G}_t \mathbf{u}_t \\ \boldsymbol{\mu}_{t-1} \end{bmatrix}
    Cov[yt]=E ⁣[(ytE[yt])(ytE[yt])]=E ⁣[(At(xt1μt1)+Wtwt)(At(xt1μt1)+Wtwt)]=AtΣt1At+WtQtWt=[FtΣt1Ft+QtFtΣt1Σt1FtΣt1]\begin{align*} \mathrm{Cov}[\mathbf{y}_t] &= \mathbb{E}\!\left[(\mathbf{y}_t - \mathbb{E}[\mathbf{y}_t])(\mathbf{y}_t - \mathbb{E}[\mathbf{y}_t])^\top\right] \\ &= \mathbb{E}\!\left[\left(\mathbf{A}_t(\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1}) + \mathbf{W}_t \mathbf{w}_t\right) \left(\mathbf{A}_t(\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1}) + \mathbf{W}_t \mathbf{w}_t\right)^\top\right] \\ &= \mathbf{A}_t \mathbf{\Sigma}_{t-1} \mathbf{A}_t^\top + \mathbf{W}_t \mathbf{Q}_t \mathbf{W}_t^\top \\ &= \begin{bmatrix} \mathbf{F}_t \mathbf{\Sigma}_{t-1} \mathbf{F}_t^\top + \mathbf{Q}_t & \mathbf{F}_t \mathbf{\Sigma}_{t-1} \\ \mathbf{\Sigma}_{t-1} \mathbf{F}_t^\top & \mathbf{\Sigma}_{t-1} \end{bmatrix} \end{align*}

Then bel(xt)=p(xtz1:t1,u1:t)\overline{\mathrm{bel}}(\mathbf{x}_t) =p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) can be obtained by marginalize xt1\mathbf{x}_{t-1},

μt=Ftμt1+Gtut(predicted mean)Σt=FtΣt1F+Qt(predicted covariance)\begin{align} \boldsymbol{\mu}_t^-&=\mathbf{F}_t\boldsymbol{\mu}_{t-1}+\mathbf{G_t}\mathbf{u}_t && \text{(predicted mean)} \\ \mathbf{\Sigma}_{t}^-&=\mathbf{F}_{t}\mathbf{\Sigma}_{t-1}\mathbf{F}^{\top}+\mathbf{Q}_t && \text{(predicted covariance)} \end{align}

For the correction step, form a joint distribution p(xt,ztu1:t,z1:t1)p(\mathbf{x}_t,\mathbf{z}_t\mid \mathbf{u}_{1:t},\mathbf{z}_{1:t-1}) and realize

[xtzt]N([μtHtμt],[ΣtΣtHtHtΣtHtΣtHt+Rt])\begin{equation*} \left[\begin{matrix} \mathbf{x}_t \\ \mathbf{z}_t \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \boldsymbol{\mu}_t^-\\ \mathbf{H}_t\boldsymbol{\mu}_t^- \end{matrix}\right], \left[ \begin{matrix} \mathbf{\Sigma}_t^- &\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}\\ \mathbf{H}_t\mathbf{\Sigma}_{t}^- &\mathbf{H}_t\mathbf{\Sigma}_{t}^-\mathbf{H}_t^{\top}+\mathbf{R}_t \end{matrix} \right] \right) \end{equation*}
  • Derivation [2]

    Since xtN(μt,Σt)\mathbf{x}_t\sim\mathcal{N}(\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-) and zt=Htxt+vt\mathbf{z}_t=\mathbf{H_t}\mathbf{x}_t+\mathbf{v}_t, where vtN(0,Rt)\mathbf{v}_t\sim\mathcal{N}(\mathbf{0},\mathbf{R}_t). From property 1, we have

    ztN(Htμt,HtΣtHt+Rt)\mathbf{z}_t\sim\mathcal{N}(\mathbf{H}_t\boldsymbol{\mu}_t^-,\mathbf{H}_t\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}+\mathbf{R}_t)

    Furthermore,

    Σxz=Cov(xt,zt)=Cov(xt,Htxt+vt)=Cov(xt,Htxt)=E[(xtμt)(HtxtHtμt)]=E[(xtμt)(xtμt)Ht]=E[(xtμt)(xtμt)]Ht=ΣtHtΣzx=Cov(zt,xt)==HtΣt\begin{align*} \mathbf{\Sigma}_{xz}&=\mathrm{Cov}(\mathbf{x}_t,\mathbf{z}_t)=\mathrm{Cov}(\mathbf{x}_t,\mathbf{H_t}\mathbf{x}_t+\mathbf{v}_t)=\mathrm{Cov}(\mathbf{x}_t,\mathbf{H}_t\mathbf{x}_t)\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{H}_t\mathbf{x}_t-\mathbf{H_t}\boldsymbol{\mu}_t^-\right)^{\top}\right]\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)^{\top}\mathbf{H_t}^{\top}\right]\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)^{\top}\right]\mathbf{H_t}^{\top}\\ &=\mathbf{\Sigma}_t^-\mathbf{H_t}^{\top} \\ \mathbf{\Sigma}_{zx}&=\mathrm{Cov}(\mathbf{z}_t,\mathbf{x}_t)=\dots=\mathbf{H}_t\mathbf{\Sigma}_t^- \end{align*}

Thus, bel(xt)=p(xtzt,u1:t,z1:t1)\mathrm{bel}(\mathbf{x}_t)=p(\mathbf{x}_t\mid\mathbf{z}_t,\mathbf{u}_{1:t},\mathbf{z}_{1:t-1}), where

νt=ztHtμt(innovation)St=HtΣtHt+Rt(innovation covariance)Kt=ΣtHtSt1(Kalman gain)\begin{align*} \boldsymbol{\nu}_t&=\mathbf{z}_t-\mathbf{H}_t\boldsymbol{\mu}_t^- && \text{(innovation)} \\ \mathbf{S}_t&=\mathbf{H}_t\mathbf{\Sigma}_{t}^-\mathbf{H}_t^{\top}+\mathbf{R}_t && \text{(innovation covariance)}\\ \mathbf{K}_t&=\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}\mathbf{S}_t^{-1} && \text{(Kalman gain)} \end{align*}
μt=μt+Ktνk(corrected mean)Σt=(IKtHt)Σt(corrected covariance)\begin{align} \boldsymbol{\mu}_t&=\boldsymbol{\mu}_t^-+\mathbf{K}_t\boldsymbol{\nu}_k && \text{(corrected mean)}\\ \mathbf{\Sigma}_t&=\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)\mathbf{\Sigma}_t^- && \text{(corrected covariance)} \end{align}

Finally, normalize for numerical stability

Σt=(IKtHt)Σt(IKtHt)+KtRtKt\begin{equation} \mathbf{\Sigma}_t=\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)\mathbf{\Sigma}_t^-\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)^{\top}+\mathbf{K}_t\mathbf{R}_t\mathbf{K}_t^{\top} \end{equation}

Extended Kalman Filter (EKF)

One way to handle the case where motion model and measurement model are not linear is to approximate the functions via local linearization (Taylor expansion). Specifically, we extend the Kalman filter algorithm to cases where models are non-linear:

  • Motion model
xt=f(ut,xt1)+wt,wtN(0,Qt)\begin{align*} \mathbf{x}_t=f(\mathbf{u}_t,\mathbf{x}_{t-1})+\mathbf{w}_t, &&\mathbf{w}_t\sim\mathcal{N}(\mathbf{0},\mathbf{Q}_t) \end{align*}
  • Measurement model
zt=h(xt)+vt,vtN(0,Rt)\begin{align*} \mathbf{z}_t=h(\mathbf{x}_t)+\mathbf{v}_t, &&\mathbf{v}_t \sim\mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{align*}

Let

Ft=fxx=μt1,Wt=fwx=μt1,Ht=hxx=μt,Vt=hvx=μt\mathbf{F}_t=\left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\mathbf{x}=\boldsymbol{\mu}_{t-1}}, \qquad \mathbf{W}_t=\left.\frac{\partial f}{\partial \mathbf{w}}\right|_{\mathbf{x}=\boldsymbol{\mu}_{t-1}}, \qquad \mathbf{H}_t=\left.\frac{\partial h}{\partial \mathbf{x}}\right|_{\mathbf{x}=\boldsymbol{\mu}_t^-}, \qquad \mathbf{V}_t=\left.\frac{\partial h}{\partial \mathbf{v}}\right|_{\mathbf{x}=\boldsymbol{\mu}_t^-}

The EKF algorithm is

Input: μt1,  Σt1,  ut,  zt1:μtf(ut,μt1)(predicted mean)2:ΣtFtΣt1Ft+WtQtWt(predicted covariance)3:νtzth(μt)(innovation)4:StHtΣtHt+VtRtVt(innovation covariance)5:KtΣtHtSt1(filter gain)6:μtμt+Ktνt(corrected mean)7:Σt(IKtHt)Σt(corrected covariance)8:Σt(IKtHt)Σt(IKtHt)+KtRtKt(numerically stable form)return    μt,  Σt\begin{align} &\textbf{Input: } \boldsymbol{\mu}_{t-1},\; \boldsymbol{\Sigma}_{t-1},\; \mathbf{u}_t,\; \mathbf{z}_t \nonumber\\[6pt] &\quad\quad \text{1:}\quad \boldsymbol{\mu}_t^- \leftarrow f(\mathbf{u}_t,\boldsymbol{\mu}_{t-1}) &&\text{(predicted mean)} \nonumber\\ &\quad\quad\text{2:}\quad \boldsymbol{\Sigma}_t^- \leftarrow \mathbf{F}_t\boldsymbol{\Sigma}_{t-1}\mathbf{F}_t^\top + \mathbf{W}_t\mathbf{Q}_t\mathbf{W}_t^\top &&\text{(predicted covariance)} \nonumber\\ &\quad\quad\text{3:}\quad \boldsymbol{\nu}_t \leftarrow \mathbf{z}_t - h(\boldsymbol{\mu}_t^-) &&\text{(innovation)} \nonumber\\ &\quad\quad\text{4:}\quad \mathbf{S}_t \leftarrow \mathbf{H}_t\boldsymbol{\Sigma}_t^-\mathbf{H}_t^\top + \mathbf{V}_t\mathbf{R}_t\mathbf{V}_t^\top &&\text{(innovation covariance)} \nonumber\\ &\quad\quad\text{5:}\quad \mathbf{K}_t \leftarrow \boldsymbol{\Sigma}_t^-\mathbf{H}_t^\top\mathbf{S}_t^{-1} &&\text{(filter gain)} \nonumber\\ &\quad\quad\text{6:}\quad \boldsymbol{\mu}_t \leftarrow \boldsymbol{\mu}_t^- + \mathbf{K}_t\boldsymbol{\nu}_t &&\text{(corrected mean)} \nonumber\\ &\quad\quad\text{7:}\quad \boldsymbol{\Sigma}_t \leftarrow (\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)\boldsymbol{\Sigma}_t^- &&\text{(corrected covariance)} \nonumber\\ &\quad\quad\text{8:}\quad \boldsymbol{\Sigma}_t \leftarrow (\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)\boldsymbol{\Sigma}_t^-(\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)^\top + \mathbf{K}_t\mathbf{R}_t\mathbf{K}_t^\top &&\text{(numerically stable form)} \nonumber\\[6pt] &\textbf{return}\;\; \boldsymbol{\mu}_t,\; \boldsymbol{\Sigma}_t \nonumber \end{align}

Unscented Kalman Filter (UKF)

Another non-linear Kalman filter is the unscented Kalman filter which uses deterministic sampling to approximate the distribution.

Let a function f:RnRmf:\mathbb{R}^{n}\to\mathbb{R}^m be a non-linear transformation. Suppose xRn\mathbf{x}\in\mathbb{R}^n and xN(μ,Σ)\mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma}). We draw a set of nn sigmapoints, X={xi}i=1n\mathcal{X}=\{\mathbf{x}_i\}_{i=1}^{n}, and compute a weight wiw_i for each sigma point by executing

SigmaPoints(μ,Σ):x0=μxi=μ+i,i=1,,nxi=μin,i=n+1,,2nw0=κn+κwi=12(n+κ),i=1,,2nreturn(X,w)\begin{align*} &\texttt{SigmaPoints}(\boldsymbol{\mu},\mathbf{\Sigma}):\\ &\quad\quad \mathbf{x}_0 = \boldsymbol{\mu} \nonumber\\ &\quad\quad \mathbf{x}_i = \boldsymbol{\mu} + \boldsymbol{\ell}'_i, \qquad i = 1,\ldots,n \nonumber\\ &\quad\quad \mathbf{x}_i = \boldsymbol{\mu} - \boldsymbol{\ell}'_{i-n}, \qquad i = n+1,\ldots,2n \nonumber\\[4pt] &\quad\quad w_0 = \dfrac{\kappa}{n+\kappa} \nonumber\\[6pt] &\quad\quad w_i = \dfrac{1}{2(n+\kappa)}, \qquad i = 1,\ldots,2n\\ &\textbf{return}\,\, (\mathcal{X},\mathbf{w}) \end{align*}

Further let y=f(x)\mathbf{y}=f(\mathbf{x}), then yN(μ,Σ)\mathbf{y}\sim\mathcal{N}(\boldsymbol{\mu}',\mathbf{\Sigma}'), where

μ=i=02nwif(xi)Σ=i=02nwi(f(xi)μ)(f(xi)μ)\begin{align*} \boldsymbol{\mu}'&=\sum_{i=0}^{2n}w_if(\mathbf{x}_i)\\ \mathbf{\Sigma}'&=\sum_{i=0}^{2n}w_i(f(\mathbf{x}_i)-\boldsymbol{\mu}')(f(\mathbf{x}_i)-\boldsymbol{\mu}')^{\top} \end{align*}

The UKF algorithm is

Input: μt1,  Σt1,  ut,  zt1:(Xt1,w)SigmaPoints(μt1,Σt1)2:μt=i=02nwif(ut,xt1,i)(predicted mean)3:Σti=02nwi(f(ut,xt1,i)μt)(f(ut,xt1,i)μt)+Qt(predicted covariance)4:(Xt,w)SigmaPoints(μt,Σt)5:zt=i=02nwih(xt,i)(predicted measurement)6:νtztzt(innovation)7:Sti=02nwi(h(xt,i)zt)(h(xt,i)zt)+Rt(innovation covariance)8:Σtxzi=02nwi(xt,iμt)(h(xt,i)zt)(cross covariance)9:KtΣtxzSt1(filter gain)10:μtμt+Ktνt(corrected mean)11:ΣtΣtKtStKt(corrected covariance)return    μt,  Σt\begin{align} &\textbf{Input: } \boldsymbol{\mu}_{t-1},\; \boldsymbol{\Sigma}_{t-1},\; \mathbf{u}_t,\; \mathbf{z}_t \nonumber\\[6pt] &\quad\quad\text{1:}\quad (\mathcal{X}_{t-1},\,\mathbf{w}^-) \leftarrow \texttt{SigmaPoints}(\boldsymbol{\mu}_{t-1},\boldsymbol{\Sigma}_{t-1}) \nonumber\\ &\quad\quad\text{2:}\quad \boldsymbol{\mu}_t^- = \sum_{i=0}^{2n} w_i^-\, f(\mathbf{u}_t, \mathbf{x}_{t-1,i}) &&\text{(predicted mean)} \nonumber\\ &\quad\quad\text{3:}\quad \boldsymbol{\Sigma}_t^- \leftarrow \sum_{i=0}^{2n} w_i^- \bigl(f(\mathbf{u}_t,\mathbf{x}_{t-1,i}) - \boldsymbol{\mu}_t^-\bigr) \bigl(f(\mathbf{u}_t,\mathbf{x}_{t-1,i}) - \boldsymbol{\mu}_t^-\bigr)^\top + \mathbf{Q}_t &&\text{(predicted covariance)} \nonumber\\ &\quad\quad\text{4:}\quad (\mathcal{X}_t^-,\,\mathbf{w}) \leftarrow \texttt{SigmaPoints}(\boldsymbol{\mu}_t^-, \boldsymbol{\Sigma}_t^-) \nonumber\\ &\quad\quad\text{5:}\quad \mathbf{z}_t^- = \sum_{i=0}^{2n} w_i\, h(\mathbf{x}_{t,i}^-) &&\text{(predicted measurement)} \nonumber\\ &\quad\quad\text{6:}\quad \boldsymbol{\nu}_t \leftarrow \mathbf{z}_t - \mathbf{z}_t^- &&\text{(innovation)} \nonumber\\ &\quad\quad\text{7:}\quad \mathbf{S}_t \leftarrow \sum_{i=0}^{2n} w_i \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr) \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr)^\top + \mathbf{R}_t &&\text{(innovation covariance)} \nonumber\\ &\quad\quad\text{8:}\quad \boldsymbol{\Sigma}_t^{xz} \leftarrow \sum_{i=0}^{2n} w_i \bigl(\mathbf{x}_{t,i}^- - \boldsymbol{\mu}_t^-\bigr) \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr)^\top &&\text{(cross covariance)} \nonumber\\ &\quad\quad\text{9:}\quad \mathbf{K}_t \leftarrow \boldsymbol{\Sigma}_t^{xz}\,\mathbf{S}_t^{-1} &&\text{(filter gain)} \nonumber\\ &\quad\quad\text{10:}\quad \boldsymbol{\mu}_t \leftarrow \boldsymbol{\mu}_t^- + \mathbf{K}_t\boldsymbol{\nu}_t &&\text{(corrected mean)} \nonumber\\ &\quad\quad\text{11:}\quad \boldsymbol{\Sigma}_t \leftarrow \boldsymbol{\Sigma}_t^- - \mathbf{K}_t\mathbf{S}_t\mathbf{K}_t^\top &&\text{(corrected covariance)} \nonumber\\[6pt] &\textbf{return}\;\; \boldsymbol{\mu}_t,\; \boldsymbol{\Sigma}_t \nonumber \end{align}

Reference

[1] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Mit Press, 2010.
[2] M. Ghaffari, ROB530 Lecture Slides — Kalman Filtering. University of Michigan, 2026.

Categories: Robotics