Bayes Filtering and State Estimation

7 minute read ·

Published: March 04, 2026

In robot state estimation, the Bayes filter is a probabilistic approach that estimates the state from a sequence of controls and measurements by recursively performing prediction with a motion model and correction with a measurement model. In practice, some common instantiations of the Bayes framework are the Kalman filter, information filter, and particle filter.

Mathematical Formulation of Bayes Filter

Given a sequence of measurements $\mathrm{z}_{1:t}=\mathbf{z}_1,\dots,\mathbf{z}_t$ and controls $\mathbf{u}_{1:t}=\mathbf{u}_1,\dots,\mathbf{u}_t$ , and a measurement model $p(\mathbf{z}_t\mid\mathbf{x}_t)$ and a motion model $p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)$ . The Bayes filter provides a recursive framework for estimating the state $\mathbf{x}_t$ in terms of a probability distribution function, known as the belief $\mathrm{bel}(\mathbf{x}_t)\triangleq p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1,t})$ , or posterior.

The dynamic Bayes network that characterizes the evolution of controls, states, and measurements. [1]

The filtering algorithm is built upon two assumptions of conditional independence:

Motion model is (1st-order) Markov:

p(\mathbf{x}_t\mid\mathbf{x}_{0:t-1},\mathbf{u}_{1:t})=p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)

The measurement model is memoryless:

p(\mathbf{z}_t\mid\mathbf{x}_{0:t},\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})=p(\mathbf{z}_{t}\mid\mathbf{x}_t)

Given these two assumptions, we can propagate the belief over time:

\begin{align*} \mathrm{bel}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1:t})=p(\mathbf{x}_t\mid\mathbf{z}_t,\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to\text{definition} \\ &=\eta p(\mathbf{z}_t\mid\mathbf{x}_t,\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to\text{Bayes rule}\\ &=\eta p(\mathbf{z}_t\mid\mathbf{x}_t)p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to \text{memoryless} \end{align*}

We can define the predicted belief of the state at time $t$ to be the probability distribution before knowing the current measurement, i.e., $\overline{\mathrm{bel}}(\mathbf{x}_t)\triangleq p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})$ . Then, the above equation becomes

\begin{equation} \mathrm{bel}(\mathrm{x}_t)=\eta p(\mathbf{z}_t\mid\mathbf{x}_t)\overline{\mathrm{bel}}(\mathbf{x}_t) \end{equation}

where $\eta$ is the normalization coefficient, and

\frac{1}{\eta}=\int p(\mathbf{z}_t\mid\mathbf{x}_t)\overline{\mathrm{bel}}(\mathbf{x}_t)\mathrm{d}\mathbf{x}_t

which is essentially a correction applied to the belief after observing the current measurement.

What remains to be done, obviously, is to predict the belief $\overline{\mathrm{bel}}(\mathbf{x}_t)$ from the previous state:

\begin{align*} \overline{\mathrm{bel}}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) \\ &=\int p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})\mathrm{d}\mathbf{x}_{t-1} && \to \text{marginalize}\,\mathbf{x}_{t-1}\\ &=\int p(\mathbf{x}_t\mid\mathbf{x}_{t-1}, \mathbf{u}_t)p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t-1})\mathrm{d}\mathbf{x}_{t-1} &&\to\text{Markov assumption} \end{align*}

Then, recognizing $p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t-1})=\mathrm{bel}(\mathbf{x}_{t-1})$ , we have

\begin{equation} \overline{\mathrm{bel}}(\mathbf{x}_t)=\int p(\mathbf{x}_t\mid\mathbf{u}_t,\mathbf{x}_{t-1})\mathrm{bel}(\mathbf{x}_{t-1})\mathrm{d}\mathbf{x}_{t-1} \end{equation}

In summary, the Bayes filter provides a two-step framework for updating the belief of the current state $\mathbf{x}_t$ from the belief of the previous state $\mathbf{x}_{t-1}$ :

Prediction using the motion model (Equation 1);

Correction using the measurement model (Equation 2).

These two steps can be performed recursively to estimate the state over time.

Kalman Filter

The Kalman filter is an instantiation of the Bayes filter with the assumption of linear Gaussian models. This means the motion model and the measurement model should be

linear with respect to their arguments:
Specifically, suppose $\mathbf{x}\in\mathbb{R}^m$ , $\mathbf{u}\in\mathbb{R}^n$ , and $\mathbf{z}\in\mathbb{R}^p$ . Let $\mathbf{F}\in \mathbb{R}^{m\times m}$ , $\mathbf{G}\in\mathbb{R}^{m\times n}$ , and $\mathbf{H}\in\mathbb{R}^{p\times m}$ be matrices (linear maps).
- Motion model:
$\begin{align} \mathbf{x}_t=\mathbf{F}_t\mathbf{x}_{t-1}+\mathbf{G}_t\mathbf{u}_t+\mathbf{w}_t, &&\mathbf{w}_t\sim\mathcal{N}(\mathbf{0},\mathbf{Q}_t) \end{align}$
- Measurement model:
$\begin{align} \mathbf{z}_t=\mathbf{H}_t\mathbf{x}_t+\mathbf{v}_t, &&\mathbf{v}_t \sim\mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{align}$
where $\mathbf{w}\in\mathbb{R}^m$ and $\mathbf{v}_t\in\mathbb{R}^p$ are white noise with variance $\mathbf{Q}\in\mathbb{R}^{m\times m}$ and $\mathbf{R}\in\mathbb{R}^{p\times p}$ , respectively.
- White noise
  A discrete-time random process (a sequence of random vectors), $\mathbf{w}_k$ , is called white noise if
  $\begin{align*} \mathbb{E}[\mathbf{w}_k\mathbf{w}_j^{\top}]=\mathbf{Q}_k\delta_{kj}, &&\delta_{kj}=\left\{\begin{matrix}1 & k=j\\ 0 & k\ne j\end{matrix}\right. \end{align*}$

following a multivariate Gaussian distribution:
Specifically,
- Motion model: $p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)=\mathcal{N}(\mathbf{x}_t;\mathbf{F}_t\mathbf{x}_{t-1}+\mathbf{G}_t\mathbf{u}_t, \mathbf{Q}_t)$ ;
- Measurement model: $p(\mathbf{z}_t\mid\mathbf{x}_t)=\mathcal{N}(\mathbf{z}_t;\mathbf{H}_t\mathbf{x}_t,\mathbf{R}_t)$ .
Note: here we need to interpret $\mathbf{x}_{t-1}$ as a realization rather than a random variable.
- Multivariate Gaussian PDF
  $\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{\Sigma})$ is the probability distribution function of a random variable $\mathbf{x}\in\mathbb{R}^m$ that follows a Gaussian distribution with mean $\boldsymbol{\mu}\in\mathbb{R}^m$ and covariance matrix $\mathbf{\Sigma}\in\mathbb{R}^{m\times m}$ ,
  $\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{\Sigma})=p(\mathbf{x})=\frac{1}{\sqrt{(2\pi)^m\det(\mathbf{\Sigma})}}\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^{\top}\mathbf{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right)$

Before deriving the algorithm for the Kalman filter, there are two important properties of the multivariate Gaussian distribution that are useful here.

Affine transformation:
Let $\mathbf{x}$ be a random vector following a Gaussian distribution $\mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})$ , then the random vector $\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{b}$ follows a Gaussian distribution $\mathbf{y}\sim\mathcal{N}(\mathbf{A}\boldsymbol{\mu}+\mathbf{b},\mathbf{A}\mathbf{\Sigma}\mathbf{A}^{\top})$ .
- Proof
  Will come later

Conditional distribution:
Let $\mathbf{y}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})$ be a random vector of dimension $m+p$ . Partition $\mathbf{y}$ into a vector of dimension $m$ , $\mathbf{x}\in\mathbb{R}^{m}$ and a vector of dimension $p$ , $\mathbf{z}\in\mathbb{R}^p$ , and partition $\boldsymbol{\mu}$ and $\mathbf{\Sigma}$ respectively,
$\begin{equation*} \left[\begin{matrix} \mathbf{x} \\ \mathbf{z} \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \boldsymbol{\mu}_x\\ \boldsymbol{\mu}_z \end{matrix}\right], \left[ \begin{matrix} \mathbf{\Sigma}_{xx} &\mathbf{\Sigma}_{xz}\\ \mathbf{\Sigma}_{zx} &\mathbf{\Sigma}_{zz} \end{matrix} \right] \right) \end{equation*}$
then, the conditional distribution $p(\mathbf{x}\mid\mathbf{z}=z)=\mathcal{N}(\mathbf{x};\boldsymbol{\mu}_{\mathbf{x}\mid\mathbf{z}},\mathbf{\Sigma}_{\mathbf{x}\mid\mathbf{z}})$ , where
$\begin{align*} \boldsymbol{\mu}_{\mathbf{x}\mid\mathbf{z}}=\boldsymbol{\mu}_x+\mathbf{\Sigma}_{xz}\boldsymbol{\Sigma}_{zz}^{-1}\left(z-\mathbf{\mu}_z\right), &&\mathbf{\Sigma}_{\mathbf{x}\mid\mathbf{z}}=\mathbf{\Sigma}_{xx}-\mathbf{\Sigma}_{xz}\mathbf{\Sigma}_{zz}^{-1}\mathbf{\Sigma}_{zx} \end{align*}$
- Proof
  Will come later

Now, if we assume the initial belief to be Gaussian, i.e., $\mathrm{bel}(\mathbf{x}_0)=p(\mathbf{x}_0)=\mathcal{N}(\mathbf{x}_0;\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ . Then, from equations (1) and (2), we can see that all subsequent beliefs will be Gaussian. Thus,

\begin{align*} \overline{\mathrm{bel}}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})=\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-)\\ \mathrm{bel}(\mathbf{x}_t)&= p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1,t})=\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_t,\mathbf{\Sigma}_t) \end{align*}

Then, we only need to track $\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-$ in the prediction step and $\boldsymbol{\mu}_t,\mathbf{\Sigma}_t$ in the correction step.

For the prediction step, use the joint distribution $p(\mathbf{x}_t,\mathbf{x}_{t-1}\mid\mathbf{u}_{1:t},\mathbf{z}_{1:t-1})$ and realize

\begin{equation*} \left[\begin{matrix} \mathbf{x}_t \\ \mathbf{x}_{t-1} \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \mathbf{F}_t\boldsymbol{\mu}_{t-1}+\mathbf{G}_t\mathbf{u}_t\\ \boldsymbol{\mu}_{t-1} \end{matrix}\right], \left[ \begin{matrix} \mathbf{F}_{t}\mathbf{\Sigma}_{t-1}\mathbf{F}^{\top}+\mathbf{Q}_t &\mathbf{F}_t\mathbf{\Sigma}_{t-1}\\ \mathbf{\Sigma}_{t-1}\mathbf{F}_t^{\top} &\mathbf{\Sigma}_{t-1} \end{matrix} \right] \right) \end{equation*}

Derivation [2]
From equation (3),
$\underbrace{\begin{bmatrix}\mathbf{x}_t \\\mathbf{x}_{t-1}\end{bmatrix}}_{\mathbf{y}_t}=\underbrace{\begin{bmatrix}\mathbf{F}_t \\\mathbf{I}\end{bmatrix}}_{\mathbf{A}_t}\mathbf{x}_{t-1}+\underbrace{\begin{bmatrix}\mathbf{G}_t \\\mathbf{0}\end{bmatrix}}_{\mathbf{B}_t}\mathbf{u}_t+\underbrace{\begin{bmatrix}\mathbf{I} \\\mathbf{0}\end{bmatrix}}_{\mathbf{W}_t}\mathbf{w}_t$
Compute the mean and covariance,
$\mathbb{E}[\mathbf{y}_t] = \mathbb{E}\!\left[\mathbf{A}_t \mathbf{x}_{t-1} + \mathbf{B}_t \mathbf{u}_t + \mathbf{W}_t \mathbf{w}_t\right] = \mathbf{A}_t \boldsymbol{\mu}_{t-1} + \mathbf{B}_t \mathbf{u}_t = \begin{bmatrix} \mathbf{F}_t \boldsymbol{\mu}_{t-1} + \mathbf{G}_t \mathbf{u}_t \\ \boldsymbol{\mu}_{t-1} \end{bmatrix}$
$\begin{align*} \mathrm{Cov}[\mathbf{y}_t] &= \mathbb{E}\!\left[(\mathbf{y}_t - \mathbb{E}[\mathbf{y}_t])(\mathbf{y}_t - \mathbb{E}[\mathbf{y}_t])^\top\right] \\ &= \mathbb{E}\!\left[\left(\mathbf{A}_t(\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1}) + \mathbf{W}_t \mathbf{w}_t\right) \left(\mathbf{A}_t(\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1}) + \mathbf{W}_t \mathbf{w}_t\right)^\top\right] \\ &= \mathbf{A}_t \mathbf{\Sigma}_{t-1} \mathbf{A}_t^\top + \mathbf{W}_t \mathbf{Q}_t \mathbf{W}_t^\top \\ &= \begin{bmatrix} \mathbf{F}_t \mathbf{\Sigma}_{t-1} \mathbf{F}_t^\top + \mathbf{Q}_t & \mathbf{F}_t \mathbf{\Sigma}_{t-1} \\ \mathbf{\Sigma}_{t-1} \mathbf{F}_t^\top & \mathbf{\Sigma}_{t-1} \end{bmatrix} \end{align*}$

Then $\overline{\mathrm{bel}}(\mathbf{x}_t) =p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})$ can be obtained by marginalize $\mathbf{x}_{t-1}$ ,

\begin{align} \boldsymbol{\mu}_t^-&=\mathbf{F}_t\boldsymbol{\mu}_{t-1}+\mathbf{G_t}\mathbf{u}_t && \text{(predicted mean)} \\ \mathbf{\Sigma}_{t}^-&=\mathbf{F}_{t}\mathbf{\Sigma}_{t-1}\mathbf{F}^{\top}+\mathbf{Q}_t && \text{(predicted covariance)} \end{align}

For the correction step, form a joint distribution $p(\mathbf{x}_t,\mathbf{z}_t\mid \mathbf{u}_{1:t},\mathbf{z}_{1:t-1})$ and realize

\begin{equation*} \left[\begin{matrix} \mathbf{x}_t \\ \mathbf{z}_t \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \boldsymbol{\mu}_t^-\\ \mathbf{H}_t\boldsymbol{\mu}_t^- \end{matrix}\right], \left[ \begin{matrix} \mathbf{\Sigma}_t^- &\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}\\ \mathbf{H}_t\mathbf{\Sigma}_{t}^- &\mathbf{H}_t\mathbf{\Sigma}_{t}^-\mathbf{H}_t^{\top}+\mathbf{R}_t \end{matrix} \right] \right) \end{equation*}

Derivation [2]
Since $\mathbf{x}_t\sim\mathcal{N}(\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-)$ and $\mathbf{z}_t=\mathbf{H_t}\mathbf{x}_t+\mathbf{v}_t$ , where $\mathbf{v}_t\sim\mathcal{N}(\mathbf{0},\mathbf{R}_t)$ . From property 1, we have
$\mathbf{z}_t\sim\mathcal{N}(\mathbf{H}_t\boldsymbol{\mu}_t^-,\mathbf{H}_t\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}+\mathbf{R}_t)$
Furthermore,
$\begin{align*} \mathbf{\Sigma}_{xz}&=\mathrm{Cov}(\mathbf{x}_t,\mathbf{z}_t)=\mathrm{Cov}(\mathbf{x}_t,\mathbf{H_t}\mathbf{x}_t+\mathbf{v}_t)=\mathrm{Cov}(\mathbf{x}_t,\mathbf{H}_t\mathbf{x}_t)\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{H}_t\mathbf{x}_t-\mathbf{H_t}\boldsymbol{\mu}_t^-\right)^{\top}\right]\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)^{\top}\mathbf{H_t}^{\top}\right]\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)^{\top}\right]\mathbf{H_t}^{\top}\\ &=\mathbf{\Sigma}_t^-\mathbf{H_t}^{\top} \\ \mathbf{\Sigma}_{zx}&=\mathrm{Cov}(\mathbf{z}_t,\mathbf{x}_t)=\dots=\mathbf{H}_t\mathbf{\Sigma}_t^- \end{align*}$

Thus, $\mathrm{bel}(\mathbf{x}_t)=p(\mathbf{x}_t\mid\mathbf{z}_t,\mathbf{u}_{1:t},\mathbf{z}_{1:t-1})$ , where

\begin{align*} \boldsymbol{\nu}_t&=\mathbf{z}_t-\mathbf{H}_t\boldsymbol{\mu}_t^- && \text{(innovation)} \\ \mathbf{S}_t&=\mathbf{H}_t\mathbf{\Sigma}_{t}^-\mathbf{H}_t^{\top}+\mathbf{R}_t && \text{(innovation covariance)}\\ \mathbf{K}_t&=\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}\mathbf{S}_t^{-1} && \text{(Kalman gain)} \end{align*}

\begin{align} \boldsymbol{\mu}_t&=\boldsymbol{\mu}_t^-+\mathbf{K}_t\boldsymbol{\nu}_k && \text{(corrected mean)}\\ \mathbf{\Sigma}_t&=\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)\mathbf{\Sigma}_t^- && \text{(corrected covariance)} \end{align}

Finally, normalize for numerical stability

\begin{equation} \mathbf{\Sigma}_t=\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)\mathbf{\Sigma}_t^-\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)^{\top}+\mathbf{K}_t\mathbf{R}_t\mathbf{K}_t^{\top} \end{equation}

Extended Kalman Filter (EKF)

One way to handle the case where motion model and measurement model are not linear is to approximate the functions via local linearization (Taylor expansion). Specifically, we extend the Kalman filter algorithm to cases where models are non-linear:

Motion model

\begin{align*} \mathbf{x}_t=f(\mathbf{u}_t,\mathbf{x}_{t-1})+\mathbf{w}_t, &&\mathbf{w}_t\sim\mathcal{N}(\mathbf{0},\mathbf{Q}_t) \end{align*}

Measurement model

\begin{align*} \mathbf{z}_t=h(\mathbf{x}_t)+\mathbf{v}_t, &&\mathbf{v}_t \sim\mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{align*}

Let

\mathbf{F}_t=\left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\mathbf{x}=\boldsymbol{\mu}_{t-1}}, \qquad \mathbf{W}_t=\left.\frac{\partial f}{\partial \mathbf{w}}\right|_{\mathbf{x}=\boldsymbol{\mu}_{t-1}}, \qquad \mathbf{H}_t=\left.\frac{\partial h}{\partial \mathbf{x}}\right|_{\mathbf{x}=\boldsymbol{\mu}_t^-}, \qquad \mathbf{V}_t=\left.\frac{\partial h}{\partial \mathbf{v}}\right|_{\mathbf{x}=\boldsymbol{\mu}_t^-}

The EKF algorithm is

\begin{align} &\textbf{Input: } \boldsymbol{\mu}_{t-1},\; \boldsymbol{\Sigma}_{t-1},\; \mathbf{u}_t,\; \mathbf{z}_t \nonumber\\[6pt] &\quad\quad \text{1:}\quad \boldsymbol{\mu}_t^- \leftarrow f(\mathbf{u}_t,\boldsymbol{\mu}_{t-1}) &&\text{(predicted mean)} \nonumber\\ &\quad\quad\text{2:}\quad \boldsymbol{\Sigma}_t^- \leftarrow \mathbf{F}_t\boldsymbol{\Sigma}_{t-1}\mathbf{F}_t^\top + \mathbf{W}_t\mathbf{Q}_t\mathbf{W}_t^\top &&\text{(predicted covariance)} \nonumber\\ &\quad\quad\text{3:}\quad \boldsymbol{\nu}_t \leftarrow \mathbf{z}_t - h(\boldsymbol{\mu}_t^-) &&\text{(innovation)} \nonumber\\ &\quad\quad\text{4:}\quad \mathbf{S}_t \leftarrow \mathbf{H}_t\boldsymbol{\Sigma}_t^-\mathbf{H}_t^\top + \mathbf{V}_t\mathbf{R}_t\mathbf{V}_t^\top &&\text{(innovation covariance)} \nonumber\\ &\quad\quad\text{5:}\quad \mathbf{K}_t \leftarrow \boldsymbol{\Sigma}_t^-\mathbf{H}_t^\top\mathbf{S}_t^{-1} &&\text{(filter gain)} \nonumber\\ &\quad\quad\text{6:}\quad \boldsymbol{\mu}_t \leftarrow \boldsymbol{\mu}_t^- + \mathbf{K}_t\boldsymbol{\nu}_t &&\text{(corrected mean)} \nonumber\\ &\quad\quad\text{7:}\quad \boldsymbol{\Sigma}_t \leftarrow (\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)\boldsymbol{\Sigma}_t^- &&\text{(corrected covariance)} \nonumber\\ &\quad\quad\text{8:}\quad \boldsymbol{\Sigma}_t \leftarrow (\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)\boldsymbol{\Sigma}_t^-(\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)^\top + \mathbf{K}_t\mathbf{R}_t\mathbf{K}_t^\top &&\text{(numerically stable form)} \nonumber\\[6pt] &\textbf{return}\;\; \boldsymbol{\mu}_t,\; \boldsymbol{\Sigma}_t \nonumber \end{align}

Unscented Kalman Filter (UKF)

Another non-linear Kalman filter is the unscented Kalman filter which uses deterministic sampling to approximate the distribution.

Let a function $f:\mathbb{R}^{n}\to\mathbb{R}^m$ be a non-linear transformation. Suppose $\mathbf{x}\in\mathbb{R}^n$ and $\mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})$ . We draw a set of $n$ sigmapoints, $\mathcal{X}=\{\mathbf{x}_i\}_{i=1}^{n}$ , and compute a weight $w_i$ for each sigma point by executing

\begin{align*} &\texttt{SigmaPoints}(\boldsymbol{\mu},\mathbf{\Sigma}):\\ &\quad\quad \mathbf{x}_0 = \boldsymbol{\mu} \nonumber\\ &\quad\quad \mathbf{x}_i = \boldsymbol{\mu} + \boldsymbol{\ell}'_i, \qquad i = 1,\ldots,n \nonumber\\ &\quad\quad \mathbf{x}_i = \boldsymbol{\mu} - \boldsymbol{\ell}'_{i-n}, \qquad i = n+1,\ldots,2n \nonumber\\[4pt] &\quad\quad w_0 = \dfrac{\kappa}{n+\kappa} \nonumber\\[6pt] &\quad\quad w_i = \dfrac{1}{2(n+\kappa)}, \qquad i = 1,\ldots,2n\\ &\textbf{return}\,\, (\mathcal{X},\mathbf{w}) \end{align*}

Further let $\mathbf{y}=f(\mathbf{x})$ , then $\mathbf{y}\sim\mathcal{N}(\boldsymbol{\mu}',\mathbf{\Sigma}')$ , where

\begin{align*} \boldsymbol{\mu}'&=\sum_{i=0}^{2n}w_if(\mathbf{x}_i)\\ \mathbf{\Sigma}'&=\sum_{i=0}^{2n}w_i(f(\mathbf{x}_i)-\boldsymbol{\mu}')(f(\mathbf{x}_i)-\boldsymbol{\mu}')^{\top} \end{align*}

The UKF algorithm is

\begin{align} &\textbf{Input: } \boldsymbol{\mu}_{t-1},\; \boldsymbol{\Sigma}_{t-1},\; \mathbf{u}_t,\; \mathbf{z}_t \nonumber\\[6pt] &\quad\quad\text{1:}\quad (\mathcal{X}_{t-1},\,\mathbf{w}^-) \leftarrow \texttt{SigmaPoints}(\boldsymbol{\mu}_{t-1},\boldsymbol{\Sigma}_{t-1}) \nonumber\\ &\quad\quad\text{2:}\quad \boldsymbol{\mu}_t^- = \sum_{i=0}^{2n} w_i^-\, f(\mathbf{u}_t, \mathbf{x}_{t-1,i}) &&\text{(predicted mean)} \nonumber\\ &\quad\quad\text{3:}\quad \boldsymbol{\Sigma}_t^- \leftarrow \sum_{i=0}^{2n} w_i^- \bigl(f(\mathbf{u}_t,\mathbf{x}_{t-1,i}) - \boldsymbol{\mu}_t^-\bigr) \bigl(f(\mathbf{u}_t,\mathbf{x}_{t-1,i}) - \boldsymbol{\mu}_t^-\bigr)^\top + \mathbf{Q}_t &&\text{(predicted covariance)} \nonumber\\ &\quad\quad\text{4:}\quad (\mathcal{X}_t^-,\,\mathbf{w}) \leftarrow \texttt{SigmaPoints}(\boldsymbol{\mu}_t^-, \boldsymbol{\Sigma}_t^-) \nonumber\\ &\quad\quad\text{5:}\quad \mathbf{z}_t^- = \sum_{i=0}^{2n} w_i\, h(\mathbf{x}_{t,i}^-) &&\text{(predicted measurement)} \nonumber\\ &\quad\quad\text{6:}\quad \boldsymbol{\nu}_t \leftarrow \mathbf{z}_t - \mathbf{z}_t^- &&\text{(innovation)} \nonumber\\ &\quad\quad\text{7:}\quad \mathbf{S}_t \leftarrow \sum_{i=0}^{2n} w_i \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr) \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr)^\top + \mathbf{R}_t &&\text{(innovation covariance)} \nonumber\\ &\quad\quad\text{8:}\quad \boldsymbol{\Sigma}_t^{xz} \leftarrow \sum_{i=0}^{2n} w_i \bigl(\mathbf{x}_{t,i}^- - \boldsymbol{\mu}_t^-\bigr) \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr)^\top &&\text{(cross covariance)} \nonumber\\ &\quad\quad\text{9:}\quad \mathbf{K}_t \leftarrow \boldsymbol{\Sigma}_t^{xz}\,\mathbf{S}_t^{-1} &&\text{(filter gain)} \nonumber\\ &\quad\quad\text{10:}\quad \boldsymbol{\mu}_t \leftarrow \boldsymbol{\mu}_t^- + \mathbf{K}_t\boldsymbol{\nu}_t &&\text{(corrected mean)} \nonumber\\ &\quad\quad\text{11:}\quad \boldsymbol{\Sigma}_t \leftarrow \boldsymbol{\Sigma}_t^- - \mathbf{K}_t\mathbf{S}_t\mathbf{K}_t^\top &&\text{(corrected covariance)} \nonumber\\[6pt] &\textbf{return}\;\; \boldsymbol{\mu}_t,\; \boldsymbol{\Sigma}_t \nonumber \end{align}

Reference

[1] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Mit Press, 2010.
[2] M. Ghaffari, ROB530 Lecture Slides — Kalman Filtering. University of Michigan, 2026.

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

You May Also Enjoy

Optimization Part 1 Basics and Least Squares

March 31, 2026 Mathematics

In common engineering problems, the optimization problem is to minimize the residual of a parametric model with respect to some observed data points.Given N...

Occupancy Grid Mapping

March 22, 2026 Robotics

Besides state estimation or localization, which provide a robot with knowledge of where it is, it’s equally important for a mobile robot to perceive its surr...

Lie Theory in Robot Motion

March 10, 2026 Robotics

This note covers some of the fundamental concepts in Lie group and Lie algebra, and their applications to representing rigid body motion in robotics. A group...

Acoustic Damping with a Helmholtz Resonator

August 6, 2024 Physics

Two fundamental equations in acoustics areContinuity EquationThe continuity equation states that the rate at which mass enters a system is equal to the rate ...

Guanyu Xu