In robot state estimation, the Bayes filter is a probabilistic approach that estimates the state from a sequence of controls and measurements by recursively performing prediction with a motion model and correction with a measurement model . In practice, some common instantiations of the Bayes framework are the Kalman filter , information filter, and particle filter .
Mathematical Formulation of Bayes Filter Given a sequence of measurements z 1 : t = z 1 , … , z t \mathrm{z}_{1:t}=\mathbf{z}_1,\dots,\mathbf{z}_t z 1 : t = z 1 , … , z t and controls u 1 : t = u 1 , … , u t \mathbf{u}_{1:t}=\mathbf{u}_1,\dots,\mathbf{u}_t u 1 : t = u 1 , … , u t , and a measurement model p ( z t ∣ x t ) p(\mathbf{z}_t\mid\mathbf{x}_t) p ( z t ∣ x t ) and a motion model p ( x t ∣ x t − 1 , u t ) p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t) p ( x t ∣ x t − 1 , u t ) . The Bayes filter provides a recursive framework for estimating the state x t \mathbf{x}_t x t in terms of a probability distribution function, known as the belief b e l ( x t ) ≜ p ( x t ∣ z 1 : t , u 1 , t ) \mathrm{bel}(\mathbf{x}_t)\triangleq p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1,t}) bel ( x t ) ≜ p ( x t ∣ z 1 : t , u 1 , t ) , or posterior .
The dynamic Bayes network that characterizes the evolution of controls, states, and measurements. [1] The filtering algorithm is built upon two assumptions of conditional independence:
Motion model is (1st-order) Markov :p ( x t ∣ x 0 : t − 1 , u 1 : t ) = p ( x t ∣ x t − 1 , u t ) p(\mathbf{x}_t\mid\mathbf{x}_{0:t-1},\mathbf{u}_{1:t})=p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t) p ( x t ∣ x 0 : t − 1 , u 1 : t ) = p ( x t ∣ x t − 1 , u t ) The measurement model is memoryless :p ( z t ∣ x 0 : t , z 1 : t − 1 , u 1 : t ) = p ( z t ∣ x t ) p(\mathbf{z}_t\mid\mathbf{x}_{0:t},\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})=p(\mathbf{z}_{t}\mid\mathbf{x}_t) p ( z t ∣ x 0 : t , z 1 : t − 1 , u 1 : t ) = p ( z t ∣ x t ) Given these two assumptions, we can propagate the belief over time:
b e l ( x t ) = p ( x t ∣ z 1 : t , u 1 : t ) = p ( x t ∣ z t , z 1 : t − 1 , u 1 : t ) → definition = η p ( z t ∣ x t , z 1 : t − 1 , u 1 : t ) p ( x t ∣ z 1 : t − 1 , u 1 : t ) → Bayes rule = η p ( z t ∣ x t ) p ( x t ∣ z 1 : t − 1 , u 1 : t ) → memoryless \begin{align*} \mathrm{bel}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1:t})=p(\mathbf{x}_t\mid\mathbf{z}_t,\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to\text{definition} \\ &=\eta p(\mathbf{z}_t\mid\mathbf{x}_t,\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to\text{Bayes rule}\\ &=\eta p(\mathbf{z}_t\mid\mathbf{x}_t)p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to \text{memoryless} \end{align*} bel ( x t ) = p ( x t ∣ z 1 : t , u 1 : t ) = p ( x t ∣ z t , z 1 : t − 1 , u 1 : t ) = η p ( z t ∣ x t , z 1 : t − 1 , u 1 : t ) p ( x t ∣ z 1 : t − 1 , u 1 : t ) = η p ( z t ∣ x t ) p ( x t ∣ z 1 : t − 1 , u 1 : t ) → definition → Bayes rule → memoryless We can define the predicted belief of the state at time t t t to be the probability distribution before knowing the current measurement, i.e., b e l ‾ ( x t ) ≜ p ( x t ∣ z 1 : t − 1 , u 1 : t ) \overline{\mathrm{bel}}(\mathbf{x}_t)\triangleq p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) bel ( x t ) ≜ p ( x t ∣ z 1 : t − 1 , u 1 : t ) . Then, the above equation becomes
b e l ( x t ) = η p ( z t ∣ x t ) b e l ‾ ( x t ) \begin{equation} \mathrm{bel}(\mathrm{x}_t)=\eta p(\mathbf{z}_t\mid\mathbf{x}_t)\overline{\mathrm{bel}}(\mathbf{x}_t) \end{equation} bel ( x t ) = η p ( z t ∣ x t ) bel ( x t ) where η \eta η is the normalization coefficient, and
1 η = ∫ p ( z t ∣ x t ) b e l ‾ ( x t ) d x t \frac{1}{\eta}=\int p(\mathbf{z}_t\mid\mathbf{x}_t)\overline{\mathrm{bel}}(\mathbf{x}_t)\mathrm{d}\mathbf{x}_t η 1 = ∫ p ( z t ∣ x t ) bel ( x t ) d x t which is essentially a correction applied to the belief after observing the current measurement.
What remains to be done, obviously, is to predict the belief b e l ‾ ( x t ) \overline{\mathrm{bel}}(\mathbf{x}_t) bel ( x t ) from the previous state:
b e l ‾ ( x t ) = p ( x t ∣ z 1 : t − 1 , u 1 : t ) = ∫ p ( x t ∣ x t − 1 , z 1 : t − 1 , u 1 : t ) p ( x t − 1 ∣ z 1 : t − 1 , u 1 : t ) d x t − 1 → marginalize x t − 1 = ∫ p ( x t ∣ x t − 1 , u t ) p ( x t − 1 ∣ z 1 : t − 1 , u 1 : t − 1 ) d x t − 1 → Markov assumption \begin{align*} \overline{\mathrm{bel}}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) \\ &=\int p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})\mathrm{d}\mathbf{x}_{t-1} && \to \text{marginalize}\,\mathbf{x}_{t-1}\\ &=\int p(\mathbf{x}_t\mid\mathbf{x}_{t-1}, \mathbf{u}_t)p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t-1})\mathrm{d}\mathbf{x}_{t-1} &&\to\text{Markov assumption} \end{align*} bel ( x t ) = p ( x t ∣ z 1 : t − 1 , u 1 : t ) = ∫ p ( x t ∣ x t − 1 , z 1 : t − 1 , u 1 : t ) p ( x t − 1 ∣ z 1 : t − 1 , u 1 : t ) d x t − 1 = ∫ p ( x t ∣ x t − 1 , u t ) p ( x t − 1 ∣ z 1 : t − 1 , u 1 : t − 1 ) d x t − 1 → marginalize x t − 1 → Markov assumption Then, recognizing p ( x t − 1 ∣ z 1 : t − 1 , u 1 : t − 1 ) = b e l ( x t − 1 ) p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t-1})=\mathrm{bel}(\mathbf{x}_{t-1}) p ( x t − 1 ∣ z 1 : t − 1 , u 1 : t − 1 ) = bel ( x t − 1 ) , we have
b e l ‾ ( x t ) = ∫ p ( x t ∣ u t , x t − 1 ) b e l ( x t − 1 ) d x t − 1 \begin{equation} \overline{\mathrm{bel}}(\mathbf{x}_t)=\int p(\mathbf{x}_t\mid\mathbf{u}_t,\mathbf{x}_{t-1})\mathrm{bel}(\mathbf{x}_{t-1})\mathrm{d}\mathbf{x}_{t-1} \end{equation} bel ( x t ) = ∫ p ( x t ∣ u t , x t − 1 ) bel ( x t − 1 ) d x t − 1 In summary, the Bayes filter provides a two-step framework for updating the belief of the current state x t \mathbf{x}_t x t from the belief of the previous state x t − 1 \mathbf{x}_{t-1} x t − 1 :
Prediction using the motion model (Equation 1);Correction using the measurement model (Equation 2).These two steps can be performed recursively to estimate the state over time.
Kalman Filter The Kalman filter is an instantiation of the Bayes filter with the assumption of linear Gaussian models . This means the motion model and the measurement model should be
linear with respect to their arguments :Specifically, suppose x ∈ R m \mathbf{x}\in\mathbb{R}^m x ∈ R m , u ∈ R n \mathbf{u}\in\mathbb{R}^n u ∈ R n , and z ∈ R p \mathbf{z}\in\mathbb{R}^p z ∈ R p . Let F ∈ R m × m \mathbf{F}\in \mathbb{R}^{m\times m} F ∈ R m × m , G ∈ R m × n \mathbf{G}\in\mathbb{R}^{m\times n} G ∈ R m × n , and H ∈ R p × m \mathbf{H}\in\mathbb{R}^{p\times m} H ∈ R p × m be matrices (linear maps).
x t = F t x t − 1 + G t u t + w t , w t ∼ N ( 0 , Q t ) \begin{align} \mathbf{x}_t=\mathbf{F}_t\mathbf{x}_{t-1}+\mathbf{G}_t\mathbf{u}_t+\mathbf{w}_t, &&\mathbf{w}_t\sim\mathcal{N}(\mathbf{0},\mathbf{Q}_t) \end{align} x t = F t x t − 1 + G t u t + w t , w t ∼ N ( 0 , Q t ) z t = H t x t + v t , v t ∼ N ( 0 , R t ) \begin{align} \mathbf{z}_t=\mathbf{H}_t\mathbf{x}_t+\mathbf{v}_t, &&\mathbf{v}_t \sim\mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{align} z t = H t x t + v t , v t ∼ N ( 0 , R t ) where w ∈ R m \mathbf{w}\in\mathbb{R}^m w ∈ R m and v t ∈ R p \mathbf{v}_t\in\mathbb{R}^p v t ∈ R p are white noise with variance Q ∈ R m × m \mathbf{Q}\in\mathbb{R}^{m\times m} Q ∈ R m × m and R ∈ R p × p \mathbf{R}\in\mathbb{R}^{p\times p} R ∈ R p × p , respectively.
White noise A discrete-time random process (a sequence of random vectors), w k \mathbf{w}_k w k , is called white noise if
E [ w k w j ⊤ ] = Q k δ k j , δ k j = { 1 k = j 0 k ≠ j \begin{align*} \mathbb{E}[\mathbf{w}_k\mathbf{w}_j^{\top}]=\mathbf{Q}_k\delta_{kj}, &&\delta_{kj}=\left\{\begin{matrix}1 & k=j\\ 0 & k\ne j\end{matrix}\right. \end{align*} E [ w k w j ⊤ ] = Q k δ kj , δ kj = { 1 0 k = j k = j following a multivariate Gaussian distribution: Motion model: p ( x t ∣ x t − 1 , u t ) = N ( x t ; F t x t − 1 + G t u t , Q t ) p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)=\mathcal{N}(\mathbf{x}_t;\mathbf{F}_t\mathbf{x}_{t-1}+\mathbf{G}_t\mathbf{u}_t, \mathbf{Q}_t) p ( x t ∣ x t − 1 , u t ) = N ( x t ; F t x t − 1 + G t u t , Q t ) ; Measurement model: p ( z t ∣ x t ) = N ( z t ; H t x t , R t ) p(\mathbf{z}_t\mid\mathbf{x}_t)=\mathcal{N}(\mathbf{z}_t;\mathbf{H}_t\mathbf{x}_t,\mathbf{R}_t) p ( z t ∣ x t ) = N ( z t ; H t x t , R t ) . Note: here we need to interpret x t − 1 \mathbf{x}_{t-1} x t − 1 as a realization rather than a random variable.
Multivariate Gaussian PDF N ( x ; μ , Σ ) \mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{\Sigma}) N ( x ; μ , Σ ) is the probability distribution function of a random variable x ∈ R m \mathbf{x}\in\mathbb{R}^m x ∈ R m that follows a Gaussian distribution with mean μ ∈ R m \boldsymbol{\mu}\in\mathbb{R}^m μ ∈ R m and covariance matrix Σ ∈ R m × m \mathbf{\Sigma}\in\mathbb{R}^{m\times m} Σ ∈ R m × m ,
N ( x ; μ , Σ ) = p ( x ) = 1 ( 2 π ) m det ( Σ ) exp ( − 1 2 ( x − μ ) ⊤ Σ − 1 ( x − μ ) ) \mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{\Sigma})=p(\mathbf{x})=\frac{1}{\sqrt{(2\pi)^m\det(\mathbf{\Sigma})}}\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^{\top}\mathbf{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right) N ( x ; μ , Σ ) = p ( x ) = ( 2 π ) m det ( Σ ) 1 exp ( − 2 1 ( x − μ ) ⊤ Σ − 1 ( x − μ ) ) Before deriving the algorithm for the Kalman filter, there are two important properties of the multivariate Gaussian distribution that are useful here.
Affine transformation: Let x \mathbf{x} x be a random vector following a Gaussian distribution x ∼ N ( μ , Σ ) \mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma}) x ∼ N ( μ , Σ ) , then the random vector y = A x + b \mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{b} y = Ax + b follows a Gaussian distribution y ∼ N ( A μ + b , A Σ A ⊤ ) \mathbf{y}\sim\mathcal{N}(\mathbf{A}\boldsymbol{\mu}+\mathbf{b},\mathbf{A}\mathbf{\Sigma}\mathbf{A}^{\top}) y ∼ N ( A μ + b , AΣ A ⊤ ) .
Conditional distribution: Let y ∼ N ( μ , Σ ) \mathbf{y}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma}) y ∼ N ( μ , Σ ) be a random vector of dimension m + p m+p m + p . Partition y \mathbf{y} y into a vector of dimension m m m , x ∈ R m \mathbf{x}\in\mathbb{R}^{m} x ∈ R m and a vector of dimension p p p , z ∈ R p \mathbf{z}\in\mathbb{R}^p z ∈ R p , and partition μ \boldsymbol{\mu} μ and Σ \mathbf{\Sigma} Σ respectively,
[ x z ] ∼ N ( [ μ x μ z ] , [ Σ x x Σ x z Σ z x Σ z z ] ) \begin{equation*} \left[\begin{matrix} \mathbf{x} \\ \mathbf{z} \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \boldsymbol{\mu}_x\\ \boldsymbol{\mu}_z \end{matrix}\right], \left[ \begin{matrix} \mathbf{\Sigma}_{xx} &\mathbf{\Sigma}_{xz}\\ \mathbf{\Sigma}_{zx} &\mathbf{\Sigma}_{zz} \end{matrix} \right] \right) \end{equation*} [ x z ] ∼ N ( [ μ x μ z ] , [ Σ xx Σ z x Σ x z Σ zz ] ) then, the conditional distribution p ( x ∣ z = z ) = N ( x ; μ x ∣ z , Σ x ∣ z ) p(\mathbf{x}\mid\mathbf{z}=z)=\mathcal{N}(\mathbf{x};\boldsymbol{\mu}_{\mathbf{x}\mid\mathbf{z}},\mathbf{\Sigma}_{\mathbf{x}\mid\mathbf{z}}) p ( x ∣ z = z ) = N ( x ; μ x ∣ z , Σ x ∣ z ) , where
μ x ∣ z = μ x + Σ x z Σ z z − 1 ( z − μ z ) , Σ x ∣ z = Σ x x − Σ x z Σ z z − 1 Σ z x \begin{align*} \boldsymbol{\mu}_{\mathbf{x}\mid\mathbf{z}}=\boldsymbol{\mu}_x+\mathbf{\Sigma}_{xz}\boldsymbol{\Sigma}_{zz}^{-1}\left(z-\mathbf{\mu}_z\right), &&\mathbf{\Sigma}_{\mathbf{x}\mid\mathbf{z}}=\mathbf{\Sigma}_{xx}-\mathbf{\Sigma}_{xz}\mathbf{\Sigma}_{zz}^{-1}\mathbf{\Sigma}_{zx} \end{align*} μ x ∣ z = μ x + Σ x z Σ zz − 1 ( z − μ z ) , Σ x ∣ z = Σ xx − Σ x z Σ zz − 1 Σ z x Now, if we assume the initial belief to be Gaussian, i.e., b e l ( x 0 ) = p ( x 0 ) = N ( x 0 ; μ 0 , Σ 0 ) \mathrm{bel}(\mathbf{x}_0)=p(\mathbf{x}_0)=\mathcal{N}(\mathbf{x}_0;\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0) bel ( x 0 ) = p ( x 0 ) = N ( x 0 ; μ 0 , Σ 0 ) . Then, from equations (1) and (2), we can see that all subsequent beliefs will be Gaussian. Thus,
b e l ‾ ( x t ) = p ( x t ∣ z 1 : t − 1 , u 1 : t ) = N ( x t ; μ t − , Σ t − ) b e l ( x t ) = p ( x t ∣ z 1 : t , u 1 , t ) = N ( x t ; μ t , Σ t ) \begin{align*} \overline{\mathrm{bel}}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})=\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-)\\ \mathrm{bel}(\mathbf{x}_t)&= p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1,t})=\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_t,\mathbf{\Sigma}_t) \end{align*} bel ( x t ) bel ( x t ) = p ( x t ∣ z 1 : t − 1 , u 1 : t ) = N ( x t ; μ t − , Σ t − ) = p ( x t ∣ z 1 : t , u 1 , t ) = N ( x t ; μ t , Σ t ) Then, we only need to track μ t − , Σ t − \boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^- μ t − , Σ t − in the prediction step and μ t , Σ t \boldsymbol{\mu}_t,\mathbf{\Sigma}_t μ t , Σ t in the correction step.
For the prediction step , use the joint distribution p ( x t , x t − 1 ∣ u 1 : t , z 1 : t − 1 ) p(\mathbf{x}_t,\mathbf{x}_{t-1}\mid\mathbf{u}_{1:t},\mathbf{z}_{1:t-1}) p ( x t , x t − 1 ∣ u 1 : t , z 1 : t − 1 ) and realize
[ x t x t − 1 ] ∼ N ( [ F t μ t − 1 + G t u t μ t − 1 ] , [ F t Σ t − 1 F ⊤ + Q t F t Σ t − 1 Σ t − 1 F t ⊤ Σ t − 1 ] ) \begin{equation*} \left[\begin{matrix} \mathbf{x}_t \\ \mathbf{x}_{t-1} \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \mathbf{F}_t\boldsymbol{\mu}_{t-1}+\mathbf{G}_t\mathbf{u}_t\\ \boldsymbol{\mu}_{t-1} \end{matrix}\right], \left[ \begin{matrix} \mathbf{F}_{t}\mathbf{\Sigma}_{t-1}\mathbf{F}^{\top}+\mathbf{Q}_t &\mathbf{F}_t\mathbf{\Sigma}_{t-1}\\ \mathbf{\Sigma}_{t-1}\mathbf{F}_t^{\top} &\mathbf{\Sigma}_{t-1} \end{matrix} \right] \right) \end{equation*} [ x t x t − 1 ] ∼ N ( [ F t μ t − 1 + G t u t μ t − 1 ] , [ F t Σ t − 1 F ⊤ + Q t Σ t − 1 F t ⊤ F t Σ t − 1 Σ t − 1 ] ) Derivation [2][ x t x t − 1 ] ⏟ y t = [ F t I ] ⏟ A t x t − 1 + [ G t 0 ] ⏟ B t u t + [ I 0 ] ⏟ W t w t \underbrace{\begin{bmatrix}\mathbf{x}_t \\\mathbf{x}_{t-1}\end{bmatrix}}_{\mathbf{y}_t}=\underbrace{\begin{bmatrix}\mathbf{F}_t \\\mathbf{I}\end{bmatrix}}_{\mathbf{A}_t}\mathbf{x}_{t-1}+\underbrace{\begin{bmatrix}\mathbf{G}_t \\\mathbf{0}\end{bmatrix}}_{\mathbf{B}_t}\mathbf{u}_t+\underbrace{\begin{bmatrix}\mathbf{I} \\\mathbf{0}\end{bmatrix}}_{\mathbf{W}_t}\mathbf{w}_t y t [ x t x t − 1 ] = A t [ F t I ] x t − 1 + B t [ G t 0 ] u t + W t [ I 0 ] w t Compute the mean and covariance,
E [ y t ] = E [ A t x t − 1 + B t u t + W t w t ] = A t μ t − 1 + B t u t = [ F t μ t − 1 + G t u t μ t − 1 ] \mathbb{E}[\mathbf{y}_t] = \mathbb{E}\!\left[\mathbf{A}_t \mathbf{x}_{t-1} + \mathbf{B}_t \mathbf{u}_t + \mathbf{W}_t \mathbf{w}_t\right] = \mathbf{A}_t \boldsymbol{\mu}_{t-1} + \mathbf{B}_t \mathbf{u}_t = \begin{bmatrix} \mathbf{F}_t \boldsymbol{\mu}_{t-1} + \mathbf{G}_t \mathbf{u}_t \\ \boldsymbol{\mu}_{t-1} \end{bmatrix} E [ y t ] = E [ A t x t − 1 + B t u t + W t w t ] = A t μ t − 1 + B t u t = [ F t μ t − 1 + G t u t μ t − 1 ] C o v [ y t ] = E [ ( y t − E [ y t ] ) ( y t − E [ y t ] ) ⊤ ] = E [ ( A t ( x t − 1 − μ t − 1 ) + W t w t ) ( A t ( x t − 1 − μ t − 1 ) + W t w t ) ⊤ ] = A t Σ t − 1 A t ⊤ + W t Q t W t ⊤ = [ F t Σ t − 1 F t ⊤ + Q t F t Σ t − 1 Σ t − 1 F t ⊤ Σ t − 1 ] \begin{align*} \mathrm{Cov}[\mathbf{y}_t] &= \mathbb{E}\!\left[(\mathbf{y}_t - \mathbb{E}[\mathbf{y}_t])(\mathbf{y}_t - \mathbb{E}[\mathbf{y}_t])^\top\right] \\ &= \mathbb{E}\!\left[\left(\mathbf{A}_t(\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1}) + \mathbf{W}_t \mathbf{w}_t\right) \left(\mathbf{A}_t(\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1}) + \mathbf{W}_t \mathbf{w}_t\right)^\top\right] \\ &= \mathbf{A}_t \mathbf{\Sigma}_{t-1} \mathbf{A}_t^\top + \mathbf{W}_t \mathbf{Q}_t \mathbf{W}_t^\top \\ &= \begin{bmatrix} \mathbf{F}_t \mathbf{\Sigma}_{t-1} \mathbf{F}_t^\top + \mathbf{Q}_t & \mathbf{F}_t \mathbf{\Sigma}_{t-1} \\ \mathbf{\Sigma}_{t-1} \mathbf{F}_t^\top & \mathbf{\Sigma}_{t-1} \end{bmatrix} \end{align*} Cov [ y t ] = E [ ( y t − E [ y t ]) ( y t − E [ y t ] ) ⊤ ] = E [ ( A t ( x t − 1 − μ t − 1 ) + W t w t ) ( A t ( x t − 1 − μ t − 1 ) + W t w t ) ⊤ ] = A t Σ t − 1 A t ⊤ + W t Q t W t ⊤ = [ F t Σ t − 1 F t ⊤ + Q t Σ t − 1 F t ⊤ F t Σ t − 1 Σ t − 1 ] Then b e l ‾ ( x t ) = p ( x t ∣ z 1 : t − 1 , u 1 : t ) \overline{\mathrm{bel}}(\mathbf{x}_t) =p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) bel ( x t ) = p ( x t ∣ z 1 : t − 1 , u 1 : t ) can be obtained by marginalize x t − 1 \mathbf{x}_{t-1} x t − 1 ,
μ t − = F t μ t − 1 + G t u t (predicted mean) Σ t − = F t Σ t − 1 F ⊤ + Q t (predicted covariance) \begin{align} \boldsymbol{\mu}_t^-&=\mathbf{F}_t\boldsymbol{\mu}_{t-1}+\mathbf{G_t}\mathbf{u}_t && \text{(predicted mean)} \\ \mathbf{\Sigma}_{t}^-&=\mathbf{F}_{t}\mathbf{\Sigma}_{t-1}\mathbf{F}^{\top}+\mathbf{Q}_t && \text{(predicted covariance)} \end{align} μ t − Σ t − = F t μ t − 1 + G t u t = F t Σ t − 1 F ⊤ + Q t (predicted mean) (predicted covariance) For the correction step , form a joint distribution p ( x t , z t ∣ u 1 : t , z 1 : t − 1 ) p(\mathbf{x}_t,\mathbf{z}_t\mid \mathbf{u}_{1:t},\mathbf{z}_{1:t-1}) p ( x t , z t ∣ u 1 : t , z 1 : t − 1 ) and realize
[ x t z t ] ∼ N ( [ μ t − H t μ t − ] , [ Σ t − Σ t − H t ⊤ H t Σ t − H t Σ t − H t ⊤ + R t ] ) \begin{equation*} \left[\begin{matrix} \mathbf{x}_t \\ \mathbf{z}_t \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \boldsymbol{\mu}_t^-\\ \mathbf{H}_t\boldsymbol{\mu}_t^- \end{matrix}\right], \left[ \begin{matrix} \mathbf{\Sigma}_t^- &\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}\\ \mathbf{H}_t\mathbf{\Sigma}_{t}^- &\mathbf{H}_t\mathbf{\Sigma}_{t}^-\mathbf{H}_t^{\top}+\mathbf{R}_t \end{matrix} \right] \right) \end{equation*} [ x t z t ] ∼ N ( [ μ t − H t μ t − ] , [ Σ t − H t Σ t − Σ t − H t ⊤ H t Σ t − H t ⊤ + R t ] ) Derivation [2]Since x t ∼ N ( μ t − , Σ t − ) \mathbf{x}_t\sim\mathcal{N}(\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-) x t ∼ N ( μ t − , Σ t − ) and z t = H t x t + v t \mathbf{z}_t=\mathbf{H_t}\mathbf{x}_t+\mathbf{v}_t z t = H t x t + v t , where v t ∼ N ( 0 , R t ) \mathbf{v}_t\sim\mathcal{N}(\mathbf{0},\mathbf{R}_t) v t ∼ N ( 0 , R t ) . From property 1, we have
z t ∼ N ( H t μ t − , H t Σ t − H t ⊤ + R t ) \mathbf{z}_t\sim\mathcal{N}(\mathbf{H}_t\boldsymbol{\mu}_t^-,\mathbf{H}_t\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}+\mathbf{R}_t) z t ∼ N ( H t μ t − , H t Σ t − H t ⊤ + R t ) Σ x z = C o v ( x t , z t ) = C o v ( x t , H t x t + v t ) = C o v ( x t , H t x t ) = E [ ( x t − μ t − ) ( H t x t − H t μ t − ) ⊤ ] = E [ ( x t − μ t − ) ( x t − μ t − ) ⊤ H t ⊤ ] = E [ ( x t − μ t − ) ( x t − μ t − ) ⊤ ] H t ⊤ = Σ t − H t ⊤ Σ z x = C o v ( z t , x t ) = ⋯ = H t Σ t − \begin{align*} \mathbf{\Sigma}_{xz}&=\mathrm{Cov}(\mathbf{x}_t,\mathbf{z}_t)=\mathrm{Cov}(\mathbf{x}_t,\mathbf{H_t}\mathbf{x}_t+\mathbf{v}_t)=\mathrm{Cov}(\mathbf{x}_t,\mathbf{H}_t\mathbf{x}_t)\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{H}_t\mathbf{x}_t-\mathbf{H_t}\boldsymbol{\mu}_t^-\right)^{\top}\right]\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)^{\top}\mathbf{H_t}^{\top}\right]\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)^{\top}\right]\mathbf{H_t}^{\top}\\ &=\mathbf{\Sigma}_t^-\mathbf{H_t}^{\top} \\ \mathbf{\Sigma}_{zx}&=\mathrm{Cov}(\mathbf{z}_t,\mathbf{x}_t)=\dots=\mathbf{H}_t\mathbf{\Sigma}_t^- \end{align*} Σ x z Σ z x = Cov ( x t , z t ) = Cov ( x t , H t x t + v t ) = Cov ( x t , H t x t ) = E [ ( x t − μ t − ) ( H t x t − H t μ t − ) ⊤ ] = E [ ( x t − μ t − ) ( x t − μ t − ) ⊤ H t ⊤ ] = E [ ( x t − μ t − ) ( x t − μ t − ) ⊤ ] H t ⊤ = Σ t − H t ⊤ = Cov ( z t , x t ) = ⋯ = H t Σ t − Thus, b e l ( x t ) = p ( x t ∣ z t , u 1 : t , z 1 : t − 1 ) \mathrm{bel}(\mathbf{x}_t)=p(\mathbf{x}_t\mid\mathbf{z}_t,\mathbf{u}_{1:t},\mathbf{z}_{1:t-1}) bel ( x t ) = p ( x t ∣ z t , u 1 : t , z 1 : t − 1 ) , where
ν t = z t − H t μ t − (innovation) S t = H t Σ t − H t ⊤ + R t (innovation covariance) K t = Σ t − H t ⊤ S t − 1 (Kalman gain) \begin{align*} \boldsymbol{\nu}_t&=\mathbf{z}_t-\mathbf{H}_t\boldsymbol{\mu}_t^- && \text{(innovation)} \\ \mathbf{S}_t&=\mathbf{H}_t\mathbf{\Sigma}_{t}^-\mathbf{H}_t^{\top}+\mathbf{R}_t && \text{(innovation covariance)}\\ \mathbf{K}_t&=\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}\mathbf{S}_t^{-1} && \text{(Kalman gain)} \end{align*} ν t S t K t = z t − H t μ t − = H t Σ t − H t ⊤ + R t = Σ t − H t ⊤ S t − 1 (innovation) (innovation covariance) (Kalman gain) μ t = μ t − + K t ν k (corrected mean) Σ t = ( I − K t H t ) Σ t − (corrected covariance) \begin{align} \boldsymbol{\mu}_t&=\boldsymbol{\mu}_t^-+\mathbf{K}_t\boldsymbol{\nu}_k && \text{(corrected mean)}\\ \mathbf{\Sigma}_t&=\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)\mathbf{\Sigma}_t^- && \text{(corrected covariance)} \end{align} μ t Σ t = μ t − + K t ν k = ( I − K t H t ) Σ t − (corrected mean) (corrected covariance) Finally, normalize for numerical stability
Σ t = ( I − K t H t ) Σ t − ( I − K t H t ) ⊤ + K t R t K t ⊤ \begin{equation} \mathbf{\Sigma}_t=\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)\mathbf{\Sigma}_t^-\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)^{\top}+\mathbf{K}_t\mathbf{R}_t\mathbf{K}_t^{\top} \end{equation} Σ t = ( I − K t H t ) Σ t − ( I − K t H t ) ⊤ + K t R t K t ⊤
Extended Kalman Filter (EKF) One way to handle the case where motion model and measurement model are not linear is to approximate the functions via local linearization (Taylor expansion) . Specifically, we extend the Kalman filter algorithm to cases where models are non-linear:
x t = f ( u t , x t − 1 ) + w t , w t ∼ N ( 0 , Q t ) \begin{align*} \mathbf{x}_t=f(\mathbf{u}_t,\mathbf{x}_{t-1})+\mathbf{w}_t, &&\mathbf{w}_t\sim\mathcal{N}(\mathbf{0},\mathbf{Q}_t) \end{align*} x t = f ( u t , x t − 1 ) + w t , w t ∼ N ( 0 , Q t ) z t = h ( x t ) + v t , v t ∼ N ( 0 , R t ) \begin{align*} \mathbf{z}_t=h(\mathbf{x}_t)+\mathbf{v}_t, &&\mathbf{v}_t \sim\mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{align*} z t = h ( x t ) + v t , v t ∼ N ( 0 , R t ) F t = ∂ f ∂ x ∣ x = μ t − 1 , W t = ∂ f ∂ w ∣ x = μ t − 1 , H t = ∂ h ∂ x ∣ x = μ t − , V t = ∂ h ∂ v ∣ x = μ t − \mathbf{F}_t=\left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\mathbf{x}=\boldsymbol{\mu}_{t-1}}, \qquad \mathbf{W}_t=\left.\frac{\partial f}{\partial \mathbf{w}}\right|_{\mathbf{x}=\boldsymbol{\mu}_{t-1}}, \qquad \mathbf{H}_t=\left.\frac{\partial h}{\partial \mathbf{x}}\right|_{\mathbf{x}=\boldsymbol{\mu}_t^-}, \qquad \mathbf{V}_t=\left.\frac{\partial h}{\partial \mathbf{v}}\right|_{\mathbf{x}=\boldsymbol{\mu}_t^-} F t = ∂ x ∂ f x = μ t − 1 , W t = ∂ w ∂ f x = μ t − 1 , H t = ∂ x ∂ h x = μ t − , V t = ∂ v ∂ h x = μ t − Input: μ t − 1 , Σ t − 1 , u t , z t 1: μ t − ← f ( u t , μ t − 1 ) (predicted mean) 2: Σ t − ← F t Σ t − 1 F t ⊤ + W t Q t W t ⊤ (predicted covariance) 3: ν t ← z t − h ( μ t − ) (innovation) 4: S t ← H t Σ t − H t ⊤ + V t R t V t ⊤ (innovation covariance) 5: K t ← Σ t − H t ⊤ S t − 1 (filter gain) 6: μ t ← μ t − + K t ν t (corrected mean) 7: Σ t ← ( I − K t H t ) Σ t − (corrected covariance) 8: Σ t ← ( I − K t H t ) Σ t − ( I − K t H t ) ⊤ + K t R t K t ⊤ (numerically stable form) return μ t , Σ t \begin{align} &\textbf{Input: } \boldsymbol{\mu}_{t-1},\; \boldsymbol{\Sigma}_{t-1},\; \mathbf{u}_t,\; \mathbf{z}_t \nonumber\\[6pt] &\quad\quad \text{1:}\quad \boldsymbol{\mu}_t^- \leftarrow f(\mathbf{u}_t,\boldsymbol{\mu}_{t-1}) &&\text{(predicted mean)} \nonumber\\ &\quad\quad\text{2:}\quad \boldsymbol{\Sigma}_t^- \leftarrow \mathbf{F}_t\boldsymbol{\Sigma}_{t-1}\mathbf{F}_t^\top + \mathbf{W}_t\mathbf{Q}_t\mathbf{W}_t^\top &&\text{(predicted covariance)} \nonumber\\ &\quad\quad\text{3:}\quad \boldsymbol{\nu}_t \leftarrow \mathbf{z}_t - h(\boldsymbol{\mu}_t^-) &&\text{(innovation)} \nonumber\\ &\quad\quad\text{4:}\quad \mathbf{S}_t \leftarrow \mathbf{H}_t\boldsymbol{\Sigma}_t^-\mathbf{H}_t^\top + \mathbf{V}_t\mathbf{R}_t\mathbf{V}_t^\top &&\text{(innovation covariance)} \nonumber\\ &\quad\quad\text{5:}\quad \mathbf{K}_t \leftarrow \boldsymbol{\Sigma}_t^-\mathbf{H}_t^\top\mathbf{S}_t^{-1} &&\text{(filter gain)} \nonumber\\ &\quad\quad\text{6:}\quad \boldsymbol{\mu}_t \leftarrow \boldsymbol{\mu}_t^- + \mathbf{K}_t\boldsymbol{\nu}_t &&\text{(corrected mean)} \nonumber\\ &\quad\quad\text{7:}\quad \boldsymbol{\Sigma}_t \leftarrow (\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)\boldsymbol{\Sigma}_t^- &&\text{(corrected covariance)} \nonumber\\ &\quad\quad\text{8:}\quad \boldsymbol{\Sigma}_t \leftarrow (\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)\boldsymbol{\Sigma}_t^-(\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)^\top + \mathbf{K}_t\mathbf{R}_t\mathbf{K}_t^\top &&\text{(numerically stable form)} \nonumber\\[6pt] &\textbf{return}\;\; \boldsymbol{\mu}_t,\; \boldsymbol{\Sigma}_t \nonumber \end{align} Input: μ t − 1 , Σ t − 1 , u t , z t 1: μ t − ← f ( u t , μ t − 1 ) 2: Σ t − ← F t Σ t − 1 F t ⊤ + W t Q t W t ⊤ 3: ν t ← z t − h ( μ t − ) 4: S t ← H t Σ t − H t ⊤ + V t R t V t ⊤ 5: K t ← Σ t − H t ⊤ S t − 1 6: μ t ← μ t − + K t ν t 7: Σ t ← ( I − K t H t ) Σ t − 8: Σ t ← ( I − K t H t ) Σ t − ( I − K t H t ) ⊤ + K t R t K t ⊤ return μ t , Σ t (predicted mean) (predicted covariance) (innovation) (innovation covariance) (filter gain) (corrected mean) (corrected covariance) (numerically stable form)
Unscented Kalman Filter (UKF) Another non-linear Kalman filter is the unscented Kalman filter which uses deterministic sampling to approximate the distribution.
Let a function f : R n → R m f:\mathbb{R}^{n}\to\mathbb{R}^m f : R n → R m be a non-linear transformation. Suppose x ∈ R n \mathbf{x}\in\mathbb{R}^n x ∈ R n and x ∼ N ( μ , Σ ) \mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma}) x ∼ N ( μ , Σ ) . We draw a set of n n n sigmapoints, X = { x i } i = 1 n \mathcal{X}=\{\mathbf{x}_i\}_{i=1}^{n} X = { x i } i = 1 n , and compute a weight w i w_i w i for each sigma point by executing
SigmaPoints ( μ , Σ ) : x 0 = μ x i = μ + ℓ i ′ , i = 1 , … , n x i = μ − ℓ i − n ′ , i = n + 1 , … , 2 n w 0 = κ n + κ w i = 1 2 ( n + κ ) , i = 1 , … , 2 n return ( X , w ) \begin{align*} &\texttt{SigmaPoints}(\boldsymbol{\mu},\mathbf{\Sigma}):\\ &\quad\quad \mathbf{x}_0 = \boldsymbol{\mu} \nonumber\\ &\quad\quad \mathbf{x}_i = \boldsymbol{\mu} + \boldsymbol{\ell}'_i, \qquad i = 1,\ldots,n \nonumber\\ &\quad\quad \mathbf{x}_i = \boldsymbol{\mu} - \boldsymbol{\ell}'_{i-n}, \qquad i = n+1,\ldots,2n \nonumber\\[4pt] &\quad\quad w_0 = \dfrac{\kappa}{n+\kappa} \nonumber\\[6pt] &\quad\quad w_i = \dfrac{1}{2(n+\kappa)}, \qquad i = 1,\ldots,2n\\ &\textbf{return}\,\, (\mathcal{X},\mathbf{w}) \end{align*} SigmaPoints ( μ , Σ ) : x 0 = μ x i = μ + ℓ i ′ , i = 1 , … , n x i = μ − ℓ i − n ′ , i = n + 1 , … , 2 n w 0 = n + κ κ w i = 2 ( n + κ ) 1 , i = 1 , … , 2 n return ( X , w ) Further let y = f ( x ) \mathbf{y}=f(\mathbf{x}) y = f ( x ) , then y ∼ N ( μ ′ , Σ ′ ) \mathbf{y}\sim\mathcal{N}(\boldsymbol{\mu}',\mathbf{\Sigma}') y ∼ N ( μ ′ , Σ ′ ) , where
μ ′ = ∑ i = 0 2 n w i f ( x i ) Σ ′ = ∑ i = 0 2 n w i ( f ( x i ) − μ ′ ) ( f ( x i ) − μ ′ ) ⊤ \begin{align*} \boldsymbol{\mu}'&=\sum_{i=0}^{2n}w_if(\mathbf{x}_i)\\ \mathbf{\Sigma}'&=\sum_{i=0}^{2n}w_i(f(\mathbf{x}_i)-\boldsymbol{\mu}')(f(\mathbf{x}_i)-\boldsymbol{\mu}')^{\top} \end{align*} μ ′ Σ ′ = i = 0 ∑ 2 n w i f ( x i ) = i = 0 ∑ 2 n w i ( f ( x i ) − μ ′ ) ( f ( x i ) − μ ′ ) ⊤ Input: μ t − 1 , Σ t − 1 , u t , z t 1: ( X t − 1 , w − ) ← SigmaPoints ( μ t − 1 , Σ t − 1 ) 2: μ t − = ∑ i = 0 2 n w i − f ( u t , x t − 1 , i ) (predicted mean) 3: Σ t − ← ∑ i = 0 2 n w i − ( f ( u t , x t − 1 , i ) − μ t − ) ( f ( u t , x t − 1 , i ) − μ t − ) ⊤ + Q t (predicted covariance) 4: ( X t − , w ) ← SigmaPoints ( μ t − , Σ t − ) 5: z t − = ∑ i = 0 2 n w i h ( x t , i − ) (predicted measurement) 6: ν t ← z t − z t − (innovation) 7: S t ← ∑ i = 0 2 n w i ( h ( x t , i − ) − z t − ) ( h ( x t , i − ) − z t − ) ⊤ + R t (innovation covariance) 8: Σ t x z ← ∑ i = 0 2 n w i ( x t , i − − μ t − ) ( h ( x t , i − ) − z t − ) ⊤ (cross covariance) 9: K t ← Σ t x z S t − 1 (filter gain) 10: μ t ← μ t − + K t ν t (corrected mean) 11: Σ t ← Σ t − − K t S t K t ⊤ (corrected covariance) return μ t , Σ t \begin{align} &\textbf{Input: } \boldsymbol{\mu}_{t-1},\; \boldsymbol{\Sigma}_{t-1},\; \mathbf{u}_t,\; \mathbf{z}_t \nonumber\\[6pt] &\quad\quad\text{1:}\quad (\mathcal{X}_{t-1},\,\mathbf{w}^-) \leftarrow \texttt{SigmaPoints}(\boldsymbol{\mu}_{t-1},\boldsymbol{\Sigma}_{t-1}) \nonumber\\ &\quad\quad\text{2:}\quad \boldsymbol{\mu}_t^- = \sum_{i=0}^{2n} w_i^-\, f(\mathbf{u}_t, \mathbf{x}_{t-1,i}) &&\text{(predicted mean)} \nonumber\\ &\quad\quad\text{3:}\quad \boldsymbol{\Sigma}_t^- \leftarrow \sum_{i=0}^{2n} w_i^- \bigl(f(\mathbf{u}_t,\mathbf{x}_{t-1,i}) - \boldsymbol{\mu}_t^-\bigr) \bigl(f(\mathbf{u}_t,\mathbf{x}_{t-1,i}) - \boldsymbol{\mu}_t^-\bigr)^\top + \mathbf{Q}_t &&\text{(predicted covariance)} \nonumber\\ &\quad\quad\text{4:}\quad (\mathcal{X}_t^-,\,\mathbf{w}) \leftarrow \texttt{SigmaPoints}(\boldsymbol{\mu}_t^-, \boldsymbol{\Sigma}_t^-) \nonumber\\ &\quad\quad\text{5:}\quad \mathbf{z}_t^- = \sum_{i=0}^{2n} w_i\, h(\mathbf{x}_{t,i}^-) &&\text{(predicted measurement)} \nonumber\\ &\quad\quad\text{6:}\quad \boldsymbol{\nu}_t \leftarrow \mathbf{z}_t - \mathbf{z}_t^- &&\text{(innovation)} \nonumber\\ &\quad\quad\text{7:}\quad \mathbf{S}_t \leftarrow \sum_{i=0}^{2n} w_i \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr) \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr)^\top + \mathbf{R}_t &&\text{(innovation covariance)} \nonumber\\ &\quad\quad\text{8:}\quad \boldsymbol{\Sigma}_t^{xz} \leftarrow \sum_{i=0}^{2n} w_i \bigl(\mathbf{x}_{t,i}^- - \boldsymbol{\mu}_t^-\bigr) \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr)^\top &&\text{(cross covariance)} \nonumber\\ &\quad\quad\text{9:}\quad \mathbf{K}_t \leftarrow \boldsymbol{\Sigma}_t^{xz}\,\mathbf{S}_t^{-1} &&\text{(filter gain)} \nonumber\\ &\quad\quad\text{10:}\quad \boldsymbol{\mu}_t \leftarrow \boldsymbol{\mu}_t^- + \mathbf{K}_t\boldsymbol{\nu}_t &&\text{(corrected mean)} \nonumber\\ &\quad\quad\text{11:}\quad \boldsymbol{\Sigma}_t \leftarrow \boldsymbol{\Sigma}_t^- - \mathbf{K}_t\mathbf{S}_t\mathbf{K}_t^\top &&\text{(corrected covariance)} \nonumber\\[6pt] &\textbf{return}\;\; \boldsymbol{\mu}_t,\; \boldsymbol{\Sigma}_t \nonumber \end{align} Input: μ t − 1 , Σ t − 1 , u t , z t 1: ( X t − 1 , w − ) ← SigmaPoints ( μ t − 1 , Σ t − 1 ) 2: μ t − = i = 0 ∑ 2 n w i − f ( u t , x t − 1 , i ) 3: Σ t − ← i = 0 ∑ 2 n w i − ( f ( u t , x t − 1 , i ) − μ t − ) ( f ( u t , x t − 1 , i ) − μ t − ) ⊤ + Q t 4: ( X t − , w ) ← SigmaPoints ( μ t − , Σ t − ) 5: z t − = i = 0 ∑ 2 n w i h ( x t , i − ) 6: ν t ← z t − z t − 7: S t ← i = 0 ∑ 2 n w i ( h ( x t , i − ) − z t − ) ( h ( x t , i − ) − z t − ) ⊤ + R t 8: Σ t x z ← i = 0 ∑ 2 n w i ( x t , i − − μ t − ) ( h ( x t , i − ) − z t − ) ⊤ 9: K t ← Σ t x z S t − 1 10: μ t ← μ t − + K t ν t 11: Σ t ← Σ t − − K t S t K t ⊤ return μ t , Σ t (predicted mean) (predicted covariance) (predicted measurement) (innovation) (innovation covariance) (cross covariance) (filter gain) (corrected mean) (corrected covariance)
Reference [1] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics . Mit Press, 2010. [2] M. Ghaffari, ROB530 Lecture Slides — Kalman Filtering. University of Michigan, 2026.