11 minute read

Published: March 31, 2026

In common engineering problems, the optimization problem is to minimize the residual of a parametric model with respect to some observed data points.

Preliminaries

Given $N$ measurements $\left\{\left(\mathbf{x}_i, \mathbf{y}_i\right)\right\}_{i=1}^N$ where $\mathbf{x}_i\in\mathbb{R}^p$ and $\mathbf{y}_i\in\mathbb{R}^{m_i}$ . We define a prediction model parametrized by $\boldsymbol{\theta}\in\mathbb{R}^n$ as $\mathbf{h}\left(\cdot;\boldsymbol{\theta}\right):\mathbb{R}^p\to\mathbb{R}^{m_i}$ and the residual of a single datapoint as $\mathbf{r}_i:\mathbb{R}^n\to\mathbb{R}^{m_i}$ , where $\mathbf{r}_i\left(\boldsymbol{\theta}\right)\coloneqq \mathbf{h}\left(\mathbf{x}_i;\boldsymbol{\theta}\right)-\mathbf{y}_i$ . The full residual $\mathbf{r}:\mathbb{R}^{n}\to\mathbb{R}^m$ is simply a stack of every single residuals, i.e.,

\begin{align*} \mathbf{r}\left(\boldsymbol{\theta}\right)= \begin{bmatrix} \mathbf{r}_1\left(\boldsymbol{\theta}\right)\\ \mathbf{r}_2\left(\boldsymbol{\theta}\right)\\ \vdots \\ \mathbf{r}_N\left(\boldsymbol{\theta}\right)\\ \end{bmatrix} \in\mathbb{R}^m, && m=\sum_{i=1}^{N}m_i \end{align*}

Note that we require 𝑚≥𝑛 because otherwise the prediction model cannot be fully defined, i.e., there are infinitely many 𝜃 that perfectly fit the data.

The goal of an optimization problem is to find a model that best fits the data. To formalize this, we need to define some function $f:\mathbb{R}^n\to\mathbb{R}_{\geq 0}$ that properly maps the residual vector to a minimizable scalar, which is known as the objective function. In general, the function $f$ is assumed to be continuously differentiable. Some of the common objective functions are

Least Squares

f(\boldsymbol{\theta}) = \frac{1}{2}\|\mathbf{r}(\boldsymbol{\theta})\|^2 = \frac{1}{2}\sum_{i=1}^{N}\|\mathbf{r}_i(\boldsymbol{\theta})\|^2

Least Squares with $\ell_1$ regularization

f(\boldsymbol{\theta}) = \frac{1}{2}\|\mathbf{r}(\boldsymbol{\theta})\|^2 + \lambda\|\boldsymbol{\theta}\|_1 = \frac{1}{2}\sum_{i=1}^{N}\|\mathbf{r}_i(\boldsymbol{\theta})\|^2 + \lambda\sum_{j=1}^{n}|\theta_j|

Least Squares with $\ell_2$ regularization

f(\boldsymbol{\theta}) = \frac{1}{2}\|\mathbf{r}(\boldsymbol{\theta})\|^2 + \frac{\lambda}{2}\|\boldsymbol{\theta}\|^2= \frac{1}{2}\sum_{i=1}^{N}\|\mathbf{r}_i(\boldsymbol{\theta})\|^2 + \frac{\lambda}{2}\sum_{j=1}^{n}\theta_j^2

Weighted Least Squares

f(\boldsymbol{\theta}) = \frac{1}{2}\mathbf{r}(\boldsymbol{\theta})^\top \mathbf{W}\mathbf{r}(\boldsymbol{\theta}) = \frac{1}{2}\sum_{i=1}^{N} \mathbf{r}_i(\boldsymbol{\theta})^\top w_i\mathbf{r}_i(\boldsymbol{\theta})

Cauchy Loss

\begin{align*} f(\boldsymbol{\theta}) = \sum_{i=1}^{N} \rho\!\left(\|\mathbf{r}_i (\boldsymbol{\theta})\|\right), && \rho(r) = \frac{\alpha^2}{2}\log\!\left(1 + \frac{r^2}{\alpha^2}\right) \end{align*}

The following is a list of some of the theorems related to optimality conditions. Suppose $f$ is differentiable, and define the gradient (first-order derivative) and Hessian (second-order derivative) as

\begin{align*} \nabla f(\boldsymbol{\theta}) \coloneqq \begin{bmatrix} \dfrac{\partial f}{\partial \theta_1} \\[10pt] \vdots \\[6pt] \dfrac{\partial f}{\partial \theta_n} \end{bmatrix}\in\mathbb{R}^n &&\text{and}&& \mathbf{H}(\boldsymbol{\theta}) \coloneqq \begin{bmatrix} \dfrac{\partial^2 f}{\partial \theta_1^2} & \dfrac{\partial^2 f}{\partial \theta_1 \partial \theta_2} & \cdots & \dfrac{\partial^2 f}{\partial \theta_1 \partial \theta_n} \\[12pt] \dfrac{\partial^2 f}{\partial \theta_2 \partial \theta_1} & \dfrac{\partial^2 f}{\partial \theta_2^2} & \cdots & \dfrac{\partial^2 f}{\partial \theta_2 \partial \theta_n} \\[12pt] \vdots & \vdots & \ddots & \vdots \\[6pt] \dfrac{\partial^2 f}{\partial \theta_n \partial \theta_1} & \dfrac{\partial^2 f}{\partial \theta_n \partial \theta_2} & \cdots & \dfrac{\partial^2 f}{\partial \theta_n^2} \end{bmatrix}\in\mathbb{S}^n \end{align*}

Descent direction: If there is a vector $\mathbf{d}\in\mathbb{R}^n$ such that $\nabla f(\boldsymbol{\theta}_0)^\top\mathbf{d}<0$ , then for all sufficiently small $\lambda>0$ , $f(\boldsymbol{\theta}_0+\lambda\mathbf{d})<f(\boldsymbol{\theta}_0)$ . Here, $\mathbf{d}$ is a descent direction of $f$ at $\boldsymbol{\theta}_0$ .
Intuitively, the dot product less than 0 means that vector $\mathbf{d}$ is in the opposite direction as the gradient.

First-order necessary condition: If $\boldsymbol{\theta}^*$ is a local minimum, then $\nabla f(\boldsymbol{\theta}^*)=0$ .

Second-order necessary condition: If $\boldsymbol{\theta}^*$ is a local minimum, then $\nabla f(\boldsymbol{\theta}^*)=0$ and $\mathbf{H}(\boldsymbol{\theta}^*)\succeq 0$ (positive semi-definite).

Second-order sufficient condition: If $\nabla f(\boldsymbol{\theta}^*)=0$ and $\mathbf{H}(\boldsymbol{\theta}^*)\succ 0$ (positive definite), then $\boldsymbol{\theta}^*$ is a local minimum.

Convex function: $f$ is convex if and only if $\mathbf{H}(\boldsymbol{\theta})\succeq 0$ for all $\boldsymbol{\theta}$ .

Global minimum: Suppose $f$ is convex, then $\boldsymbol{\theta}^\star$ is a global minimum if an only if $\nabla f(\boldsymbol{\theta}^\star)=0$ .

Linear Least Squares

Let $\mathbf{A}\in\mathbb{R}^{m\times n}$ be a matrix and $\mathbf{b}\in\mathbb{R}^m$ be a vector, where $m>n$ and $\mathrm{rank}(\mathbf{A})=n$ . The Linear Least Squares (LLS) optimization problem is to find the optimal solution $\mathbf{x}\in\mathbb{R}^n$ to the linear system of equations $\mathbf{A}\mathbf{x}=\mathbf{b}$ . Define the objective function as

f(\mathbf{x})=\frac{1}{2}\left\|\mathbf{Ax-b}\right\|^2

and the goal is to find $\mathbf{x}^{\star}=\argmin_{\mathbf{x}\in\mathbb{R}^n}f(\mathbf{x})$ .

First, we can write out the gradient and Hessian matrix of $f$ , which are

\begin{align*} \nabla f(\mathbf{x})=\mathbf{A^\top A}\mathbf{x}-\mathbf{A^\top b} && \mathbf{H}(\mathbf{x})=\mathbf{A^\top A} \end{align*}

Derivation
Simply expand the objective function,
$\begin{align*} f(\mathbf{x})&=\frac{1}{2}\left(\mathbf{Ax-b}\right)^\top\left(\mathbf{Ax-b}\right)\\ &=\frac{1}{2}\left(\mathbf{x}^\top\mathbf{A}^\top-\mathbf{b}^\top\right)\left(\mathbf{Ax-b}\right)\\ &=\frac{1}{2}\left(\mathbf{x^\top A^\top Ax}-\mathbf{x^\top A^\top b}-\mathbf{b^\top Ax}+\mathbf{b^\top b}\right)\\ &=\frac{1}{2}\mathbf{x^\top A^\top Ax}-\mathbf{x^\top A^\top b}+\frac{1}{2}\mathbf{b^\top b} \end{align*}$
Note that the last step is true because $b^\top Ax\in\mathbb{R}$ and $\mathbf{b^\top Ax=(b^\top Ax)^\top=x^\top A^\top b}$ .

Further notice that because of $\mathrm{rank}(\mathbf{A})=n$ , the Hessian matrix is positive definite (and thus invertible), i.e., $\mathbf{A^\top A}\succ 0$ . Then $\mathbf{x}^\star$ is the global minimum if and only if $\nabla f(\mathbf{x}^\star)=0$ ,

\begin{align*} \mathbf{A^\top A x}^\star-\mathbf{A^\top b}=0 && \implies &&\boxed{\mathbf{x^\star}=\left(\mathbf{A^\top A}\right)^{-1}\mathbf{A^\top b}} \end{align*}

Although the above equation gives a closed-form formula for finding the global minimum $\mathbf{x}^\star$ , computing the inverse of $\mathbf{A^\top A}$ is not practical. Instead, we solve a linear system of equations

\begin{equation} \left(\mathbf{A^\top A}\right)\mathbf{x}^\star = \mathbf{A^\top b} \end{equation}

which is known as the normal equations.

Therefore, solving an LLS problem is basically solving a linear system of equations. Typical methods include Cholesky decomposition and QR factorization, rather than directly inverting $\mathbf{A^\top A}$ .

Cholesky Solver

The Cholesky decomposition of a positive-definite matrix $\mathbf{A}\in\mathbb{R}^{n\times n}$ is a decomposition into the product of an upper triangular matrix and a lower triangular matrix, i.e., $\mathbf{A}=\mathbf{LL^\top}$ , where

\begin{align*} \mathbf{A}=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\a_{21} & a_{22} & \cdots & a_{2n}\\\vdots & \vdots & \ddots & \vdots\\a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix} &&\mathbf{L}=\begin{bmatrix}l_{11} & 0 & \cdots & 0\\l_{21} & l_{22} & \cdots & 0\\\vdots & \vdots & \ddots & \vdots\\l_{n1} & l_{n2} & \cdots & l_{nn}\end{bmatrix} &&\mathbf{L^\top}=\begin{bmatrix}l_{11} & l_{21} & \cdots & l_{n1}\\0 & l_{22} & \cdots & l_{n2}\\\vdots & \vdots & \ddots & \vdots\\0 & 0 & \cdots & l_{nn}\end{bmatrix} \end{align*}

The elements of $\mathbf{L}$ can be obtained as follows

\begin{align*}l_{ii} &= \sqrt{a_{ii}-\sum_{k=1}^{i-1} l_{ik}^2}, \qquad i=1,\dots,n\\l_{ji} &= \frac{1}{l_{ii}}\left(a_{ji}-\sum_{k=1}^{i-1} l_{jk}l_{ik}\right), \qquad j=i+1,\dots,n\end{align*}

For a linear system of equations, $\mathbf{Ax}=\mathbf{b}$ , apply Cholesky decomposition and get $\mathbf{LL^\top x=b}$ . Then, we first let $\mathbf{y}=\mathbf{L^\top x}$ , and solve $\mathbf{Ly=b}$ for $\mathbf{y}$ using forward substitution ( $y_1\to y_2\to\cdots \to y_n$ ). Specifically,

\begin{align*} \mathbf{Ly=b} && \implies && \begin{bmatrix}l_{11} & 0 & \cdots & 0\\l_{21} & l_{22} & \cdots & 0\\\vdots & \vdots & \ddots & \vdots\\l_{n1} & l_{n2} & \cdots & l_{nn}\end{bmatrix} \begin{bmatrix} y_1\\y_2\\ \vdots \\y_n \end{bmatrix}= \begin{bmatrix} b_1\\ b_2\\ \vdots \\b_n \end{bmatrix} \end{align*}

Note that $y_1$ can be easily found as $y_1 = b_1/l_{11}$ , and substituting $y_1$ into the second equation solves $y_2$ . In general,

\begin{align*} y_i &= \frac{1}{l_{ii}}\left(b_i - \sum_{j=1}^{i-1} l_{ij}y_j\right),\qquad i=1,\dots,n \end{align*}

Finally, solve $\mathbf{L^\top x}=\mathbf{y}$ for $\mathbf{x}$ using backward substitution ( $x_n\to x_{n-1}\to\cdots\to x_1$ ). Specifically,

\begin{align*} \mathbf{L^\top x=y} && \implies && \begin{bmatrix} l_{11} & l_{21} & \cdots & l_{n1}\\ 0 & l_{22} & \cdots & l_{n2}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} x_1\\x_2\\ \vdots \\x_n \end{bmatrix} = \begin{bmatrix} y_1\\ y_2\\ \vdots \\y_n \end{bmatrix} \end{align*}

Now, $x_n$ can be easily found as $x_n=y_n/l_{nn}$ , and substituting $x_n$ into the second to last equation solves $x_{n-1}$ . In general,

\begin{align*}x_i &= \frac{1}{l_{ii}}\left(y_i - \sum_{j=i+1}^{n} l_{ji}x_j\right),\qquad i=n,\dots,1\end{align*}

In practice, we can use numpy.linalg.cholesky or similar packages to perform Cholesky decomposition. The time complexity is $O(n^3)$ .

QR Solver

The QR factorization of a matrix $\mathbf{A}\in\mathbb{R}^{n\times n}$ is a decomposition into an orthogonal matrix and an upper triangular matrix, i.e., $\mathbf{A}=\mathbf{QR}$ , where

\begin{align*} \mathbf{Q}=[q_{ij}], \,\,\mathbf{QQ^\top}=\mathbf{I}_n && \mathbf{R}=\begin{bmatrix}r_{11} & r_{21} & \cdots & r_{n1}\\0 & r_{22} & \cdots & r_{n2}\\\vdots & \vdots & \ddots & \vdots\\0 & 0 & \cdots & r_{nn}\end{bmatrix} \end{align*}

where the element of $\mathbf{Q}$ can be found via Gram-Schmidt diagonalization.

For a linear system of equations, $\mathbf{Ax}=\mathbf{b}$ , apply QR factorization and get $\mathbf{QRx=b}$ . Multiply both sides by $\mathbf{Q}^\top$ and get $\mathbf{Rx}=\mathbf{Q^\top b}$ . Let $\mathbf{y}=\mathbf{Q^\top b}$ , we can solve $\mathbf{Rx=y}$ for $\mathbf{x}$ by backward substitution, i.e.,

\begin{align*}x_i &= \frac{1}{r_{ii}}\left(y_i - \sum_{j=i+1}^{n} r_{ij}x_j\right), \qquad i=n-1,\dots,1\end{align*}

Non-linear Least Squares

In general, let $\mathbf{r}:\mathbb{R}^n\to \mathbb{R}^m,m\geq n$ be a smooth residual function. We can define an objective function $f:\mathbb{R}^n\to\mathbb{R}$ as

f(\mathbf{x})=\frac{1}{2}\left\|\mathbf{r}(\mathbf{x})\right\|^2=\sum_{i=1}^{m}r_i^2(\mathbf{x})

The goal is still to find the minimizer of the function $f$ , i.e., $\mathbf{x}^{\star}=\argmin_{\mathbf{x}\in\mathbb{R}^n}f(\mathbf{x})$ . Unfortunately, there’s no closed-form formula for finding $\mathbf{x}^\star$ as we did for the LLS problem. Common methods usually start from an initial guess, and iteratively approximate the optimum based on “descent direction” mentioned before.

Specifically, if we start with an initial guess $\mathbf{x}^0$ , linearize around $\mathbf{x}^0$ , and find a descending direction $\mathbf{d}$ . Then, we update the guess by moving along direction $\mathbf{d}$ , and we can expect the objective will become smaller, i.e., $\mathbf{x}^{k+1 }=\mathbf{x}^k+\mathbf{d}$ and $f(\mathbf{x}^{k+1})<f(\mathbf{x}^k)$ . Hopefully, this process will converge to a local minimum. Common methods include Gauss-Newton, Levenberg-Marquardt, or gradient descent, distinguished by how they compute $\mathbf{d}$ .

Gauss-Newton Method

The steps of the Gauss-Newton method are as follows:

Start from an initial guess $\mathbf{x}^0$ , iterate until convergence $k=0, 1, 2, \dots$

Linearize the residual at the current guess $\mathbf{x}^k$ using first-order Taylor expansion,
$\mathbf{r}(\mathbf{x}^k+\mathbf{d})=\mathbf{r}(\mathbf{x}^k)+\mathbf{J}(\mathbf{x}^k)\mathbf{d}$
where
$\begin{equation*} \mathbf{J}(\mathbf{x}) := \frac{\partial \mathbf{r(x)}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial r_1}{\partial x_1} & \dfrac{\partial r_1}{\partial x_2} & \cdots & \dfrac{\partial r_1}{\partial x_n}\\[12pt] \dfrac{\partial r_2}{\partial x_1} & \dfrac{\partial r_2}{\partial x_2} & \cdots & \dfrac{\partial r_2}{\partial x_n} \\[6pt] \vdots & \vdots & \ddots & \vdots \\[6pt] \dfrac{\partial r_m}{\partial x_1} & \dfrac{\partial r_m}{\partial x_2} & \cdots & \dfrac{\partial r_m}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{m \times n} \end{equation*}$
is the Jacobian matrix. Let $\mathbf{J}_k\coloneqq \mathbf{J}(\mathbf{x}^k)$ .

Solve the resulting linear least squares to find the step $\mathbf{d}^{\star}=\argmin_\mathbf{d}\left\|\mathbf{r}(\mathbf{x}^k)+\mathbf{J}_k\mathbf{d}\right\|^2$ by solving
$\left(\mathbf{J}_k^\top \mathbf{J}_k\right) \mathbf{d}=-\mathbf{J}_k^\top \mathbf{r}\left(\mathbf{x}^k\right)$
As discussed before, this is a linear system of equations, and should be solved by using Cholesky or QR decomposition rather than inverting $\mathbf{J}_k^\top\mathbf{J}_k$ .

Update the current guess, $\mathbf{x}^{k+1}=\mathbf{x}^k+\mathbf{d}$ .

Implementation example:

def gauss_newton(x0, max_iter, tol):
    x = x0.copy()
    for _ in range(max_iter):
        r = residual(x)
        J = jacobian(x)
        d = np.linalg.solve(J.T @ J, -J.T @ r) # Solve (J^T J) d = -J^T r
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

Gradient Descent

The steps of the gradient descent method are as follows:

Start from an initial guess $\mathbf{x}^0$ , iterate until convergence $k=0, 1, 2, \dots$

Compute the gradient of the objective function
$\nabla f(\mathbf{x}^k)=\mathbf{J}_k^\top \mathbf{r}(\mathbf{x}^k)$
- Derivation
  Note that
  $f(\mathbf{x})=\frac{1}{2}\left\|\mathbf{r}(\mathbf{x})\right\|^2=\frac{1}{2}\sum_{i=1}^{m}r_i^2\left(\mathbf{x}\right)^2$
  and the $j$ -th entry of $\nabla f$ is
  $\frac{\partial f}{\partial x_j}=\sum_{i=1}^{m}r_i\left(\mathbf{x}\right)\frac{\partial r_i}{\partial x_j}=\mathbf{J}_{k,j}^\top\mathbf{r}(\mathbf{x})$
  Then, stack all the entries of $\nabla f$ and we can get the result $\nabla f(\mathbf{x}^k)=\mathbf{J}_k^\top \mathbf{r}(\mathbf{x}^k)$ .

Let the step $\mathbf{d}$ be the negative of the gradient, scaled by a factor called the learning rate $\eta$ , i.e., $\mathbf{d}=-\eta \mathbf{J}_k^\top \mathbf{r}(\mathbf{x}^k)$ .

Update the current guess, $\mathbf{x}^{k+1}=\mathbf{x}^k+\mathbf{d}$ .

Implementation example:

def gradient_descent(x0, max_iter, tol, eta):
    x = x0.copy()
    for _ in range(max_iter):
        r = residual(x)
        J = jacobian(x)
        d = -eta * J.T @ r # Directly set d to be along the opposite gradient
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

Levenberg-Marquardt Method

The step of the Levenberg-Marquardt method is as follows:

Start from an initial guess $\mathbf{x}^0$ , iterate until convergence $k=0, 1, 2, \dots$

Linearize the residual at the current guess $\mathbf{r}\left(\mathbf{x}^k+\mathbf{d}\right)=\mathbf{r}\left(\mathbf{x}^k\right)+\mathbf{J}_k\mathbf{d}$ .

Solve a $\ell_2$ regularized linear least squares problem by $\mathbf{d}^\star=\argmin_{\mathbf{d}}\left\|\mathbf{r}\left(\mathbf{x}^k\right)+\mathbf{J}_k\mathbf{d}\right\|^2+\lambda \left\|\mathbf{d}\right\|^2$ .
$\left(\mathbf{J}_k^\top \mathbf{J}_k+\lambda \mathbf{I}\right) \mathbf{d}=-\mathbf{J}_k^\top \mathbf{r}\left(\mathbf{x}^k\right)$
- Derivation
  Denote $\mathbf{r}_k\coloneqq \mathbf{r}\left(\mathbf{x}^k\right)$ , the new objective function is
  $\begin{align*} g(\mathbf{d}) &= (\mathbf{r}_k + \mathbf{J}_k \mathbf{d})^\top(\mathbf{r}_k + \mathbf{J}_k \mathbf{d}) + \lambda\,\mathbf{d}^\top\mathbf{d}\\ &= \mathbf{r}_k^\top\mathbf{r}_k + \mathbf{r}_k^\top\mathbf{J}_k\mathbf{d} + \mathbf{d}^\top\mathbf{J}_k^\top\mathbf{r}_k + \mathbf{d}^\top\mathbf{J}_k^\top\mathbf{J}_k\mathbf{d} + \lambda\mathbf{d}^\top\mathbf{d}\\ &= \mathbf{r}_k^\top\mathbf{r}_k + 2\,\mathbf{r}_k^\top\mathbf{J}_k\mathbf{d} + \mathbf{d}^\top\mathbf{J}_k^\top\mathbf{J}_k\mathbf{d} + \lambda\,\mathbf{d}^\top\mathbf{d} \end{align*}$
  Take the gradient of $g$ , and let $\nabla g(\mathbf{d})=\mathbf{0}$
  $\begin{align*} \nabla g(\mathbf{d}) = 2\,\mathbf{J}_k^\top\mathbf{r}_k + 2\,\mathbf{J}_k^\top\mathbf{J}_k\,\mathbf{d} + 2\lambda\,\mathbf{d} = \mathbf{0} && \implies && \boxed{(\mathbf{J}_k^\top\mathbf{J}_k + \lambda\mathbf{I})\mathbf{d} = -\mathbf{J}_k^\top\mathbf{r}(\mathbf{x}^k)} \end{align*}$

Update the current guess, $\mathbf{x}^{k+1}=\mathbf{x}^k+\mathbf{d}$ .

Implementation example:

def levenberg_marquardt(x0, max_iter, tol, lam):
    x = x0.copy()
    for _ in range(max_iter):
        r = residual(x)
        J = jacobian(x)
        A = J.T @ J + lam * np.eye(len(x))
        d = np.linalg.solve(A, -J.T @ r) # Solve (J^T J + λI) d = -J^T r
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

Note that the Levenberg-Marquardt method is an interpolation between Gauss-Newton and gradient descent.

As $\lambda\to 0$ , it reduces to Gauss-Newton: $\left(\mathbf{J}_k^\top \mathbf{J}_k+\lambda \mathbf{I}\right) \to \mathbf{J}_k^\top \mathbf{J}_k$ .

As $\lambda \to \infty$ , it reduces to gradient descent: $\left(\mathbf{J}_k^\top \mathbf{J}_k+\lambda \mathbf{I}\right) \to \lambda \mathbf{I}$ .

Example

Now we compare these three methods using an example. Consider the residual and objective as

\begin{align*} \mathbf{r}(\mathbf{x})=\begin{bmatrix} 1-x_1\\ 10(x_2-x_1^2) \end{bmatrix}, && f(\mathbf{x})=\frac{1}{2}\left[(1-x_1)^2+100(x_2-x_1^2)^2\right]

Occupancy Grid Mapping

9 minute read

Published: March 22, 2026

Besides state estimation or localization, which provide a robot with knowledge of where it is, it’s equally important for a mobile robot to perceive its surrounding and form a knowledge of where is occupied, that’s where mapping comes in. This note covers a fundamental mapping algorithm, occupancy grid mapping.

Problem Statement

In occupancy grid mapping, a map $\mathbf{m}=\left\{m_i\right\}_{i=1}^N$ is discretized into $N$ different cells, where each cell $m_i$ is a binary random variable that models the occupancy, and $p(m_i)$ is the probability of a cell being occupied. The map is assumed to be static. The goal is to find the belief $\mathrm{bel}_t(\mathbf{m})=p(\mathbf{m}\mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})$ , where $\mathbf{z}$ is the observation and $\mathbf{x}$ is the state of the robot.

Two assumptions are made for tractability

The robot state $\mathbf{x}_t$ are known.

The occupancy of individual cells $m_i$ are independent, and thus

\begin{equation} \mathrm{bel}_t(\mathbf{m})=p(\mathbf{m}\mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})=\prod_{i=1}^{N}p(m_i\mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t}) \end{equation}

At each timestep $t$ , the sensor performs some measurement and provide the information of whether a cell $m_i$ is believed to be occupied or not, in terms of $p(m_i\mid\mathbf{z}_t, \mathbf{x}_t)$ , which is known as the inverse sensor model.

Inverse sensor model of a laser beam range sensor, image taken from [1]

One simple inverse sensor model is shown in the figure above. Consider a laser beam range sensor that provides the reading $\mathbf{z}=(z_1^r,\dots, z_n^r,z_{1}^{\phi},\dots,z_{n}^{\phi})$ , where $z_k^r$ and $z_k^\phi$ are the range and bearing of the $k$ -th beam. For grid $i$ , let $r_i$ and $\phi_i$ be the range and bearing of the grid as observed from the current robot pose $\mathbf{x}$ , and $k=\argmin_j\left|\phi_i-z_j^{\phi}\right|$ be the beam going through that gird. Based on the relation of a grid’s position relative to the robot, we have three different cases:

If $r_i>\min(z_{\max},z_k^r+w_{\mathrm{grid}})$ or $\left|\phi_i-z_{k}^{\phi}\right|>w_{\mathrm{beam}} / 2$ , then return $p_{\mathrm{previous}}$ , the sensor has no information about this grid.

Else if $z_k^r < z_{\max}$ and $\left|r_i-z_k^r\right|<w_{\mathrm{grid}}$ , then return $p_{\mathrm{occupied}}$ , the sensor think the grid is occupied.

Else if $z_k^r<z_{\max}$ and $r_i < z_k^r$ , then return $p_{\mathrm{free}}$ , the sensor think the grid is free.

where $z_{\max}$ is a sensor-specific parameter representing the maximum detectable range, $w_{\mathrm{grid}}$ is the grid size and $w_{\mathrm{beam}}$ is the width of each beam.

Mapping Algorithm

Following the Bayes filtering framework, a general algorithm can be derived for the occupancy grid mapping [2]. Starting from equation 1,

\begin{align*} p(m_i\mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})&=\frac{p(\mathbf{z}_t\mid m_i, \mathbf{z}_{1:t-1},\mathbf{x}_{1:t})p(m_i\mid\mathbf{z}_{1:t-1}, x_{1:t})}{p(\mathbf{z}_t\mid\mathbf{z}_{1:t-1},\mathbf{x}_{1:t})} && \text{(Bayes rule)}\\ &=\frac{p(\mathbf{z}_t\mid m_i, \mathbf{x}_t)p(m_i\mid\mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})}{p(\mathbf{z}_t\mid\mathbf{z}_{1:t-1},\mathbf{x}_{1:t})} && \text{(Markov assumption)}\\ &=\frac{p(m_i \mid \mathbf{z}_t, \mathbf{x}_t) \, p(\mathbf{z}_t \mid \mathbf{x}_t) \, p(m_i \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})}{p(m_i \mid \mathbf{x}_t) \, p(\mathbf{z}_t \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t})} && \text{(Bayes rule)} \end{align*}

Further notice that the map $\mathbf{m}$ is independent of the robot state $\mathbf{x}_t$ , we obtain

\begin{equation} p(m_i\mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})=\frac{p(m_i \mid \mathbf{z}_t, \mathbf{x}_t) \, p(\mathbf{z}_t \mid \mathbf{x}_t) \, p(m_i \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})}{p(m_i) \, p(\mathbf{z}_t \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t})} \end{equation}

Similarly, for the negation

\begin{equation} p(\lnot m_i\mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})= \frac{p(\lnot m_i \mid \mathbf{z}_t, \mathbf{x}_t) \, p(\mathbf{z}_t \mid \mathbf{x}_t) \, p(\lnot m_i \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})}{p(\lnot m_i) \, p(\mathbf{z}_t \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t})} \end{equation}

Compute the ratio of equation (2) and (3), we get

\begin{align*} \frac{p(m_i \mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})}{1 - p(m_i \mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})}&= \frac{p(m_i \mid \mathbf{z}_t, \mathbf{x}_t) \, p(m_i \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1}) \, p(\neg m_i)}{p(\neg m_i \mid \mathbf{z}_t, \mathbf{x}_t) \, p(\neg m_i \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1}) \, p(m_i)}\\ &= \underbrace{\frac{p(m_i \mid \mathbf{z}_t, \mathbf{x}_t)}{1 - p(m_i \mid \mathbf{z}_t, \mathbf{x}_t)}}_{\text{inverse sensor model}} \underbrace{\frac{p(m_i \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})}{1 - p(m_i \mid \mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})}}_{\text{recursive term}} \underbrace{\frac{1 - p(m_i)}{p(m_i)}}_{\text{prior}} \end{align*}

Then, change to log odds form by defining

l(x)\triangleq\log\frac{p(x)}{1-p(x)}

and the product chain turns into a sum

l(m_i\mid\mathbf{z}_{1:t}, \mathbf{x}_{1:t})=l(m_i\mid \mathbf{z}_{t},\mathbf{x}_t)+l(m_i\mid\mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})-l(m_i)

Bayes Inference using Conjugate Prior

The previous section derived an equation for updating the belief of the map under the Bayes framework, however, it lacks the notion of uncertainty (i.e., how confident we are about the map). With the notion of conjugate prior, a proper selection of the underlying distribution function for $p(m_i)$ can both simplifies the updating rule and provide a closed form formula for uncertainty.

Recall that from Bayes rule

\begin{equation*} \underbrace{p(m_i\mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})}_{\text{posterior}}\propto\underbrace{p(\mathbf{z}_t\mid m_i, \mathbf{x}_t)}_{\text{likelihood}}\cdot\underbrace{p(m_i\mid\mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})}_{\text{prior}} \end{equation*}

Bernoulli Likelihood and Beta Prior

First consider the case where the occupancy of a grid is binary, i.e., $m_i\in\{0, 1\}$ . Let $\theta_i\triangleq p(m_i=1)$ be the unknown occupancy probability. From the inverse sensor model described above, we know that each sensor measurement provide a binary result, denoted as $y_t$ , where

\begin{equation*} y_t=\left\{\begin{matrix} 1, & \text{if the inverse sensor model return }p_{\text{occupied}}\\ 0, & \text{if the inverse sensor model return }p_{\text{free}} \end{matrix}\right. \end{equation*}

Then, the likelihood is modeled as a Bernoulli with parameter $\theta_i$ , i.e.,

p(\mathbf{z}_t\mid m_i, \mathbf{x}_t)\propto p(y_t\mid \theta_i)=\theta_i^{y_t}\left(1-\theta_i\right)^{1-y_t}

Furthermore, if we model the prior as a Beta distribution, i.e.,

p(\theta_i\mid \mathbf{z}_{1:t-1},\mathbf{x}_{1:t-1})=\mathrm{Beta}(\theta_i;\alpha_{i,t-1},\beta_{i,t-1})=\frac{1}{B(\alpha_{i,t-1}, \beta_{i,t-1})}\theta_i^{\alpha_{i,t-1} -1}\left(1-\theta_i\right)^{\beta_{i,t-1} - 1}

where $B(\cdot, \cdot)$ is the beta function.

Then, it follows from the Bayes rule that the posterior is

\begin{align*} p(\theta_i\mid \mathbf{z}_{1:t}, \mathbf{x}_{1:t})&\propto p(y_t\mid \theta_i)\cdot p(\theta_i\mid\mathbf{z}_{1:t-1}, \mathbf{x}_{1:t-1})\\ &\propto\theta_i^{y_t}\left(1-\theta_i\right)^{1-y_t}\cdot \theta_i^{\alpha_{i,t-1} -1}\left(1-\theta_i\right)^{\beta_{i,t-1} - 1}\\ &=\theta_i^{(\alpha_{i,t-1}+y_t)-1}\left(1-\theta_i\right)^{(\beta_{i,t-1}+1-y_t)-1} \end{align*}

which still follows a Beta distribution, $p(\theta_i\mid \mathbf{z}_{1:t},\mathbf{x}_{1:t})=\mathrm{Beta}(\theta_i;\alpha_t, \beta_t)$ , where

\begin{align} \alpha_{i,t}=\alpha_{i,t-1}+y_t &&\text{and}&& \beta_{i,t} = \beta_{i,t-1}+(1-y_t) \end{align}

In this case, we call beta distribution the conjugate prior for the Bernoulli likelihood.

Conjugate Distribution and Conjugate Prior [2]
- If the posterior distributions are in the same probability distribution family as the prior probability distribution, the prior and posterior are then called conjugate distributions.
- The prior is called a conjugate prior for the likelihood function.

Note that the choice of modeling the prior as a Beta distribution here is an algebraic convenience, which provides a closed-form expression for the posterior. Then the mean and covariance can be easily obtained

\begin{align*} \mathbb{E}[\theta_i]=\frac{\alpha_i}{\alpha_i + \beta_i} && \text{and} && \mathbb{V}[\theta_i]=\frac{\alpha_i\beta_i}{\left(\alpha_i+\beta_i\right)^2\left(\alpha_i+\beta_i+1\right)}

Lie Theory in Robot Motion

7 minute read

Published: March 10, 2026

This note covers some of the fundamental concepts in Lie group and Lie algebra, and their applications to representing rigid body motion in robotics.

Matrix Lie Groups

A group is a set $G$ equipped with a binary map $\cdot:G\times G\to G$ satisfying the following properties:

Closure: $\forall a,b\in G, a\cdot b\in G$ .

Associativity: $\forall a, b, c\in G,\, (a\cdot b)\cdot c = a\cdot(b\cdot c)$ .

Identity: $\exist e\in G,\,\mathrm{s.t.}\,\forall a\in G, \, a\cdot e=e\cdot a=a$ .

Inverse: $\forall a\in G,\, \exist a^{-1}\in G, \, \mathrm{s.t.}\, a\cdot a^{-1}=a^{-1}\cdot a = e$ .

Here, it’s better to understand the element of a group as a kind of operation or transformation (e.g., a rotation or a rigid body transformation) rather than a number or a vector. These groups turn out to be also differentiable manifolds, which makes them Lie groups.

Special Orthogonal Group, $\mathrm{SO}(n)$

To represent a rotation formally, we need to consider the properties that define a rotation. In general, a rotation should be

Linear, so it can be represented as a matrix $\mathbf{R}$ .

Preserves length and angles, so for any vector $\mathbf{v}$ and $\mathbf{w}$ , $\mathbf{v}\cdot \mathbf{w}=(\mathbf{R}\mathbf{v})\cdot (\mathbf{R}\mathbf{w})$ .
Note, from this property,
$\mathbf{v}^{\top}\mathbf{w}=(\mathbf{R}\mathbf{v})^{\top}(\mathbf{R}\mathbf{w})=\mathbf{v}\mathbf{R}^{\top}\mathbf{R}\mathbf{w}$
which means the matrix $\mathbf{R}$ has to satisfy the following
$\begin{equation} \mathbf{R}^{\top}\mathbf{R}=\mathbf{I} \end{equation}$

Preserves orientation (no mirroring), so $\det(\mathbf{R})>0$ .
Given the previous property
$\det(\mathbf{R}^{\top}\mathbf{R})=\det(\mathbf{R}^{\top})\cdot\det(\mathbf{R})=(\det(\mathbf{R}))^2=1$
we have
$\det(\mathbf{R})=1$

In fact, these properties give rise to a group called the special orthogonal group,

\mathrm{SO}(n)=\{\mathbf{R}\in\mathbb{R}^{n\times n}\mid\mathbf{R}^{\top}\mathbf{R}=\mathbf{I},\det(\mathbf{R})=1\}

which is a group of all rotations about the origin in an $n$ dimensional Euclidean space.

Special Euclidean Group, $\mathrm{SE}(n)$

The general way to represent the motion of a rigid body is a transformation that

Preserves the Euclidean distance, $\left\|f(\mathbf{x})-f(\mathbf{y})\right\|=\left\|\mathbf{x}-\mathbf{y}\right\|$ .

Preserves the orientation.

All such transformation forms a group called the special Euclidean group, $\mathrm{SE}(n)$ .

Intuitively, such transformation can be represented by a function $f:\mathbb{R}^n\to\mathbb{R}^n$ , where

\begin{align*} f(\mathbf{x})=\mathbf{R}\mathbf{x}+\mathbf{p}, && \mathbf{R}\in\mathrm{SO}(n)\,\,\text{and}\,\,\mathbf{p}\in\mathbb{R}^n \end{align*}

However, the function $f$ is not linear and thus cannot be represented by a matrix, which causes inconvenience. The general trick to solve this problem is to augment the dimension by 1. For $\mathbf{x}\in\mathbb{R}^n$ , define $\overline{\mathbf{x}}\in\mathbb{R}^{n+1}$ and $\mathbf{T}\in\mathbb{R}^{(n+1)\times(n+1)}$ such that

\begin{align*} \overline{\mathbf{x}}=\left[\begin{matrix}\mathbf{x}\\1\end{matrix}\right], && \mathbf{T}=\left[ \begin{matrix} \mathbf{R} & \mathbf{p}\\ \mathbf{0}_{1\times n} & 1 \end{matrix} \right], && \mathbf{T}\overline{\mathbf{x}}=\overline{f(\mathbf{x})} \end{align*}

so the transformation becomes matrix vector multiplication.

The special Euclidean group now can be written as

\mathrm{SE}(n)=\left\{\mathbf{T}\in\mathbb{R}^{(n+1)\times(n+1)}\mid\mathbf{T}=\left[ \begin{matrix} \mathbf{R} & \mathbf{p}\\ \mathbf{0}_{1\times n} & 1 \end{matrix} \right], \mathbf{R}\in\mathrm{SO}(n), \,\mathbf{p}\in\mathbb{R}^n \right\}

which represents all poses an object can take in an $n$ dimensional space.

Lie Algebra

A Lie algebra is a vector space $\mathfrak{g}$ together with a binary operation called the Lie bracket $[\cdot,\cdot]:\mathfrak{g}\times\mathfrak{g}\to\mathfrak{g}$ satisfying the following properties

Bilinearity: $[a\mathbf{x}+b\mathbf{y},\mathbf{z}]=a[\mathbf{x},\mathbf{z}]+b[\mathbf{y},z]$ , $[\mathbf{z},a\mathbf{x}+b\mathbf{y}]=a[\mathbf{z},\mathbf{x}]+b[\mathbf{z},\mathbf{y}]$ .

Alternating: $[\mathbf{x},\mathbf{x}]=\mathbf{0}$ .

Jacobi identity: $[\mathbf{x},[\mathbf{y},\mathbf{z}]]+[\mathbf{y},[\mathbf{z},\mathbf{x}]]+[\mathbf{z},[\mathbf{x},\mathbf{y}]]=\mathbf{0}$ .

Importantly, every Lie algebra $\mathfrak{g}$ is associated to a Lie group $G$ , which is the tangent space of the identity of the Lie group, denoted as $\mathfrak{g} = T_{e}G$ , and it completely captures the local structure of the group. In the case of robot motion, we can think of a trajectory that lives in the Lie group while the velocity lives in the Lie algebra.

Exponential Map

Let $G$ be a Lie group and $\mathfrak{g}$ be its Lie algebra, first we need to define a few notations:

A curve in $G$ is defined as a function $\gamma:\mathbb{R}\to G$ that maps each real number $t\in\mathbb{R}$ to an element of the group $G$ .
The derivative of $\gamma$ , denoted as $\dot\gamma$ , evaluated at a specific time $t$ , $\dot\gamma(t)$ , is a vector in the tangent space. Suppose $\gamma (t)=g$ , then $\dot\gamma(t)\in T_gG$ .

Left translation: For a specific element $g\in G$ , define the map $L_g:G\to G, x\mapsto gx$ as the left translation. Example: if $G=\mathrm{SO}(3)$ , and $g=\mathbf{R}$ , then $L_{\mathbf{R}}(\mathbf{Q})=\mathbf{R}\mathbf{Q}$ .

Left-invariant vector field: For a vector $\xi\in\mathfrak{g}$ , the left-invariant vector field $X_{\xi}$ is defined at any point $g\in G$ by $X_{\xi}(g)=(\mathrm{d}L_g)_e(\xi)$ .
Basically, start from a velocity vector $\xi\in\mathfrak{g}$ , which has to stars at the origin $e$ (identity). Then, you map this velocity vector to another vector in the tangent space to a group element $g$ , i.e., $T_g G$ .

Now, suppose you have a curve $\gamma_\xi$ that starts at the identity with velocity $\xi$ , then, for any $t\in\mathbb{R}$ , we have

\frac{\mathrm{d}}{\mathrm{d}t}\gamma_\xi(t)=\dot\gamma_\xi(t)=X_{\xi}(\gamma_\xi(t))

The above equation can be thought of as a “differential equation” whose solution is

\gamma_\xi(t)=\exp(t\xi)

Let $t=1$ , the exponential map can be defined as

\begin{align} \exp:\mathfrak{g}\to G,&& \exp(\xi)=\gamma_\xi(1)

Bayes Filtering and State Estimation

7 minute read

Published: March 04, 2026

In robot state estimation, the Bayes filter is a probabilistic approach that estimates the state from a sequence of controls and measurements by recursively performing prediction with a motion model and correction with a measurement model. In practice, some common instantiations of the Bayes framework are the Kalman filter, information filter, and particle filter.

Mathematical Formulation of Bayes Filter

Given a sequence of measurements $\mathrm{z}_{1:t}=\mathbf{z}_1,\dots,\mathbf{z}_t$ and controls $\mathbf{u}_{1:t}=\mathbf{u}_1,\dots,\mathbf{u}_t$ , and a measurement model $p(\mathbf{z}_t\mid\mathbf{x}_t)$ and a motion model $p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)$ . The Bayes filter provides a recursive framework for estimating the state $\mathbf{x}_t$ in terms of a probability distribution function, known as the belief $\mathrm{bel}(\mathbf{x}_t)\triangleq p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1,t})$ , or posterior.

The dynamic Bayes network that characterizes the evolution of controls, states, and measurements. [1]

The filtering algorithm is built upon two assumptions of conditional independence:

Motion model is (1st-order) Markov:

p(\mathbf{x}_t\mid\mathbf{x}_{0:t-1},\mathbf{u}_{1:t})=p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)

The measurement model is memoryless:

p(\mathbf{z}_t\mid\mathbf{x}_{0:t},\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})=p(\mathbf{z}_{t}\mid\mathbf{x}_t)

Given these two assumptions, we can propagate the belief over time:

\begin{align*} \mathrm{bel}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1:t})=p(\mathbf{x}_t\mid\mathbf{z}_t,\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to\text{definition} \\ &=\eta p(\mathbf{z}_t\mid\mathbf{x}_t,\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to\text{Bayes rule}\\ &=\eta p(\mathbf{z}_t\mid\mathbf{x}_t)p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) && \to \text{memoryless} \end{align*}

We can define the predicted belief of the state at time $t$ to be the probability distribution before knowing the current measurement, i.e., $\overline{\mathrm{bel}}(\mathbf{x}_t)\triangleq p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})$ . Then, the above equation becomes

\begin{equation} \mathrm{bel}(\mathrm{x}_t)=\eta p(\mathbf{z}_t\mid\mathbf{x}_t)\overline{\mathrm{bel}}(\mathbf{x}_t) \end{equation}

where $\eta$ is the normalization coefficient, and

\frac{1}{\eta}=\int p(\mathbf{z}_t\mid\mathbf{x}_t)\overline{\mathrm{bel}}(\mathbf{x}_t)\mathrm{d}\mathbf{x}_t

which is essentially a correction applied to the belief after observing the current measurement.

What remains to be done, obviously, is to predict the belief $\overline{\mathrm{bel}}(\mathbf{x}_t)$ from the previous state:

\begin{align*} \overline{\mathrm{bel}}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t}) \\ &=\int p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})\mathrm{d}\mathbf{x}_{t-1} && \to \text{marginalize}\,\mathbf{x}_{t-1}\\ &=\int p(\mathbf{x}_t\mid\mathbf{x}_{t-1}, \mathbf{u}_t)p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t-1})\mathrm{d}\mathbf{x}_{t-1} &&\to\text{Markov assumption} \end{align*}

Then, recognizing $p(\mathbf{x}_{t-1}\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t-1})=\mathrm{bel}(\mathbf{x}_{t-1})$ , we have

\begin{equation} \overline{\mathrm{bel}}(\mathbf{x}_t)=\int p(\mathbf{x}_t\mid\mathbf{u}_t,\mathbf{x}_{t-1})\mathrm{bel}(\mathbf{x}_{t-1})\mathrm{d}\mathbf{x}_{t-1} \end{equation}

In summary, the Bayes filter provides a two-step framework for updating the belief of the current state $\mathbf{x}_t$ from the belief of the previous state $\mathbf{x}_{t-1}$ :

Prediction using the motion model (Equation 1);

Correction using the measurement model (Equation 2).

These two steps can be performed recursively to estimate the state over time.

Kalman Filter

The Kalman filter is an instantiation of the Bayes filter with the assumption of linear Gaussian models. This means the motion model and the measurement model should be

linear with respect to their arguments:
Specifically, suppose $\mathbf{x}\in\mathbb{R}^m$ , $\mathbf{u}\in\mathbb{R}^n$ , and $\mathbf{z}\in\mathbb{R}^p$ . Let $\mathbf{F}\in \mathbb{R}^{m\times m}$ , $\mathbf{G}\in\mathbb{R}^{m\times n}$ , and $\mathbf{H}\in\mathbb{R}^{p\times m}$ be matrices (linear maps).
- Motion model:
$\begin{align} \mathbf{x}_t=\mathbf{F}_t\mathbf{x}_{t-1}+\mathbf{G}_t\mathbf{u}_t+\mathbf{w}_t, &&\mathbf{w}_t\sim\mathcal{N}(\mathbf{0},\mathbf{Q}_t) \end{align}$
- Measurement model:
$\begin{align} \mathbf{z}_t=\mathbf{H}_t\mathbf{x}_t+\mathbf{v}_t, &&\mathbf{v}_t \sim\mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{align}$
where $\mathbf{w}\in\mathbb{R}^m$ and $\mathbf{v}_t\in\mathbb{R}^p$ are white noise with variance $\mathbf{Q}\in\mathbb{R}^{m\times m}$ and $\mathbf{R}\in\mathbb{R}^{p\times p}$ , respectively.
- White noise
  A discrete-time random process (a sequence of random vectors), $\mathbf{w}_k$ , is called white noise if
  $\begin{align*} \mathbb{E}[\mathbf{w}_k\mathbf{w}_j^{\top}]=\mathbf{Q}_k\delta_{kj}, &&\delta_{kj}=\left\{\begin{matrix}1 & k=j\\ 0 & k\ne j\end{matrix}\right. \end{align*}$

following a multivariate Gaussian distribution:
Specifically,
- Motion model: $p(\mathbf{x}_t\mid\mathbf{x}_{t-1},\mathbf{u}_t)=\mathcal{N}(\mathbf{x}_t;\mathbf{F}_t\mathbf{x}_{t-1}+\mathbf{G}_t\mathbf{u}_t, \mathbf{Q}_t)$ ;
- Measurement model: $p(\mathbf{z}_t\mid\mathbf{x}_t)=\mathcal{N}(\mathbf{z}_t;\mathbf{H}_t\mathbf{x}_t,\mathbf{R}_t)$ .
Note: here we need to interpret $\mathbf{x}_{t-1}$ as a realization rather than a random variable.
- Multivariate Gaussian PDF
  $\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{\Sigma})$ is the probability distribution function of a random variable $\mathbf{x}\in\mathbb{R}^m$ that follows a Gaussian distribution with mean $\boldsymbol{\mu}\in\mathbb{R}^m$ and covariance matrix $\mathbf{\Sigma}\in\mathbb{R}^{m\times m}$ ,
  $\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\mathbf{\Sigma})=p(\mathbf{x})=\frac{1}{\sqrt{(2\pi)^m\det(\mathbf{\Sigma})}}\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^{\top}\mathbf{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right)$

Before deriving the algorithm for the Kalman filter, there are two important properties of the multivariate Gaussian distribution that are useful here.

Affine transformation:
Let $\mathbf{x}$ be a random vector following a Gaussian distribution $\mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})$ , then the random vector $\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{b}$ follows a Gaussian distribution $\mathbf{y}\sim\mathcal{N}(\mathbf{A}\boldsymbol{\mu}+\mathbf{b},\mathbf{A}\mathbf{\Sigma}\mathbf{A}^{\top})$ .
- Proof
  Will come later

Conditional distribution:
Let $\mathbf{y}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})$ be a random vector of dimension $m+p$ . Partition $\mathbf{y}$ into a vector of dimension $m$ , $\mathbf{x}\in\mathbb{R}^{m}$ and a vector of dimension $p$ , $\mathbf{z}\in\mathbb{R}^p$ , and partition $\boldsymbol{\mu}$ and $\mathbf{\Sigma}$ respectively,
$\begin{equation*} \left[\begin{matrix} \mathbf{x} \\ \mathbf{z} \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \boldsymbol{\mu}_x\\ \boldsymbol{\mu}_z \end{matrix}\right], \left[ \begin{matrix} \mathbf{\Sigma}_{xx} &\mathbf{\Sigma}_{xz}\\ \mathbf{\Sigma}_{zx} &\mathbf{\Sigma}_{zz} \end{matrix} \right] \right) \end{equation*}$
then, the conditional distribution $p(\mathbf{x}\mid\mathbf{z}=z)=\mathcal{N}(\mathbf{x};\boldsymbol{\mu}_{\mathbf{x}\mid\mathbf{z}},\mathbf{\Sigma}_{\mathbf{x}\mid\mathbf{z}})$ , where
$\begin{align*} \boldsymbol{\mu}_{\mathbf{x}\mid\mathbf{z}}=\boldsymbol{\mu}_x+\mathbf{\Sigma}_{xz}\boldsymbol{\Sigma}_{zz}^{-1}\left(z-\mathbf{\mu}_z\right), &&\mathbf{\Sigma}_{\mathbf{x}\mid\mathbf{z}}=\mathbf{\Sigma}_{xx}-\mathbf{\Sigma}_{xz}\mathbf{\Sigma}_{zz}^{-1}\mathbf{\Sigma}_{zx} \end{align*}$
- Proof
  Will come later

Now, if we assume the initial belief to be Gaussian, i.e., $\mathrm{bel}(\mathbf{x}_0)=p(\mathbf{x}_0)=\mathcal{N}(\mathbf{x}_0;\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ . Then, from equations (1) and (2), we can see that all subsequent beliefs will be Gaussian. Thus,

\begin{align*} \overline{\mathrm{bel}}(\mathbf{x}_t) &=p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})=\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-)\\ \mathrm{bel}(\mathbf{x}_t)&= p(\mathbf{x}_t\mid\mathbf{z}_{1:t},\mathbf{u}_{1,t})=\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_t,\mathbf{\Sigma}_t) \end{align*}

Then, we only need to track $\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-$ in the prediction step and $\boldsymbol{\mu}_t,\mathbf{\Sigma}_t$ in the correction step.

For the prediction step, use the joint distribution $p(\mathbf{x}_t,\mathbf{x}_{t-1}\mid\mathbf{u}_{1:t},\mathbf{z}_{1:t-1})$ and realize

\begin{equation*} \left[\begin{matrix} \mathbf{x}_t \\ \mathbf{x}_{t-1} \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \mathbf{F}_t\boldsymbol{\mu}_{t-1}+\mathbf{G}_t\mathbf{u}_t\\ \boldsymbol{\mu}_{t-1} \end{matrix}\right], \left[ \begin{matrix} \mathbf{F}_{t}\mathbf{\Sigma}_{t-1}\mathbf{F}^{\top}+\mathbf{Q}_t &\mathbf{F}_t\mathbf{\Sigma}_{t-1}\\ \mathbf{\Sigma}_{t-1}\mathbf{F}_t^{\top} &\mathbf{\Sigma}_{t-1} \end{matrix} \right] \right) \end{equation*}

Derivation [2]
From equation (3),
$\underbrace{\begin{bmatrix}\mathbf{x}_t \\\mathbf{x}_{t-1}\end{bmatrix}}_{\mathbf{y}_t}=\underbrace{\begin{bmatrix}\mathbf{F}_t \\\mathbf{I}\end{bmatrix}}_{\mathbf{A}_t}\mathbf{x}_{t-1}+\underbrace{\begin{bmatrix}\mathbf{G}_t \\\mathbf{0}\end{bmatrix}}_{\mathbf{B}_t}\mathbf{u}_t+\underbrace{\begin{bmatrix}\mathbf{I} \\\mathbf{0}\end{bmatrix}}_{\mathbf{W}_t}\mathbf{w}_t$
Compute the mean and covariance,
$\mathbb{E}[\mathbf{y}_t] = \mathbb{E}\!\left[\mathbf{A}_t \mathbf{x}_{t-1} + \mathbf{B}_t \mathbf{u}_t + \mathbf{W}_t \mathbf{w}_t\right] = \mathbf{A}_t \boldsymbol{\mu}_{t-1} + \mathbf{B}_t \mathbf{u}_t = \begin{bmatrix} \mathbf{F}_t \boldsymbol{\mu}_{t-1} + \mathbf{G}_t \mathbf{u}_t \\ \boldsymbol{\mu}_{t-1} \end{bmatrix}$
$\begin{align*} \mathrm{Cov}[\mathbf{y}_t] &= \mathbb{E}\!\left[(\mathbf{y}_t - \mathbb{E}[\mathbf{y}_t])(\mathbf{y}_t - \mathbb{E}[\mathbf{y}_t])^\top\right] \\ &= \mathbb{E}\!\left[\left(\mathbf{A}_t(\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1}) + \mathbf{W}_t \mathbf{w}_t\right) \left(\mathbf{A}_t(\mathbf{x}_{t-1} - \boldsymbol{\mu}_{t-1}) + \mathbf{W}_t \mathbf{w}_t\right)^\top\right] \\ &= \mathbf{A}_t \mathbf{\Sigma}_{t-1} \mathbf{A}_t^\top + \mathbf{W}_t \mathbf{Q}_t \mathbf{W}_t^\top \\ &= \begin{bmatrix} \mathbf{F}_t \mathbf{\Sigma}_{t-1} \mathbf{F}_t^\top + \mathbf{Q}_t & \mathbf{F}_t \mathbf{\Sigma}_{t-1} \\ \mathbf{\Sigma}_{t-1} \mathbf{F}_t^\top & \mathbf{\Sigma}_{t-1} \end{bmatrix} \end{align*}$

Then $\overline{\mathrm{bel}}(\mathbf{x}_t) =p(\mathbf{x}_t\mid\mathbf{z}_{1:t-1},\mathbf{u}_{1:t})$ can be obtained by marginalize $\mathbf{x}_{t-1}$ ,

\begin{align} \boldsymbol{\mu}_t^-&=\mathbf{F}_t\boldsymbol{\mu}_{t-1}+\mathbf{G_t}\mathbf{u}_t && \text{(predicted mean)} \\ \mathbf{\Sigma}_{t}^-&=\mathbf{F}_{t}\mathbf{\Sigma}_{t-1}\mathbf{F}^{\top}+\mathbf{Q}_t && \text{(predicted covariance)} \end{align}

For the correction step, form a joint distribution $p(\mathbf{x}_t,\mathbf{z}_t\mid \mathbf{u}_{1:t},\mathbf{z}_{1:t-1})$ and realize

\begin{equation*} \left[\begin{matrix} \mathbf{x}_t \\ \mathbf{z}_t \end{matrix}\right]\sim \mathcal{N}\left( \left[\begin{matrix} \boldsymbol{\mu}_t^-\\ \mathbf{H}_t\boldsymbol{\mu}_t^- \end{matrix}\right], \left[ \begin{matrix} \mathbf{\Sigma}_t^- &\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}\\ \mathbf{H}_t\mathbf{\Sigma}_{t}^- &\mathbf{H}_t\mathbf{\Sigma}_{t}^-\mathbf{H}_t^{\top}+\mathbf{R}_t \end{matrix} \right] \right) \end{equation*}

Derivation [2]
Since $\mathbf{x}_t\sim\mathcal{N}(\boldsymbol{\mu}_t^-,\mathbf{\Sigma}_t^-)$ and $\mathbf{z}_t=\mathbf{H_t}\mathbf{x}_t+\mathbf{v}_t$ , where $\mathbf{v}_t\sim\mathcal{N}(\mathbf{0},\mathbf{R}_t)$ . From property 1, we have
$\mathbf{z}_t\sim\mathcal{N}(\mathbf{H}_t\boldsymbol{\mu}_t^-,\mathbf{H}_t\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}+\mathbf{R}_t)$
Furthermore,
$\begin{align*} \mathbf{\Sigma}_{xz}&=\mathrm{Cov}(\mathbf{x}_t,\mathbf{z}_t)=\mathrm{Cov}(\mathbf{x}_t,\mathbf{H_t}\mathbf{x}_t+\mathbf{v}_t)=\mathrm{Cov}(\mathbf{x}_t,\mathbf{H}_t\mathbf{x}_t)\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{H}_t\mathbf{x}_t-\mathbf{H_t}\boldsymbol{\mu}_t^-\right)^{\top}\right]\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)^{\top}\mathbf{H_t}^{\top}\right]\\ &=\mathbb{E}\left[\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)\left(\mathbf{x}_t-\boldsymbol{\mu}_t^-\right)^{\top}\right]\mathbf{H_t}^{\top}\\ &=\mathbf{\Sigma}_t^-\mathbf{H_t}^{\top} \\ \mathbf{\Sigma}_{zx}&=\mathrm{Cov}(\mathbf{z}_t,\mathbf{x}_t)=\dots=\mathbf{H}_t\mathbf{\Sigma}_t^- \end{align*}$

Thus, $\mathrm{bel}(\mathbf{x}_t)=p(\mathbf{x}_t\mid\mathbf{z}_t,\mathbf{u}_{1:t},\mathbf{z}_{1:t-1})$ , where

\begin{align*} \boldsymbol{\nu}_t&=\mathbf{z}_t-\mathbf{H}_t\boldsymbol{\mu}_t^- && \text{(innovation)} \\ \mathbf{S}_t&=\mathbf{H}_t\mathbf{\Sigma}_{t}^-\mathbf{H}_t^{\top}+\mathbf{R}_t && \text{(innovation covariance)}\\ \mathbf{K}_t&=\mathbf{\Sigma}_t^-\mathbf{H}_t^{\top}\mathbf{S}_t^{-1} && \text{(Kalman gain)} \end{align*}

\begin{align} \boldsymbol{\mu}_t&=\boldsymbol{\mu}_t^-+\mathbf{K}_t\boldsymbol{\nu}_k && \text{(corrected mean)}\\ \mathbf{\Sigma}_t&=\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)\mathbf{\Sigma}_t^- && \text{(corrected covariance)} \end{align}

Finally, normalize for numerical stability

\begin{equation} \mathbf{\Sigma}_t=\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)\mathbf{\Sigma}_t^-\left(\mathbf{I}-\mathbf{K}_t\mathbf{H_t}\right)^{\top}+\mathbf{K}_t\mathbf{R}_t\mathbf{K}_t^{\top} \end{equation}

Extended Kalman Filter (EKF)

One way to handle the case where motion model and measurement model are not linear is to approximate the functions via local linearization (Taylor expansion). Specifically, we extend the Kalman filter algorithm to cases where models are non-linear:

Motion model

\begin{align*} \mathbf{x}_t=f(\mathbf{u}_t,\mathbf{x}_{t-1})+\mathbf{w}_t, &&\mathbf{w}_t\sim\mathcal{N}(\mathbf{0},\mathbf{Q}_t) \end{align*}

Measurement model

\begin{align*} \mathbf{z}_t=h(\mathbf{x}_t)+\mathbf{v}_t, &&\mathbf{v}_t \sim\mathcal{N}(\mathbf{0}, \mathbf{R}_t) \end{align*}

Let

\mathbf{F}_t=\left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\mathbf{x}=\boldsymbol{\mu}_{t-1}}, \qquad \mathbf{W}_t=\left.\frac{\partial f}{\partial \mathbf{w}}\right|_{\mathbf{x}=\boldsymbol{\mu}_{t-1}}, \qquad \mathbf{H}_t=\left.\frac{\partial h}{\partial \mathbf{x}}\right|_{\mathbf{x}=\boldsymbol{\mu}_t^-}, \qquad \mathbf{V}_t=\left.\frac{\partial h}{\partial \mathbf{v}}\right|_{\mathbf{x}=\boldsymbol{\mu}_t^-}

The EKF algorithm is

\begin{align} &\textbf{Input: } \boldsymbol{\mu}_{t-1},\; \boldsymbol{\Sigma}_{t-1},\; \mathbf{u}_t,\; \mathbf{z}_t \nonumber\\[6pt] &\quad\quad \text{1:}\quad \boldsymbol{\mu}_t^- \leftarrow f(\mathbf{u}_t,\boldsymbol{\mu}_{t-1}) &&\text{(predicted mean)} \nonumber\\ &\quad\quad\text{2:}\quad \boldsymbol{\Sigma}_t^- \leftarrow \mathbf{F}_t\boldsymbol{\Sigma}_{t-1}\mathbf{F}_t^\top + \mathbf{W}_t\mathbf{Q}_t\mathbf{W}_t^\top &&\text{(predicted covariance)} \nonumber\\ &\quad\quad\text{3:}\quad \boldsymbol{\nu}_t \leftarrow \mathbf{z}_t - h(\boldsymbol{\mu}_t^-) &&\text{(innovation)} \nonumber\\ &\quad\quad\text{4:}\quad \mathbf{S}_t \leftarrow \mathbf{H}_t\boldsymbol{\Sigma}_t^-\mathbf{H}_t^\top + \mathbf{V}_t\mathbf{R}_t\mathbf{V}_t^\top &&\text{(innovation covariance)} \nonumber\\ &\quad\quad\text{5:}\quad \mathbf{K}_t \leftarrow \boldsymbol{\Sigma}_t^-\mathbf{H}_t^\top\mathbf{S}_t^{-1} &&\text{(filter gain)} \nonumber\\ &\quad\quad\text{6:}\quad \boldsymbol{\mu}_t \leftarrow \boldsymbol{\mu}_t^- + \mathbf{K}_t\boldsymbol{\nu}_t &&\text{(corrected mean)} \nonumber\\ &\quad\quad\text{7:}\quad \boldsymbol{\Sigma}_t \leftarrow (\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)\boldsymbol{\Sigma}_t^- &&\text{(corrected covariance)} \nonumber\\ &\quad\quad\text{8:}\quad \boldsymbol{\Sigma}_t \leftarrow (\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)\boldsymbol{\Sigma}_t^-(\mathbf{I} - \mathbf{K}_t\mathbf{H}_t)^\top + \mathbf{K}_t\mathbf{R}_t\mathbf{K}_t^\top &&\text{(numerically stable form)} \nonumber\\[6pt] &\textbf{return}\;\; \boldsymbol{\mu}_t,\; \boldsymbol{\Sigma}_t \nonumber \end{align}

Unscented Kalman Filter (UKF)

Another non-linear Kalman filter is the unscented Kalman filter which uses deterministic sampling to approximate the distribution.

Let a function $f:\mathbb{R}^{n}\to\mathbb{R}^m$ be a non-linear transformation. Suppose $\mathbf{x}\in\mathbb{R}^n$ and $\mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{\Sigma})$ . We draw a set of $n$ sigmapoints, $\mathcal{X}=\{\mathbf{x}_i\}_{i=1}^{n}$ , and compute a weight $w_i$ for each sigma point by executing

\begin{align*} &\texttt{SigmaPoints}(\boldsymbol{\mu},\mathbf{\Sigma}):\\ &\quad\quad \mathbf{x}_0 = \boldsymbol{\mu} \nonumber\\ &\quad\quad \mathbf{x}_i = \boldsymbol{\mu} + \boldsymbol{\ell}'_i, \qquad i = 1,\ldots,n \nonumber\\ &\quad\quad \mathbf{x}_i = \boldsymbol{\mu} - \boldsymbol{\ell}'_{i-n}, \qquad i = n+1,\ldots,2n \nonumber\\[4pt] &\quad\quad w_0 = \dfrac{\kappa}{n+\kappa} \nonumber\\[6pt] &\quad\quad w_i = \dfrac{1}{2(n+\kappa)}, \qquad i = 1,\ldots,2n\\ &\textbf{return}\,\, (\mathcal{X},\mathbf{w}) \end{align*}

Further let $\mathbf{y}=f(\mathbf{x})$ , then $\mathbf{y}\sim\mathcal{N}(\boldsymbol{\mu}',\mathbf{\Sigma}')$ , where

\begin{align*} \boldsymbol{\mu}'&=\sum_{i=0}^{2n}w_if(\mathbf{x}_i)\\ \mathbf{\Sigma}'&=\sum_{i=0}^{2n}w_i(f(\mathbf{x}_i)-\boldsymbol{\mu}')(f(\mathbf{x}_i)-\boldsymbol{\mu}')^{\top} \end{align*}

The UKF algorithm is

\begin{align} &\textbf{Input: } \boldsymbol{\mu}_{t-1},\; \boldsymbol{\Sigma}_{t-1},\; \mathbf{u}_t,\; \mathbf{z}_t \nonumber\\[6pt] &\quad\quad\text{1:}\quad (\mathcal{X}_{t-1},\,\mathbf{w}^-) \leftarrow \texttt{SigmaPoints}(\boldsymbol{\mu}_{t-1},\boldsymbol{\Sigma}_{t-1}) \nonumber\\ &\quad\quad\text{2:}\quad \boldsymbol{\mu}_t^- = \sum_{i=0}^{2n} w_i^-\, f(\mathbf{u}_t, \mathbf{x}_{t-1,i}) &&\text{(predicted mean)} \nonumber\\ &\quad\quad\text{3:}\quad \boldsymbol{\Sigma}_t^- \leftarrow \sum_{i=0}^{2n} w_i^- \bigl(f(\mathbf{u}_t,\mathbf{x}_{t-1,i}) - \boldsymbol{\mu}_t^-\bigr) \bigl(f(\mathbf{u}_t,\mathbf{x}_{t-1,i}) - \boldsymbol{\mu}_t^-\bigr)^\top + \mathbf{Q}_t &&\text{(predicted covariance)} \nonumber\\ &\quad\quad\text{4:}\quad (\mathcal{X}_t^-,\,\mathbf{w}) \leftarrow \texttt{SigmaPoints}(\boldsymbol{\mu}_t^-, \boldsymbol{\Sigma}_t^-) \nonumber\\ &\quad\quad\text{5:}\quad \mathbf{z}_t^- = \sum_{i=0}^{2n} w_i\, h(\mathbf{x}_{t,i}^-) &&\text{(predicted measurement)} \nonumber\\ &\quad\quad\text{6:}\quad \boldsymbol{\nu}_t \leftarrow \mathbf{z}_t - \mathbf{z}_t^- &&\text{(innovation)} \nonumber\\ &\quad\quad\text{7:}\quad \mathbf{S}_t \leftarrow \sum_{i=0}^{2n} w_i \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr) \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr)^\top + \mathbf{R}_t &&\text{(innovation covariance)} \nonumber\\ &\quad\quad\text{8:}\quad \boldsymbol{\Sigma}_t^{xz} \leftarrow \sum_{i=0}^{2n} w_i \bigl(\mathbf{x}_{t,i}^- - \boldsymbol{\mu}_t^-\bigr) \bigl(h(\mathbf{x}_{t,i}^-) - \mathbf{z}_t^-\bigr)^\top &&\text{(cross covariance)} \nonumber\\ &\quad\quad\text{9:}\quad \mathbf{K}_t \leftarrow \boldsymbol{\Sigma}_t^{xz}\,\mathbf{S}_t^{-1} &&\text{(filter gain)} \nonumber\\ &\quad\quad\text{10:}\quad \boldsymbol{\mu}_t \leftarrow \boldsymbol{\mu}_t^- + \mathbf{K}_t\boldsymbol{\nu}_t &&\text{(corrected mean)} \nonumber\\ &\quad\quad\text{11:}\quad \boldsymbol{\Sigma}_t \leftarrow \boldsymbol{\Sigma}_t^- - \mathbf{K}_t\mathbf{S}_t\mathbf{K}_t^\top &&\text{(corrected covariance)} \nonumber\\[6pt] &\textbf{return}\;\; \boldsymbol{\mu}_t,\; \boldsymbol{\Sigma}_t \nonumber \end{align}

Reference

[1] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Mit Press, 2010.
[2] M. Ghaffari, ROB530 Lecture Slides — Kalman Filtering. University of Michigan, 2026.

Acoustic Damping with a Helmholtz Resonator

6 minute read

Published: August 06, 2024

Wave Theory of Sound

Two fundamental equations in acoustics are

Continuity Equation

The continuity equation states that the rate at which mass enters a system is equal to the rate at which mass leaves the system plus the accumulation of mass within the system.

\begin{equation} \frac{\partial\rho}{\partial t}+\nabla\cdot \left(\rho \boldsymbol{v}\right)=0 \end{equation}

$\rho$ is fluid density

$\boldsymbol{v}$ is the velocity vector of the fluid

This equation is essentially $\sum m_{in} - \sum m_{out} = \Delta m_{system}$ .

Momentum Equation

The momentum equation states that the rate of change of momentum of a fluid element is equal to the sum of the pressure gradient force, convective acceleration, and body forces.

\begin{equation} \frac{\partial\boldsymbol{v}}{\partial t}+\boldsymbol{v}\cdot \nabla\boldsymbol{v}+\frac{\nabla p}{\rho}=\boldsymbol{g} \end{equation}

$p$ is the pressure

$\boldsymbol{g}$ is the body forces per unit mass.

This equation is essentially $F=ma$ .

Wave Equation

The ambient state is characterized by its pressure $p_0$ , density $\rho_0$ , and fluid velocity $\boldsymbol{u}_0$ . Acoustic disturbances are regarded as perturbations to the ambient state.

It can be derived from the previous two equations that the acoustic pressure $p$ , which is the difference between the pressure at $(x,t)$ and the ambient pressure, satisfies the wave equation

\frac{\partial ^2p}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2p}{\partial t^2}=0

where $c$ is the speed of propagation. For simplicity, we only consider the one-dimensional case.

Sinusoidal Plane-wave

Suppose the sound we use in this experiment is a sinusoidal wave with a constant angular frequency $\omega$ . By introducing phasors, then by separation of variable, the solution is

p\left(x,t\right) = e^{i\omega t}f\left(x\right)

Plug back into the wave equation to get

\frac{\partial^2}{\partial t^2}\left(e^{i\omega t}f\left(x\right)\right)=c^2\frac{\partial^2}{\partial x^2}\left(e^{i\omega t}f\left(x\right)\right)

or simply

\frac{\mathrm{d}^2}{\mathrm{d}x^2}f\left(x\right) = -\left(\frac{\omega}{c}\right)^2f\left(x\right)

which is a very familiar second-order linear ODE. The general solution is

f\left(x\right)=Ae^{-ikx}+Be^{ikx}

where $k=\omega/c$ is called the wave number. Finally, the total solution is

p\left(x,t\right) = e^{i\omega t}\left(Ae^{-ikx}+Be^{ikx}\right)=Ae^{i\left(\omega t-kx\right)}+Be^{i\left(\omega t + kx\right)}

where $A,B\in\mathbb{C}$ are determined by initial and boundary conditions.

Acoustical Two-ports

Consider a model in the figure below where an acoustic element is located between two straight ducts.

This two-port system is fully characterized by an acoustic transfer matrix, which can be written as a relation between the pressure and velocities on each side of the two-port system.

We need to consider two variables , the acoustic pressure $p$ and the volume velocity $u$ , which are analogous to voltage $V$ and current $I$ in electric circuits respectively. Then, similar to Kirchhoff’s voltage and current law, we have the continuity of pressure law and continuity of volume law.

Continuity of Pressure

This law states that acoustic pressure does not vary appreciably over distances much less than a wavelength. Consider a path from $x_1$ to $x_2$ ,

Starting from the momentum equation (Equ. 2), neglecting nonlinear term $\left(\boldsymbol{v}\cdot \nabla\right)\boldsymbol{v}$ , and integrate along the path, we get,

\rho\frac{\partial}{\partial t}\int_{x_1}^{x_2}\boldsymbol{v}\cdot\mathrm{d}\ell = -\int_{x_1}^{x_2}\nabla p\cdot\mathrm{d}\ell = -\left(p_2-p_1\right)

where $\rho$ is the density of the medium, and $\boldsymbol{v}$ is the particle velocity vector.

Continuity of Volume Velocity

This law states that the net volume velocity flowing out of a volume is zero.

Starting from the continuity equation (Equ. 2), with an application of Gauss’s theorem, we obtain

\sum U=-\frac{\partial}{\partial t}\int\frac{p}{\rho c^2}\mathrm{d}V

Acoustic Impedance

We can define the acoustic impedance as the ratio of acoustic pressure and volume velocity,

Z = \frac{p}{u}

Notice that a proper definition can be found here.

For a Helmholtz Resonator, geometry shown below,

we have,

Z_{\mathrm{HR}}=Z_{\mathrm{vol}} + Z_{\mathrm{op}}

\begin{align*} Z_{\mathrm{vol}} = -\frac{1}{i\omega C_A} && \text{and} && Z_{\mathrm{op}} = -i\omega M_A \end{align*}

where $C_A = V/\rho c^2$ is the acoustic compliance, and $M_A = \rho l'/A$ is the acoustic inertance, $l'$ is the “effective neck length”.

Reflection and Transmission

The reason why Helmholtz Resonator can reduce noise is that it can “reflect” incident acoustic wave almost completely when the frequency of the wave is at the resonance frequency.

We define the reflection coefficient as the ratio of the intensity of the transmitted wave ( $I_t$ ) to the incident wave ( $I_i$ ),

\mathcal{T}=\frac{I_t}{I_i}

It can be derived that (but I don’t know how), for Helmholtz Resonator,

\mathcal{T}=1+\frac{-\rho c/A_D}{2Z_{HR}+\rho c/A_D}

At the resonance frequency,

\omega = c\sqrt{\frac{A}{l'V}}

The acoustic impedance of HR is

Z_{HR}=\frac{\rho c^2}{-i\omega V}-i\omega\frac{\rho l'}{A}=0

Then,

\mathcal{T}=0

Thus the resonator has the potentially useful property of causing nearly total reflection of acoustic waves at frequencies near its resonance frequency. That’s why it is useful for acoustic damping.

The transmission loss is defined as

\mathrm{TL} = 10\log\left|\frac{1}{\mathcal{T}}\right|^2

By setting proper value for $A, l',\rho, c, A_D$ , we can plot the transmission loss vs. frequency

Experiment

In the experiment, we want to measure the sound transmission loss of a Helmholtz Resonator for different frequency. The setup is shown below

where test specimen is a HR connected as a side branch.

We use the loudspeaker to generate a sinusoidal wave at frequency $f$ , then on the left hand side (upstream),

p_{\mathrm{up}} = Ae^{i(\omega t-kx)}+Be^{i(\omega t + kx)}

and on the right hand side (downstream),

p_{\mathrm{down}} = Ce^{i(\omega t-kx)}+De^{i(\omega t+kx)}

Here $Ae^{i(kx+\omega t)}$ is the incident wave, $Be^{i(kx+\omega t)}$ is the reflected wave, and $Ce^{-i(kx+\omega t)}$ is the transmitted wave. By definition, the transmission loss is

\mathrm{TL} = 10\log\left|\frac{A}{C}\right|^2

To solve for $A$ and $C$ , we use four microphone to measure the acoustic pressure at four different places,

\begin{align*} p_1 &= A e^{-ikx_1} + B e^{ikx_1}, \\ p_2 &= A e^{-ikx_2} + B e^{ikx_2}, \\ p_3 &= C e^{-ikx_3} + D e^{ikx_3}, \\ p_4 &= C e^{-ikx_4} + D e^{ikx_4}, \end{align*}

Solving for $A,B,C,D$ ,

\begin{align*} A &= \frac{i(p_1 e^{ikx_2} - p_2 e^{ikx_1})}{2 \sin k(x_1 - x_2)}, \\ B &= \frac{i(p_2 e^{-ikx_1} - p_1 e^{-ikx_2})}{2 \sin k(x_1 - x_2)}, \\ C &= \frac{i(p_3 e^{ikx_4} - p_4 e^{ikx_3})}{2 \sin k(x_3 - x_4)}, \\ D &= \frac{i(p_4 e^{-ikx_3} - p_3 e^{-ikx_4})}{2 \sin k(x_3 - x_4)}. \end{align*}

For convenience, we can take $s = \left|x_1-x_2\right|=\left|x_3-x_4\right|$ , then the transmission loss is

\mathrm{TL}=10\log\left|\frac{e^{iks}-p_2/p_1}{e^{iks}-p_4/p_3}\right|^2

The microphone record voltage signal, which is proportional to the acoustic pressure $p = p(t)$ , apply Fourier transform, take $p_1(t)=Ae^{i(\omega t-kx_1)}+Be^{i(\omega t+kx_1)}$

\hat{p}\left(\omega\right)=\int_{\mathbb{R}}e^{-i\omega t}p_1\left(t\right)\mathrm{d}t = A e^{-ikx_1} + B e^{ikx_1}

Thus, we can see that $p_1,\dots, p_4$ are just the Fourier coefficient at the given frequency. Since we only need to ratio, by applying a Fast Fourier Transform directly to the voltage signal, and take their ratio will give the same result.

Solving Basic PDEs

4 minute read

Published: December 07, 2023

A partial differential equation (PDE) is a relation involving one or more functions of several variables, and their partial derivatives. The order of a partial differential equation is the order of the highest partial derivative that appears in the equation.

There are three classical partial differential equations of order two which are very common in physical applications. They are known as the heat equation (1), the wave equation (2), and the Laplace equation (3).

\begin{align} &\frac{\partial u}{\partial t} = \alpha^2\frac{\partial^2u}{\partial x^2}\\ &\frac{\partial^2 u}{\partial t^2}=c^2\frac{\partial u^2}{\partial x^2}\\ &\frac{\partial ^2u}{\partial x^2}+\frac{\partial ^2u}{\partial y^2}=0 \end{align}

Heat Equation

Consider the boundary-value problem

\begin{align*} \frac{\partial u}{\partial t} = \alpha^2\frac{\partial^2u}{\partial x^2}; &&u\left(x,0\right)=f\left(x\right), 0<x<l;&& u\left(0,t\right)=u\left(l,t\right)=0 \end{align*}

First, it’s easy to see that the equation is linear, which means any linear combination of solutions $u_1\left(x,t\right),\dots,u_n\left(x,t\right)$ is again a solution. This suggests the following “plan” for solving this boundary-value problem.

Find as many solutions $u_i\left(x,t\right)$ as we can that satisfies the boundary condition $u\left(0,t\right)=u\left(l,t\right)=0$ .

Find a proper linear combination of $u_i$ to satisfies the initial condition $u\left(x,0\right)=f\left(x\right)$ .

An important method to solve partial differential equations is separation of variables, which means we first assume $u\left(x,t\right)=X\left(x\right)T\left(t\right)$ . Then

\begin{align*} \frac{\partial u}{\partial t}=XT', && \frac{\partial^2u}{\partial x^2}=X''T \end{align*}

Plug this in the original equation and get

XT'=\alpha^2X''T

divide both side by $\alpha^2XT$ gives

\frac{X''}{X}=\frac{T'}{\alpha^2T}

Notice that the left-hand side only depend on $x$ , while the right-hand side only depend on $t$ . The only way that they can equal for any $x$ and $t$ is that both of them are constant. Thus,

\begin{align*} \frac{X''}{X}=-\lambda, && \frac{T'}{\alpha^2T}=-\lambda \end{align*}

By separation of variable, we have transformed a PDE into two solvable ODEs,

\begin{align*} X''+\lambda X=0, && T'+\alpha^2\lambda T=0 \end{align*}

Now, plug in the boundary condition.

\begin{align*} 0=u\left(0,t\right)=X\left(0\right)T\left(t\right), && 0=u\left(l,t\right)=X\left(l\right)T\left(t\right) \end{align*}

Since $T\left(t\right)\ne 0$ (not identically zero, otherwise $u$ would be identically zero, and we have a trivial solution), we have $X\left(0\right)=X\left(l\right)=0$ .

Now, we only need to solve

\begin{align*} X''+\lambda X = 0; && X\left(0\right)=0, X\left(l\right)=0 \end{align*}

and

T'+\lambda\alpha^2T=0

The first one is a second-order, linear, constant coefficient ODE, which we can directly write out the solution. There are three different cases:

$\lambda = 0$ : The general solution is $X\left(x\right)=c_1x+c_2$ for some constant $c_1, c_2$ . Plug in the boundary condition, we have $c_1=c_2=0$ . Thus, there are no non-trivial solution for $\lambda = 0$ .

$\lambda<0$ : The general solution is $X\left(x\right)=c_1e^{\sqrt{-\lambda}x}+c_1e^{\sqrt{-\lambda}x}$ for some constant $c_1,c_2$ . The boundary condition implies

\begin{align*} c_1+c_2=0, && c_1e^{\sqrt{-\lambda}l}+c_2e^{-\sqrt{-\lambda}l}=0 \end{align*}

This linear system of equation has non-trivial solution $c_1, c_2$ if and only if

\det\left(\begin{matrix}1 &1\\ e^{\sqrt{-\lambda}l}&e^{-\sqrt{-\lambda}l}\end{matrix}\right)=e^{-\sqrt{-\lambda}l}-e^{\sqrt{-\lambda}l}=0

This implies $e^{2\sqrt{-\lambda}l}=1$ , which is impossible since $2\sqrt{-\lambda}l>0$ . Therefore, there are also no non-trivial solution for $\lambda < 0$ .

$\lambda > 0$ : The general solution is $X\left(x\right)=c_1\cos\sqrt{\lambda}x+c_2\sin\sqrt{\lambda}x$ for some constant $c_1,c_2$ The boundary condition implies

\begin{align*} c_1=0, && c_2\sin\sqrt{\lambda}l=0 \end{align*}

For $c_2\ne0$ , we need to have $\sqrt{\lambda}l=n\pi$ for positive integer $n$ , or $\displaystyle \lambda_n=\frac{n^2\pi^2}{l^2}$ . Then the equation has non-trivial solutions

X\left(x\right)=X_n\left(x\right)=\sin\frac{n\pi x}{l}

Then, we have the second equation,

T'+\frac{n^2\pi^2\alpha^2}{l^2}T=0

which has solution

T\left(t\right)=T_n\left(t\right)=e^{-\alpha^2n^2\pi^2t/l^2}

Hence, we have

u_n\left(x,t\right)=X_n\left(x\right)T_n\left(t\right)=\sin\frac{n\pi x}{l}e^{-\alpha^2n^2\pi^2t/l^2}

Finally, we need to use the initial condition to find a proper linear combination of $u_n$ ’s, i.e., find coefficients $c_n$ ’s such that

u\left(x,t\right)=\sum_{n=0}^{\infty}c_n\sin\frac{n\pi x}{l}e^{-\alpha^2n^2\pi^2t/l^2}

satisfies $u\left(x,0\right)=f\left(x\right)$ . This means that

f\left(x\right)=\sum_{n=0}^{\infty}c_n\sin\frac{n\pi x}{l}

It’s easily seen that $c_n$ ’s should be the Fourier coefficient of the pure sine expansion, which is

c_n=\frac{2}{l}\int_{0}^{l}f\left(x\right)\sin\frac{n\pi x}{l}\mathrm{d}x

Finally,

u\left(x,t\right)=\frac{2}{l}\sum_{n=0}^{\infty}\left[\int_{0}^{l}f\left(x\right)\sin\frac{n\pi x}{l}\mathrm{d}x\right]\sin\frac{n\pi x}{l}e^{-\alpha^2n^2\pi^2t/l^2}

is the desired solution.

Wave Equation

Consider the boundary-value problem

\begin{align*} \frac{\partial^2u}{\partial t^2}=c^2\frac{\partial^2 u}{\partial x^2}; && u\left(x,0\right)=f\left(x\right), u_t\left(x,0\right)=g\left(x\right); &&u\left(0,t\right)=u\left(l,t\right)=0 \end{align*}

This problem can also be solved by separation of variable. Let $u\left(x,t\right)=X\left(x\right)T\left(t\right)$ , then

\begin{align*} \frac{\partial^2u}{\partial t^2}=XT'', && \frac{\partial^2u}{\partial x^2}=X''T \end{align*}

Plug into the original equation, we have

\frac{T''}{c^T}=\frac{X''}{X}=-\lambda

or two ODEs,

\begin{align*} X''+\lambda X=0; && X\left(0\right)=X\left(l\right)=0 \end{align*}

and

T''+\lambda c^2T=0

We’ve seen in the previous problem that the first equation has solutions

X\left(x\right)=X_n\left(x\right)=\sin\frac{n\pi x}{l}

For the second equation, it becomes

T''+\frac{n^2\pi^2c^2}{l^2}T=0

The solutions are

T\left(t\right)=T_n\left(t\right)=a_n\cos\frac{n\pi ct}{l}+b_n\sin\frac{n\pi ct}{l}

Thus,

u_n\left(x,t\right)=\sin\frac{n\pi x}{l}\left[a_n\cos\frac{n\pi ct}{l}+b_n\sin\frac{n\pi ct}{l}\right]

is a non-trivial solution of the BVP for every positive integer $n$ , and every suitable constants $a_n, b_n$ . The linear combination

u\left(x,t\right)=\sum_{n=1}^{\infty}\sin\frac{n\pi x}{l}\left[a_n\cos\frac{n\pi ct}{l}+b_n\sin\frac{n\pi ct}{l}\right]

satisfies

\begin{align*} u\left(x,0\right)=\sum_{n=1}^{\infty}a_n\sin\frac{n\pi x}{l}, && u_t\left(x,0\right)=\sum_{n=1}^{\infty}\frac{n\pi c}{l}b_n\sin\frac{n\pi x}{l} \end{align*}

To satisfy the initial condition, we only need to have

\begin{align*} f\left(x\right)=\sum_{n=1}^{\infty}a_n\sin\frac{n\pi x}{l}, && g\left(x\right)=\sum_{n=1}^{\infty}\frac{n\pi c}{l}b_n\sin\frac{n\pi x}{l} \end{align*}

By Fourier sine expansion, we have

\begin{align*} a_n=\frac{2}{l}\int_{0}^{l}f\left(x\right)\sin\frac{n\pi x}{l}\mathrm{d}x, && b_n=\frac{2}{n\pi c}\int_0^{l}g\left(x\right)\sin\frac{n\pi x}{l}\mathrm{d}x \end{align*}

Solving ODEs by Series

7 minute read

Published: December 03, 2023

Now, we consider a somehow more general case, where we want to solve a linear second-order homogeneous equation with coefficients that are polynomials. Consider the following equation

p\left( t \right) y''+q\left( t \right) y'+r\left( t \right) y=0

where $p\left(t\right)\ne 0$ in some interval and $p\left(t\right),q\left(t\right),r\left(t\right)$ are polynomials of $t$ .

Properties of Power Series

An infinite series

y\left( t \right) =\sum_{n=0}^{\infty}{a_n\left( t-t_0 \right) ^n}=a_0+a_1x+a_2x^2+\cdots

is called a power series about $t=t_0$ . The following are some of its properties.

All power series have an interval of convergence. This means that there exists a nonnegative number $\rho$ such that the infinite series converges for $\left|t-t_0\right|<\rho$ , and diverges for $\left|t-t_0\right|>\rho$ . The number $\rho$ is called the radius of convergence of the power series.

The power series can be differentiated and integrated term by term, and the resultant series have the same radius of convergence.

One simple method to determine the radius of convergence of a power series is the Cauchy ratio test. Suppose

\underset{n\rightarrow \infty}{\lim}\left| \frac{a_{n+1}}{a_n} \right|=\lambda

Then the radius of convergence is $\displaystyle \rho=\frac{1}{\lambda}$ .

Many functions $f\left(t\right)$ that arise in applications can be expanded in power series, i.e., there exists a series of coefficients $a_0,a_1,a_2,\dots$ so that

f\left(t\right)=\sum_{n=0}^{\infty}a_n\left(t-t_0\right)^n

Such functions are said to be analytic at $t=t_0$ , the series expansion is called the Taylor series of $f$ about $t=t_0$ , and the coefficients can be found by

a_n=\frac{f^{\left(n\right)}\left(t_0\right)}{n!}

The radius of convergence can be determined if we treat the function $f$ in the complex plane. Let $z_0\in\mathbb{C}$ be the point that is closest to $t_0$ that $f^{(n)}(z_0)$ fails to exist, then the radius of convergence is $\rho=\left\|z_0-t_0\right\|$ .

The product of two power series is still a power series, this product is called the Cauchy product

\sum_{n=0}^{\infty}{a_n\left( t-t_0 \right) ^n}\sum_{n=0}^{\infty}{b_n\left( t-t_0 \right) ^n}=\sum_{n=0}^{\infty}{c_n\left( t-t_0 \right) ^n}

where the coefficient is

c_n=\sum_{j+k=n}a_jb_k=a_0b_n+a_1b_{n-1}+\cdots+a_nb_0

which is called the convolution of $a_n$ and $b_n$ .

Important series expansions (Remember them!)

\begin{align*} \frac{1}{1-x}=1+x+x^2+\cdots=\sum_{n=0}^{\infty}x^n, && \rho=1 \end{align*}

\begin{align*} e^x=1+x+\frac{x^2}{2}+\frac{x^3}{6}+\cdots=\sum_{n=0}^{\infty}\frac{x^n}{n!}, && \rho=\infty \end{align*}

\begin{align*} \sin x=x-\frac{x^3}{3!}+\frac{x^5}{5!}-\cdots=\sum_{n=0}^{\infty}\frac{\left(-1\right)^nx^{2n+1}}{\left(2n+1\right)!}, && \rho=\infty \end{align*}

\begin{align*} \cos x=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\cdots=\sum_{n=0}^{\infty}\frac{\left(-1\right)^nx^{2n}}{\left(2n\right)!}, && \rho=\infty \end{align*}

\begin{align*} \ln\left(1-x\right)=-x-\frac{x^2}{2}-\frac{x^3}{3}-\cdots=-\sum_{n=1}^{\infty}\frac{x^n}{n}, && \rho = 1 \end{align*}

\begin{align*} \ln\left(1+x\right)=x-\frac{x^2}{2}+\frac{x^3}{3}-\cdots=\sum_{n=1}^{\infty}\left(-1\right)^{n+1}\frac{x^n}{n}, && \rho = 1 \end{align*}

Series Solution

For a second-order equation with variable coefficients (but are analytic functions, at least), although it’s not very rigorous, we can first expect that the solution $y$ is also a power series. Then, by doing series expansion and setting the polynomial coefficients properly, we can expect the equation to hold. A formal statement of this is as follows.

Let the functions $q\left(t\right)/p\left(t\right)$ and $r\left(t\right)/p\left(t\right)$ have convergent Taylor series expansions about $t=t_0$ , for $\left|t-t_0\right|<\rho$ . Then, every solution $y\left(t\right)$ of the differential equation

p\left(t\right)y''+q\left(t\right)y'+r\left(t\right)y=0

is analytic at $t=t_0$ , and the radius of convergence of its Taylor series expansion about $t=t_0$ is at least $\rho$ . Set

y\left(t\right)=a_0+a_1\left(t-t_0\right)+a_2\left(t-t_0\right)^2+\cdots

the coefficients are determined by plugging the expression into the differential equation and setting the sum of the coefficients of like powers of $t$ in this expression equal to zero.

Example (1)

However, it is possible that sometime $p\left(t_0\right)=0$ , and the above method of series expansion will fail to work, but we still want to find a solution around the point $t=t_0$ . One possible case is Euler’s equation.

Euler’s Equation

The differential equation

t^2y''+\alpha ty'+\beta y=0

where $\alpha,\beta\in\mathbb{R}$ are constant is known as Euler’s equation.

This equation has a singular point at $t=0$ and the series solution method above cannot work. However, we can treat $t>0$ and $t<0$ separately.

First, assume that $t>0$ .

For $t<0$ , let $t=-x$ ( $x>0$ ), and let $y=u\left(x\right)$ . Then the equation is transformed into

x^2u''+\alpha x u'+\beta u = 0

which should have the same solution as the case when $t>0$ . Thus, for the final solution, we only need to put a modulus on $t$ .

Observe that $t^2y''$ , $ty'$ , and $y$ will have the same order if $y=t^r$ . Our goal is to find two independent solutions. Therefore, we plug in $y=t^r$ into the equation.

\begin{align*} \frac{\mathrm{d}}{\mathrm{d}t}t^r=rt^{r-1}, && \frac{\mathrm{d}^2}{\mathrm{d}t^2}t^r=r\left(r-1\right)t^{r-2} \end{align*}

The original equation then becomes,

\left[r\left(r-1\right)+\alpha r+\beta\right]t^r=0

Hence, $y=t^r$ is a solution if and only if $r$ is a solution of the quadratic equation

r^2+\left(\alpha-1\right)r+\beta=0

Two Different Real Roots, $\left(\alpha-1\right)^2-4\beta >0$

The general solution is given by

y\left(t\right)=c_1t^{r_1}+c_2t^{r_t}

where $\displaystyle r_{1,2}=-\frac{1}{2}\left[\left(\alpha-1\right)\pm\sqrt{\left(\alpha - 1\right)^2-4\beta}\right]$ .

Two Equal Roots, $\left(\alpha-1\right)^2-4\beta =0$

One solution that is obvious is

y_1\left(t\right)=t^{r_1}

where $\displaystyle r=\frac{1-\alpha}{2}$ .

To find the second independent solution, applying the differential operator

\displaystyle L=t^2\frac{\mathrm{d}^2}{\mathrm{d}t^2}+\alpha t\frac{\mathrm{d}}{\mathrm{d}t}+\beta

and rewrite the equation as $Ly=0$ . Plug in $y=t^r$ , we have

L\left[t^{r}\right]=\left(r-r_1\right)^2t^{r}

Taking partial derivatives of both sides with respect to $r$ gives

\frac{\partial}{\partial r}L\left[t^r\right]=L\left[\frac{\partial}{\partial r}t^{r}\right]=\frac{\partial}{\partial r}\left[\left(r-r_1\right)^2t^r\right]

That is

L\left[t^r\ln t\right]=\left(r-r_1\right)^2t^r\ln t+2\left(r-r_1\right)t^r

We can see that if $r=r_1$ , then the right-hand side will vanish. Therefore, the second solution is

y_2\left(t\right)=t^{r_1}\ln t

The general solution is

y=\left(c_1+c_2\ln t\right)t^{r_1}

Two Complex Solutions, $\left(\alpha-1\right)^2-4\beta <0$

Let

\begin{align*} r_1=\lambda+i\mu, && r_2=\lambda-i\mu \end{align*}

and

\phi\left(t\right)=t^{\lambda+i\mu}=t^{\lambda}\left[\cos\left(\mu \ln t\right)+i\sin\left(\mu \ln t\right)\right]

It can be shown that the real and imaginary parts are two independent solutions. Therefore,

\begin{align*} y_1\left(t\right)=\mathrm{Re}\left(\phi\left(t\right)\right)=t^{\lambda}\cos\left(\mu\ln t\right)\\ y_2\left(t\right)=\mathrm{Im}\left(\phi\left(t\right)\right)=t^{\lambda}\sin\left(\mu\ln t\right)\\ \end{align*}

and the general solution is

y\left(t\right)=t^{\lambda}\left[c_1\cos\left(\mu\ln t\right)+c_2\sin\left(\mu\ln t\right)\right]

where $\displaystyle \lambda=\frac{1-\alpha}{2}$ , $\displaystyle \mu = \frac{\sqrt{4\beta-\left(\alpha-1\right)^2}}{2}$ .

Method of Frobenius

Here, we want to find a class of singular differential equations that is more general than Euler’s equation. A very natural generalization is the following

L(y)=y''+p\left(t\right)y'+q\left(t\right)y=0

where $p\left(t\right)$ and $q\left(t\right)$ can be expanded in series of the form

\begin{align*} p\left(t\right)&=\frac{p_{0}}{t}+p_1+p_2t+p_3t^2+\cdots\\ q\left(t\right)&=\frac{q_{0}}{t^2}+\frac{q_{1}}{t}+q_2+q_3t+q_4t^2+\cdots \end{align*}

In this case, the equation is said to have a regular singular point at $t=0$ . Equivalently, an equation has a regular singular point at $t=0$ if functions $tp\left(t\right)$ and $t^2q\left(t\right)$ are analytic at $t=0$ . It has a regular singular point at $t=t_0$ if functions $\left(t-t_0\right)p\left(t\right)$ and $\left(t-t_0\right)^2q\left(t\right)$ are analytic at $t=t_0$ .

We focus on the interval $t>0$ and multiply through the equation by $t^2$ and get

t^2y''+t\left(tp\left(t\right)\right)y'+t^2q\left(t\right)=0

Compare the above equation with Euler’s equation, and find that it seems like adding higher order terms, i.e., by changing $\alpha$ to $\displaystyle tp\left(t\right)=\sum_{n=0}^{\infty}p_nt^n$ , and changing $\beta$ to $\displaystyle t^2q\left(t\right)=\sum_{n=0}^{\infty}q_nt^n$ . Therefore, we may be able to obtain a solution by plugging in higher order term to the original $t^r$ solution. Specifically, let

y\left(t\right)=\sum_{n=0}^{\infty}a_nt^{n+r}=t^{r}\sum_{n=0}^{\infty}a_nt^n

Then,

y'\left(t\right)=\sum_{n=0}^{\infty}\left(n+r\right)a_nt^{n+r-1}

and

y''\left(t\right)=\sum_{n=0}^{\infty}\left(n+r\right)\left(n+r-1\right)a_nt^{n+r-2}

Plug these in the original equation at get

\begin{align*} L(y)&=t^r\left\{\sum_{t=0}^{\infty}\left(n+r\right)\left(n+r-1\right)a_nt^n+\left(\sum_{m=0}^{\infty}p_mt^m\right)\left[\sum_{n=0}^{\infty}\left(n+r\right)a_nt^n\right]\right\}\\ &+t^r\left(\sum_{m=0}^{\infty}q_mt^m\right)\left(\sum_{n=0}^{\infty}a_nt^{n}\right)\\ &=\left[r\left(r-1\right)+0_0r+q_0\right]a_0t^r\\ &+\left\{\left[\left(1+r\right)r+p_0\left(1+r\right)+q_0\right]a_1+\left(rp_1+q_1\right)a_0\right\}t^\\ &+\dots\\ &+\left(\left[\left(n+1\right)\left(n+r-1\right)+p_0\left(n+r\right)+q_0\right]a_n+\sum_{n=0}^{\infty}\left[(k+r)p_{n-k}+q_{n-k}\right]a_k\right)t^{n+r} \end{align*}

If we set

F\left(r\right)=r\left(r-1\right)+p_0r+q_0

and set the coefficient of each like power equal to zero, we have an indicial equation

F\left(r\right)=r\left(r-1\right)+p_0r+q_0=0

and a recurrence relation

F\left(n+r\right)a_n=-\sum_{k=0}^{n-1}\left[\left(k+r\right)p_{n-k}+q_{m-k}\right]a_k

From these two relations, we can (hopefully) obtain two sets of coefficients $a_n(r_1)$ and $a_n(r_2)$ and the two independent solutions are

\begin{align*} y_1\left(t\right)=t^{r_1}\sum_{n=0}^{\infty}a_n\left(r_1\right)t^{n}, && y_2\left(t\right)=t^{r_2}\sum_{n=0}^{\infty}a_n\left(r_2\right)t^{n} \end{align*}

if we can obtain two different real solutions $r_1$ and $r_2$ , with $\left(r_1-r_2\right)\notin\mathbb{Z}$ .

Next, we discuss the possible situations that the above conditions are not satisfied.

Two Complex Roots

If $r$ is a complex root of the indicial equation, then

y\left(t\right)=t^r\sum_{n=0}^{\infty}a_nt^{n}

is a complex-valued solution, whose real and imaginary part are both real-valued solutions.

Equal Roots

If $r$ is the only root of the indicial equation, then we have only one independent solution.

y_1\left(t\right)=t^r\sum_{n=0}^{\infty}a_nt^{n}

To find the other solution, use the similar method of dealing with Euler’s equation. The second solution is

y_2\left(t\right)=y_1\left(t\right)\ln t+\sum_{n=0}^{\infty}a_n'\left(r\right)t^{n+r}

Roots Differing by a Positive Integer

Suppose that $r_2$ and $r_1=r_2+N$ , where $N\in\mathbb{Z}^{+}$ , are two roots of the indicial equation. Then, we can certainly find one solution

y_1\left(t\right)=t^r\sum_{n=0}^{\infty}a_n\left(r_1\right)t^{n}

It may be impossible to find another solution with coefficients $a_n\left(r_2\right)$ .

In this case, the equation will have a second solution of the form

y_2\left(t\right)=ay_1\left(t\right)\ln t+\sum_{n=0}^{\infty}a_n'\left(r_2\right)t^{n+r_2}

Elementary Complex Analysis

3 minute read

Published: November 16, 2023

Holomorphic Functions

A function $f:\mathbb{C}\to \mathbb{C}$ is complex differentiable, or holomorphic, at $z\in\mathbb{C}$ if

f'\left(z\right):=\lim_{h\to0,h\in\mathbb{C}}\frac{f\left(z+h\right)-f\left(z\right)}{h}

exists. The following are some of the properties of holomorphic functions.

Cauchy-Riemann Equations

Suppose that $f$ is holomorphic, and

f\left(x+iy\right)=u\left(x,y\right)+iv\left(x,y\right)

then, the functions $u,v:\mathbb{R}^2\to \mathbb{R}$ must satisfies the Cauchy-Riemann differential equations,

\begin{align*} \frac{\partial u}{\partial x}=\frac{\partial v}{\partial y}, && \frac{\partial u}{\partial y}=-\frac{\partial v}{\partial x} \end{align*}

In addition, we define

\begin{align*} \frac{\partial}{\partial z}:=\frac{1}{2}\left(\frac{\partial}{\partial x}+\frac{1}{i}\frac{\partial}{\partial y}\right), && \frac{\partial }{\partial \bar{z}}:=\frac{1}{2}\left(\frac{\partial}{\partial x}-\frac{1}{i}\frac{\partial}{\partial y}\right) \end{align*}

then

\begin{align*} f'\left(z\right)=\frac{\partial f}{\partial z}=2\frac{\partial u}{\partial z}, && \frac{\partial f}{\partial \bar{z}}=0 \end{align*}

Cauchy’s Theorem

Let $\Omega\sub\mathbb{C}$ be an open set, $f$ holomorphic on $\Omega$ and $\mathcal{C}\sub \Omega$ and oriented smooth curve. We define the integral of $f$ along $\mathcal{C}$ by

\int_{\mathcal{C}}f\left(z\right)\mathrm{d}z:=\int_{I}f\left(\gamma\left(t\right)\right)\cdot\gamma'\left(t\right)\mathrm{d}t

where $\gamma:I\to \mathcal{C}$ is a parametrization.

One of the most important properties of holomorphic function is known as Cauchy’s theorem.

If $f$ is holomorphic in a disc, then,

\oint_{\mathcal{C}}f\left(z\right)\mathrm{d}z=0

for any closed curve $\mathcal{C}$ in that disc.

With this properties, we can actually calculate many integrals along the real axis by extending it into an integration along a contour in the complex plane. The situation that we integrate over a semi-circle of radius $R$ and want that part to vanish is quite common, and Jordan’s lemma present a useful general result.

Assume that for some $R_0>0$ the function $g:\mathbb{C}\backslash\overline{B_{R_0}(0)}\to \mathbb{C}$ is holomorphic. Let

f\left(z\right)=e^{iaz}g\left(z\right)

for some $a>0$ . Let

\mathcal{C}_{R}=\left\{z\in\mathbb{C}:z=Re^{i\theta},0\le\theta\le\pi\right\}

be a semi-circle segment in the upper half-plane and assume that

\max_{0\le\theta\le\pi}\left|g\left(Re^{i\theta}\right)\right|\to0

as $R\to \infty$ . Then

\lim_{R\to\infty}\int_{\mathcal{C}_R}f\left(z\right)\mathrm{d}z=0

Cauchy’s Integral Formula

Suppose $f$ is a holomorphic function in an open set $\Omega\sub\mathbb{C}$ . If a closed disc $D$ defined as

D\:=\left\{z:\left|z-z_0\right|\le r\right\}

is completely contained in $\Omega$ . Let $C$ be the positively oriented circle forming the boundary of $D$ , i.e., $C=\partial D$ . Then, for all $a\in D$

\oint_{C}\frac{f\left(z\right)}{z-a}\mathrm{d}z=2\pi if\left(a\right)

This formula is known as the Cauchy’s integral formula. Furthermore, $f$ has infinitely many complex derivatives in $\Omega$ , and for all $z\in D$ ,

f^{\left(n\right)}\left(z\right)=\frac{n!}{2\pi i}\oint_{C}\frac{f\left(\zeta\right)}{\left(\zeta-z\right)^{n+1}}\mathrm{d}\zeta

It then follows that holomorphic functions are analytic, specifically, for all $z\in D$

f\left(z\right)=\sum_{n=0}^{\infty}a_n\left(z-z_0\right)^n

where

a_n=\frac{f^{\left(n\right)}\left(z_0\right)}{n!}

Residue Calculus

If $f:\Omega\to\mathbb{C}$ has a pole of order $n$ at $z_0\in \Omega$ , then there exists a neighborhood $U\sub\Omega$ of $z_0$ , numbers $a_{-n},\dots, a_{-1}\in\mathbb{C}$ and a holomorphic function $G: U\to \mathbb{C}$ such that

f\left(z\right)=\frac{a_{-n}}{\left(z-z_0\right)^n}+\frac{a_{-n+1}}{\left(z-z_0\right)^{n-1}}+\cdots+\frac{a_{-1}}{z-z_0}+G\left(z\right)

for all $z\in U$ . The term

P\left(z\right):=\frac{a_{-n}}{\left(z-z_0\right)^n}+\frac{a_{-n+1}}{\left(z-z_0\right)^{n-1}}+\cdots+\frac{a_{-1}}{z-z_0}

is called the principal part of $f$ at the pole $z_0$ . To calculate the coefficients of the principal part, we have

a_{-n}=\lim_{z\to z_0}\left(z-z_0\right)^nf\left(z\right)

The coefficient $a_{-1}$ of $\left(z-z_0\right)^{-1}$ is called the residue of $f$ at the pole $z_0$ , denoted as

\mathrm{res}_{z_0}f=a_{-1}

In fact, we can calculate the residue without knowing the series expansion. In the case where $f$ has a simple pole at $z_0$ , it is clear that

\mathrm{res}_{z_0}f=\lim_{z\to z_0}\left(z-z_0\right)f\left(z\right)

If $f$ has a pole of order $n$ at $z_0$ , then

\mathrm{res}_{z_0}f=\frac{1}{\left(n-1\right)!}\lim_{z\to z_0}\frac{\mathrm{d}^{n-1}}{\mathrm{d}z^{n-1}}\left(\left(z-z_0\right)^nf\left(z\right)\right)

Residues are very useful at calculating contour integration in the complex plane.

Suppose that $f$ is holomorphic in an open set containing a (positively oriented) toy contour $\mathcal{C}$ and its interior, except for poles at points $z_1,\dots, z_n$ inside $\mathcal{C}$ . Then,

\oint_{\mathcal{C}}f\left(z\right)\mathrm{d}z=2\pi i\sum_{k=1}^{n}\mathrm{res}_{z_k}f

In addition, let $p$ and $q$ be polynomials of degree $m$ and $n$ respectively, where $n\ge m+2$ . If $q\left(x\right)\ne 0$ for $x>0$ , if $q$ has a zero of order at most 1 at the orgin, and if

f\left(z\right)=\frac{z^{\alpha}p\left(z\right)}{q\left(z\right)}

for $0<\alpha<1$ , then

\int_{0}^{\infty}\frac{z^{\alpha}p\left(x\right)}{q\left(x\right)}\mathrm{d}x=\frac{2\pi i}{1-e^{2\pi \alpha i}}\sum_{k=1}^{n}\mathrm{res}_{z_k}f

where $z_1,\dots, z_n$ are the non zero poles of $p/q$ .

Laplace Transform and Inverse

3 minute read

Published: November 07, 2023

The Laplace transform was first introduced by Pierre-Simon Laplace in his work on probability theory. The transform turns out to have many applications in science and engineering, especially in solving linear differential equations.

The Laplace Transform

Let $f(t)$ be defined for $0\leq t<\infty$ . The Laplace transform of $f(t)$ , which is denoted by $F(s)$ , or $\mathcal{L}\left\{f(t)\right\}$ is given by

\begin{equation} F\left(s\right)=\mathcal{L}\left\{f\left(t\right)\right\}=\int_{0}^{\infty}e^{-st}f\left(t\right)\mathrm{d}t \end{equation}

where the improper integral is understood as

\int_{0}^{\infty}e^{-st}f\left(t\right)\mathrm{d}t=\lim_{A\to \infty}\int_{0}^{A}e^{-st}f\left(t\right)\mathrm{d}t

Remarks:

Two conditions that $f$ should be satisfied to have a Laplace transform (for sufficiently large $s$ ) are

The function $f(t)$ is of exponential order, i.e., there exist constants $M$ and $c$ such that for all $0\le t<\infty$ ,

\left|f\left(t\right)\right|\le Me^{ct}

The function $f(t)$ is piecewise continuous, i.e., $f(t)$ has at most a finite number of discontinuities on any interval $0\le t\le A$ , and both the limit from the right and the limit from the left of $f$ exist at every point of discontinuity.

The Laplace transform doesn’t care how the function $f(t)$ behaves when $t<0$ , this means that different functions can have the same Laplace transform as long as they agree on $t\ge 0$ .

Properties

In the following, we will assume $\mathcal{L}\left\{f(t)\right\}=F(s)$ , $\mathcal{L}\left\{g(t)\right\}=G(s)$ , or denotes as $f(t)\rightsquigarrow F(s)$ , $g(t)\rightsquigarrow G(s)$ . We have the following important properties.

Linearity

\mathcal{L}\left\{\alpha f(t)+\beta g(t)\right\}=\alpha F(s)+\beta G(s)

Frequency domain derivatives

tf\left(t\right)\rightsquigarrow-F'\left(s\right)

t^{n}f\left(t\right) \rightsquigarrow \left(-1\right)^{n}F^{\left(n\right)}\left(s\right)

Time domain derivatives

f'\left(t\right)\rightsquigarrow sF(s)-f(0)

f''(t)\rightsquigarrow s^2F(s)-sf(0)-f'(0)

f^{\left(n\right)}(t)\rightsquigarrow s^{n}F(s)-\sum_{k=1}^{n}s^{n-k}f^{\left(k-1\right)}\left(0\right)

Frequency shifting

e^{at}f(t)\rightsquigarrow F\left(s-a\right)

Time shifting

u(t-a)f(t-a)\rightsquigarrow e^{-as}F(s)

f(t)u(t-a)\rightsquigarrow e^{-as}\mathcal{L}\left(f(t+a)\right)

where $u(t)$ is the Heaviside step function (unit step function), and $a>0$ .

Time scaling

f(at)\rightsquigarrow \frac{1}{a}F\left(\frac{s}{a}\right)

where $a>0$ .

Convolution

The convolution between two functions is defined as

(f*g)\left(t\right)=\int_{0}^{t}f(\tau)g(t-\tau)\mathrm{d}\tau

Then, the Laplace transform of the convolution is the ordinary product of the Laplace transforms,

(f*g)\left(t\right)\rightsquigarrow F(s)G(s)

Laplace Transform of Common Functions

The following lists the Laplace transform of some common functions:

Dirac Delta function (unit impulse)

\delta\left(t\right)\rightsquigarrow 1

\delta\left(t-a\right)\rightsquigarrow e^{-as}

Heaviside step function (unit step)

u(t)\rightsquigarrow \frac{1}{s}

u(t-a)\rightsquigarrow \frac{1}{s}e^{-as}

Polynomial

t\rightsquigarrow \frac{1}{s^2}

For integers $n$ , the following two formulas hold

t^{n}\rightsquigarrow \frac{n!}{s^{n+1}}

\sqrt[n]{t}\rightsquigarrow \frac{1}{s^{\frac{1}{n}+1}}\Gamma\left(\frac{1}{n}+1\right)

where

\Gamma\left(z\right)=\int_{0}^{\infty}t^{z-1}e^{-t}\mathrm{d}t

is the Euler’s Gamma function, with $\Gamma(1/2)=\pi$ .

Exponential decay

e^{-at}\rightsquigarrow \frac{1}{s+a}

Sine and Cosine

\sin\left(\omega t\right)\rightsquigarrow \frac{\omega}{s^2+\omega^2}

\cos\left(\omega t\right)\rightsquigarrow \frac{s}{s^2+\omega^2}

Hyperbolic Sine and Cosine

\sinh\left(\omega t\right)\rightsquigarrow \frac{\omega}{s^2-\omega^2}

\cosh\left(\omega t\right)\rightsquigarrow \frac{s}{s^2-\omega^2}

Inverse Laplace Transform

The inversion of the Laplace Transform is given by the Mellin inversion formula. In particular, if $f$ is continuous on $\left[0,\infty\right)$ , continuously differentiable on $\left(0,\infty\right)$ , and satisfies the condition for the Laplace transform to exist, then

f\left(s\right)=\left[\mathcal{M}\left(\mathcal{L}f\right)\right]\left(s\right)

for all $s\in\left[0,\infty\right)$ , where

\left(\mathcal{M}F\right)\left(t\right)=\frac{1}{2\pi i}\int_{\mathcal{C}^*}e^{pt}F\left(p\right)\mathrm{d}p=\frac{1}{2\pi i}\int_{\beta-i\infty}^{\beta+i\infty}e^{pt}F\left(p\right)\mathrm{d}p

is called the Bromwich integral of $F$ , and the (positively oriented) contour is defined as

\mathcal{C}=\left\{z\in\mathbb{C}: \mathrm{Re}z=\beta\right\}

where $\beta\in\R$ is any number such that $F$ is analytic for all $z\in\mathbb{C}$ with $\mathrm{Re}z\ge\beta$ .

To calculate the Bromwich integral, we should consider the case when $t\ge0$ and $t<0$ separately.

For $t\ge 0$ , the integration contour closed on the left.

Let $\Gamma$ be the straight line and $\mathcal{C}_{\beta,R}$ be the semicircle (orientation indicated by the arrows). Then, by the residue theorem

\int_{\Gamma}e^{pt}F\left(p\right)\mathrm{d}p+\int_{\mathcal{C}_{\beta,R}}e^{pt}F\left(p\right)\mathrm{d}p=2\pi i\sum_{k=1}^{N}\mathrm{res}_{z_k}F

Letting $R\to \infty$ , the first integral be the Bromwich integral we want. For the second integral, by applying the substitution rule,

\int_{\mathcal{C}_{\beta,R}}e^{pt}F\left(p\right)\mathrm{d}p=ie^{\beta t}\int_{\mathcal{C}_R}e^{itp}F\left(\beta+ip\right)\mathrm{d}p

where $\mathcal{C}_R$ is a semi-circle of radius $R$ in the upper half-plane. In most cases, we can use Jordan’s Lemma to show that the right-hand side goes to 0 as $R\to \infty$ .

Then, the Bromwich integral of $F$ is just the sum of all the residues of $F$ ,

\frac{1}{2\pi i}\int_{\beta-i\infty}^{\beta+i\infty}e^{pt}F\left(p\right)\mathrm{d}p=\sum_{k=1}^{N}\mathrm{res}_{z_k}F

For $t<0$ , the integration contour is closed on the left,

Let $\Gamma$ be the straight line and $\mathcal{C}_{\beta,R}^{(2)}$ be the semi-circle (orientation indicated by the arrows). Using a similar argument, we see that

\int_{\mathcal{C}_{\beta,R}^{(2)}}e^{pt}F\left(p\right)\mathrm{d}p=ie^{\beta t}\int_{\mathcal{C}_R}e^{-itp}F\left(\beta-ip\right)\mathrm{d}p

Since the integral over the whole closed contour is zero, and the orientation of the straight line is in the opposite direction, the Bromwich integral is

\frac{1}{2\pi i}\int_{\beta-i\infty}^{\beta+i\infty}e^{pt}F\left(p\right)\mathrm{d}p=\frac{1}{2\pi i}\lim_{R\to\infty}\int_{\mathcal{C}_{\beta,R}^{(2)}}e^{pt}F\left(p\right)\mathrm{d}p

which in most cases vanishes according to Jordan’s Lemma.

Reference

[1]. VV286 Lecture Slides, Horst Hohberger, UM-SJTU JI.

Linear System of ODEs Theory

8 minute read

Published: October 25, 2023

An important kind of equation to learn is those linear equations, which are those can be written in the form of

Ly=b

for some linear differential operator $L$ and unknown $y$ . We begin by the example of second order homogeneous linear equations.

Second Order Linear Equations

Given a linear, second order, homogeneous ODE

\begin{equation}y''+p\left(t\right)y'+q\left(t\right)y=0\end{equation}

Now, we are going to show the following two things:

If $y_1,y_2$ are solution to (1), then $y=c_1y_1+c_2y_2$ is also a solution to (1) for some constants $c_1,c_2\in \mathbb{R}$ .

If $y_1$ and $y_2$ are two linearly independent solutions to (1), then all solution to (1) have the form $y=c_1y_1+c_2y_2$ for some constants $c_1,c_2\in\mathbb{R}$ .

The first one relies on the linearity of differential operator

L=\frac{\mathrm{d}^2}{\mathrm{d}t}+p\left(t\right)\frac{\mathrm{d}}{\mathrm{d}t}+q\left(t\right)

We can show that this operator is linear by observing that

$L\left(cy\right)=cL\left(y\right)$ for any constant $c$ .

$L\left(y_1+y_2\right)=L\left(y_1\right)+L\left(y_2\right)$ .

Then, for any constant $c_1,c_2$ , we have

L\left(c_1y_1+c_2y_2\right)=c_1L\left(y_1\right)+c_2L\left(y_2\right)=0

and the first one is proved.

The second one relies the Picard-Lindelof theorem also known as existence and uniqueness theorem:

Let $x_0\in\Omega$ , where $\Omega\subset\mathbb{R}^n$ is open and let $t_0\in I$ , where $I\subset\mathbb{R}$ be an interval.

Suppose $F:\Omega\times I\to\mathbb{R}^n$ is continuous in $t$ and Lipschitz continuous in $x$ , i.e. there exist a constant $L>0$ , such that for all $x,y\in\Omega$ and all $t\in I$ ,

\left\|F\left(x,t\right)-F\left(y,t\right)\right\|\le L\left\|x-y\right\|

Then, the initial value problem

\begin{align}x'=F\left(x,t\right) &&x\left(t_0\right)=x_0\end{align}

has one and only one solution in some $t$ interval containing $t_0$ .

An outline of the proof is as follows:

We can always construct a solution by using an iterative approximation method known as Picard iteration, which works as follows:

First, a solution to (2) will have to satisfy the integral equation

x\left(t\right)=x_0+\int_{t_0}^tF\left(x\left(s\right),s\right)\mathrm{d}s

Then, by setting $x^{(0)}\left(t\right)=x_0$ , and

x^{\left(k+1\right)}\left(t\right)=x_0+\int_{t_0}^tF\left(x^{\left(k\right)}\left(s\right),s\right)\mathrm{d}s

yields a sequence of function $x^{(k)}\left(t\right)$ . Then, under some condition on function $F$ , we can use Contraction Mapping Principle to show that the sequence converges to a unique function $x\left(t\right)$ .

Algebraic Structure of Linear Equation

An interesting question to consider in the proof is that why is there exactly two independent solutions to a second linear order equation? A simple answer to this might be that since it contain second order derivative, when we solve it we shall integrate twice and each gives an integrating constant $c$ which makes there are two constants $c_1,c_2$ to determine.

In fact, if we apply some knowledge from linear algebra, we may easy to think of the solutions as a vector space, known as solution space which happens to be 2 dimensional. And the general case would be, for a linear equation of order $n$ , the dimension of its solution space is exactly $n$ . Now, we want to prove this.

First notice that given an explicit ODE of order $n$ ,

\begin{equation}x^{\left(n\right)}\left(t\right)=f\left(x,x',x'',\dots,x^{\left(n-1\right)},t\right)\end{equation}

and set new variables

\begin{align*} x_1=x,&& x_2=x',&& x_3=x'',&& \cdots,&& x_n=x^{\left(n-1\right)} \end{align*}

Equation (3) can be written as a linear system of equations of order 1,

\left( \begin{array}{c} x_1'\\ x_2'\\ \vdots\\ x_n'\\\end{array} \right) =\left( \begin{array}{c} x_2\\ x_3\\ \vdots\\ f\left( x_1,x_2,\dots ,x_n,t \right)\\\end{array} \right)

or, simply denoted as

\boldsymbol{x}'=\boldsymbol{F}\left(\boldsymbol{x},t\right)

Therefore, for any linear equation of order $n$ , it is always equivalent to a system of $n$ first order linear equation.

Thence, we only need to consider the first order linear system of equations,

\boldsymbol{x}'=\boldsymbol{A}\left(t\right)\boldsymbol{x}+\boldsymbol{b}\left(t\right)

where $\boldsymbol{b}:\mathbb{R}\to\mathbb{R}^n$ , and $\boldsymbol{A}:\mathbb{R}\to\mathbb{R}^n\times\mathbb{R}^n$ .

Consider the associated homogeneous equation

\begin{equation}\boldsymbol{x}'=\boldsymbol{A}\left(t\right)\boldsymbol{x}\end{equation}

Notice that given any two solution $\boldsymbol{x}_1,\boldsymbol{x}_2$ , and any number $\lambda,\mu\in\mathbb{R}$ , their linear combination $\lambda\boldsymbol{x}_1+\mu\boldsymbol{x}_2$ is also a solution.

Suppose that a basis of vectors $\mathcal{B}=\left\{\boldsymbol{b}_1,\cdots,\boldsymbol{b}_n\right\}$ , of $\mathbb{R}^n$ is given and fix $t_0\in \mathbb{R}$ . Denote by $\boldsymbol{x}_k\left(t\right)$ the unique solution (theorem of Picard-Lindelof) to the initial value problem

\begin{align} \boldsymbol{x}_k'=\boldsymbol{A}\left(t\right)\boldsymbol{x}_k, && \boldsymbol{x}_k\left(t_0\right)=\boldsymbol{b}_k \end{align}

and denote

\mathcal{F}=\left\{\boldsymbol{x}_1,\cdots,\boldsymbol{x}_n\right\}

Then, by superposition principle, any element of $\mathrm{span}\mathcal{F}$ is a solution to (4)

Conversely, let $\boldsymbol{x}$ be any solution to (4), and suppose that $\boldsymbol{x}\left(t_0\right)=\boldsymbol{x}_0\in\mathbb{R}^n$ . We have

\boldsymbol{x}\left(t_0\right)=\boldsymbol{x}_0=\sum_{k=1}^{n}\lambda_k\boldsymbol{b}_k=\sum_{k=1}^{n}\lambda_k\boldsymbol{x}_k\left(t_0\right)

for some numbers $\lambda_1,\dots,\lambda_n\in\mathbb{R}$ . Therefore, by theorem of Picard-Lindelof,

\boldsymbol{x}=\sum_{k=1}^n\lambda_k\boldsymbol{x}_k

This shows that the solution space is given by $\mathrm{span}\mathcal{F}$ .

In addition, we have,

\forall t, \sum_{k=1}^n{\lambda _k}\boldsymbol{x}_k\left( t \right) =0\Rightarrow \sum_{k=1}^n{\lambda _k}\boldsymbol{x}_k\left( t_0 \right) =\sum_{k=1}^n{\lambda _k}\boldsymbol{b}_k=0\Rightarrow \lambda _k=0

which means $\mathcal{F}$ is a basis.

Now, let’s explain why a second order linear equation will have exactly two independent solutions. For equation (1), let $x_1=y,x_2=y'$ , and denote $\boldsymbol{x}=\left(x_1,x_2\right)$ , it is equivalent to

\boldsymbol{x}'=\left( \begin{matrix} 0& 1\\ -p\left( t \right)& -q\left( t \right)\\\end{matrix} \right) \boldsymbol{x}+\left( \begin{array}{c} 0\\ g\left( t \right)\\\end{array} \right)

which is a first order linear system of equation in two dimension.

Solution to Linear System of Equations

Homogeneous, Constant Coefficients

First, consider the most basic case of linear system of equations, that is a homogeneous one with constant coefficients:

\begin{align}\boldsymbol{x}'=\boldsymbol{A}\boldsymbol{x},&&\boldsymbol{x}\left(0\right)=\boldsymbol{x}_0 \end{align}

where $\boldsymbol{x}:\mathbb{R}\to\mathbb{R}^n$ and $\boldsymbol{A}\in \mathcal{M}\left(n\times n;\mathbb{R}\right)$ .

With proper definition, it can be shown that the solution to the initial value problem (6) is

\boldsymbol{x}\left(t\right)=e^{\boldsymbol{A}t}\boldsymbol{x}_0

where

e^{\boldsymbol{A}t}=\boldsymbol{I}+\sum_{k=1}^{\infty}{\frac{\boldsymbol{A}^kt^k}{k!}}

Now, the problem is how to calculate this matrix exponential. Recall from linear algebra that diagonal matrix, or Jordan matrix, are relative easy to calculate the matrix exponential. Fortunately, we have the Jordan Normal Form theorem which states that every complex matrix is similar to a Jordan matrix, i.e., for any matrix $\boldsymbol{A}$ , there exists a matrix $\boldsymbol{U}$ and a Jordan matrix $\boldsymbol{J}$ such that

\boldsymbol{A}=\boldsymbol{U}\boldsymbol{J}\boldsymbol{U}^{-1}

Then, we can easily see that

\boldsymbol{A}^{k}=\boldsymbol{U}\boldsymbol{J}^k\boldsymbol{U}^{-1}

and then

e^{\boldsymbol{A}t}=\boldsymbol{I}+\sum_{k=1}^{\infty}{\frac{\boldsymbol{A}^kt^k}{k!}}=\boldsymbol{UIU}^{-1}+\boldsymbol{U}\sum_{k1}^{\infty}{\frac{\boldsymbol{J}^kt^k}{k!}}\boldsymbol{U}^{-1}=\boldsymbol{U}e^{\boldsymbol{J}t}\boldsymbol{U}^{-1}

The problem now becomes calculating $\displaystyle e^{\boldsymbol{J}t}$ .

Let

\boldsymbol{J}=\left( \begin{matrix} \boldsymbol{J}_{k_1}\left( \lambda _1 \right)& & 0\\ & \ddots& \\ 0& & \boldsymbol{J}_{k_m}\left( \lambda _m \right)\\\end{matrix} \right)

be a Jordan matrix, where $\boldsymbol{J}_{k_i}\left(\lambda_i\right)$ is a Jordan block of size $k_i$ and diagonal element $\lambda_i$ , i.e.,

\boldsymbol{J}_{k_i}\left( \lambda _i \right) =\left( \begin{matrix} \lambda _i& 1& 0\\ & \ddots& 1\\ 0& & \lambda _i\\ \end{matrix} \right)

It’s easy to show that

e^{\boldsymbol{J}t}=\left( \begin{matrix} e^{\boldsymbol{J}_{k_1}\left( \lambda _1 \right)}& & 0\\ & \ddots& \\ 0& & e^{\boldsymbol{J}_{k_m}\left( \lambda _m \right)}\\\end{matrix} \right)

Hence, it’s sufficient to calculate the exponential of Jordan blocks. Notice that

\boldsymbol{J}_k\left(\lambda\right)=\lambda\boldsymbol{I}+\boldsymbol{N}

where $\boldsymbol{N}$ is some nilpotent matrix whose power will vanish after finite many times, and that unit matrix commutes with any other matrix in terms of multiplication, i.e., $\boldsymbol{I}\boldsymbol{N}=\boldsymbol{N}\boldsymbol{I}$ . Therefore,

e^{\boldsymbol{J}_k\left( \lambda \right) t}=e^{\mathrm{diag}\left( \lambda \right) t}e^{Nt}=\left( \begin{matrix} e^{\lambda t}& & 0\\ & \ddots& \\ 0& & e^{\lambda t}\\ \end{matrix} \right) \left( \boldsymbol{I}+\sum_{k=1}^p{\frac{\boldsymbol{N}^kt^k}{k!}} \right)

However, the above calculation seems too complicated that it would be impractical. If we trace back what we are actually doing, we know that we are actually finding a basis of the solution space.

From previous discussion of the structure of the solution, we know immediately that for any basis $\mathcal{B}=\left\{\boldsymbol{b}_1,\cdots,\boldsymbol{b}_n\right\}$ of $\mathbb{R}^n$ , the system of functions $\mathcal{F}=\left\{e^{\boldsymbol{A}t}\boldsymbol{b}_1,\cdots,e^{\boldsymbol{A}t}\boldsymbol{b}_n\right\}$ is a fundamental system of solution. We collect them into one matrix and denoted as $\boldsymbol{X}$ , and is also called the fundamental matrix.

It turns out that if we choose the basis wisely, we don’t even need to calculate the whole $e^{\boldsymbol{A}t}$ to get a fundamental matrix. From theory of Jordan normal form, we know that there exists a basis consists of (generalized) eigenvectors $\boldsymbol{U}=\left\{\boldsymbol{u}_1,\cdots, \boldsymbol{u}_n\right\}$ , such that the Jordan matrix of $\boldsymbol{A}$ is given by $\boldsymbol{J}=\boldsymbol{U}^{-1}\boldsymbol{AU}$ . Using this basis, we have the fundamental matrix simply given by

\boldsymbol{X}\left(t\right)=\boldsymbol{U}e^{\boldsymbol{J}t}

The columns of $\boldsymbol{X}$ will therefore be the fundamental system of solutions.

Inhomogeneous, Constant Coefficients

Next, we solve an inhomogeneous equation with constant coefficients. The general solution is the superposition of a particular solution and the solution to the associated homogeneous equation.

Consider the following linear system of differential equation

\boldsymbol{x}'=\boldsymbol{Ax}+\boldsymbol{b}\left(t\right)

suppose $\mathcal{F}=\left\{\boldsymbol{x}_1,\cdots,\boldsymbol{x}_n\right\}$ is a fundamental system for the associated homogeneous equation, then the general solution is given by

\boldsymbol{x}\left(t;c_1,\cdots,c_n\right)=\sum_{k=1}^{n}c_k\boldsymbol{x}_k\left(t\right)+e^{\boldsymbol{A}t}\int e^{-\boldsymbol{A}t}\boldsymbol{b}\left(t\right)\mathrm{d}t

Variable Coefficients

There is no general method of solving homogeneous equations with variable coefficients,

\boldsymbol{x}'=\boldsymbol{A}\left(t\right)\boldsymbol{x}

However, if we are given a fundamental system of solution $\mathcal{F}=\left\{\boldsymbol{x}_1,\cdots,\boldsymbol{x}_n\right\}$ , then we can find a particular solution to the inhomogeneous equation

\begin{equation}\boldsymbol{x}'=\boldsymbol{A}\left(t\right)\boldsymbol{x}+\boldsymbol{b}\left(t\right)\end{equation}

by method of variation of parameter.

Let

\boldsymbol{x}_p=\sum_{k=1}^{n}c_k\left(t\right)\boldsymbol{x}_k\left(t\right)

and plug into equation (7), and get

\frac{\mathrm{d}}{\mathrm{d}t}\boldsymbol{x}_p=\sum_{k=1}^n{\left( c_k'\left( t \right) \boldsymbol{x}_k\left( t \right) +c_k\left( t \right) \boldsymbol{x}_k'\left( t \right) \right)}=\boldsymbol{A}\left( t \right) \boldsymbol{x}_p+\boldsymbol{b}\left( t \right)

leading to

\sum_{k=1}^n{c_k'\left( t \right) \boldsymbol{x}_k\left( t \right)}=\boldsymbol{b}\left( t \right)

Let $\boldsymbol{X}\left(t\right)=\left(\boldsymbol{x}_1\left(t\right),\cdots,\boldsymbol{x}_n\left(t\right)\right)$ be the fundamental matrix, and $\displaystyle \boldsymbol{c}\left( t \right) =\left( \begin{array}{c} c_1\left( t \right)\\ c_2\left( t \right)\\ \vdots\\ c_n\left( t \right)\\ \end{array} \right)$ then the previous equation is indeed the following linear system of equation

\boldsymbol{X}\boldsymbol{c}'=\boldsymbol{b}

Using Cramer’s rule, we can find the solution to $c_k'\left(t\right)$ .

c_k'\left( t \right) =\frac{\det \boldsymbol{X}^{\left( k \right)}\left( t \right)}{\det \boldsymbol{X}\left( t \right)}=\frac{W^{\left( k \right)}\left( t \right)}{W\left( t \right)}

where

W\left(t\right)=\det \left(\boldsymbol{x}_1\left(t\right),\cdots,\boldsymbol{x}_n\left(t\right)\right)

is called the Wronskian.

Solving First and Second Order ODEs

10 minute read

Published: October 23, 2023

First Order ODEs

First order ODE is equations that has the highest derivative of order one. It takes the form

\begin{equation}M\left(x,y\right)+N\left(x,y\right)y'=0\end{equation}

The general solution to this equation is some function $y\left(x\right)$ that satisfies the equation.

Separable Equation

A separable equation is of the form

y'=f\left(x\right)g\left(y\right)

Linear Equation

A linear equation is of the form

a_1\left(x\right)y'+a_2\left(x\right)y=a_3\left(x\right)

For now, we assume $a_1\left(x\right)\ne0$ for all $x\in I$ , then the equation above is simplified into

y'+p\left(x\right)y=q\left(x\right)

If $q\left(x\right)=0$ for all $x\in I$ , it is called a homogeneous equation;

Otherwise, it is called an inhomogeneous equation.

Homogeneous

A homogeneous linear equation is of the form

y'+p\left(x\right)y=0

which is a separable equation. The general solution to it is

y\left(x;C\right)=C\cdot\exp\left(-\int p\left(x\right)\mathrm{d}x\right)

If given an initial condition $y\left(\xi\right)=\eta$ , then the solution is

y\left(x\right)=\eta\cdot\exp\left(-\int_{\xi}^{x}p\left(t\right)\mathrm{d}t\right)

Inhomogeneous

An inhomogeneous linear equation is of the form

y'+p\left(x\right)y=q\left(x\right)

Let

u\left(x\right)=\exp\left(\int p\left(x\right)\mathrm{d}x\right)

and multiply $u\left(x\right)$ , which is called the integrating factor, to both sides of the equation, we get $\left(uy\right)'=uq$ . Therefore the solution is

y\left(x;C\right)=\frac{1}{u\left(x\right)}\left(\int u\left(x\right)q\left(x\right)\mathrm{d}x+C\right)

If given initial condition $y\left(\xi\right)=\eta$ , then we can find a specific constant $C$ .

Method of substitution

In addition to separable equations and linear equations for which we have a direct solving techniques, some other first order equations can be transformed into separable equations or linear equations by method of substitution. The following are some types of those transformable equations.

Equation: $y'=f\left(ax+by+c\right)$

Consider a equation of the form

y'=f\left(ax+by+c\right)

where $a,b,c\in \mathbb{R}$ , $b\ne 0$ .

Let $u=ax+by+c$ , then,

u'=a+by'=a+bf\left(u\right)

which is a separable equation of $u$ .

Homogeneous Equation

A homogeneous equation has the form

y'=f\left(\frac{y}{x}\right)

This type of equation has an important characteristic that its invariant under zoom, i.e., if make change $x\to ax$ , $y\to ay$ , the equation does not change.

Tip:

Sometimes the equation is put into a strange form that it may not be obvious the right hand side is a function of $y/x$ . A quick judge would be to check whether the sum of power of $x$ and $y$ in the denominator and numerator of the right hand side are equal.

We solve this kind of equations by transforming them into a separable equation.

First, let $\displaystyle z=\frac{y}{x}$ , and therefore, $y=zx$ , and $y'=z'x+z$ . Substitute that in the equation and,

z'x+z-f\left(z\right)=0

which is a separable equation. Here are some examples,

Example
Find the general solution to the following differential equation
$y'=\frac{y+x}{x-y}$
Solution:
First, make the left hand side life a homogeneous equation,
$y'=\frac{y/x+1}{1-y/x}$
and let $z=y/x$ . Then, the equation becomes,
$xz'=\frac{z^2+1}{1-z}$
integrate both side
$\int \frac{1}{x}\mathrm{d}x=\int\frac{1-z}{z^2+1}\mathrm{d}z$
which is
$\ln x +C=\arctan z-\frac{1}{2}\ln\left(z^2+1\right)$
Change back into $y$ , and get
$\arctan \left(\frac{y}{x}\right)=\ln\sqrt{x^2+y^2}+C$
Now, suppose we are in the polar coordinate, where $\displaystyle \theta=\arctan y/x$ , and $r=x^2+y^2$ ,
$\theta=\ln r+C$
or equivalently,
$r=C_1e^{\theta}$
which is known as the exponential spiral.

Bernoulli’s Equation

A Bernoulli’s equation has the form

y'=p\left(x\right)y+q\left(x\right)y^n

where $n\ne 0,1$ . We solve this kind of equations by transforming them into a linear equation.

First, multiply the equation by $\left(1-n\right)y^{-n}$ , and get

\left(1-n\right)\frac{y'}{y^n}=\frac{\left(1-n\right)p\left(x\right)}{y^{n-1}}+\left(1-n\right)q\left(x\right)

Next, let $\displaystyle v=y^{1-n}$ , and observe that $\displaystyle v'=\left(1-n\right)\frac{y'}{y^n}$ , thus the equation becomes,

v'=\left(1-n\right)p\left(x\right)v+\left(1-n\right)q\left(x\right)

which is a linear equation. Here is an example.

Example
Find the general solution of the following equation
$y'+\frac{y}{1+x}+\left(1+x\right)y^4=0$
Solution:
First, divide the equation by $y^4$ , which gives
$\frac{y'}{y^4}+\frac{1}{1+x}\cdot\frac{1}{y^3}+\left(1+x\right)=0$
Then, the substitution should be $v=y^{-3}$ , and $v'=-3y^{-4}y'$ . Thus, the equation becomes
$v'-\frac{3}{1+x}v-3\left(1+x\right)=0$
which is a linear equation.
Next, the integrating factor is
$u\left(x\right)=\exp\left(-\int\frac{3}{1+x}\mathrm{d}x\right)=\frac{1}{\left(1+x\right)^3}$
Multiply the last equation by $u$ gives
$\frac{v'}{\left(1+x\right)^3}-\frac{3}{\left(1+x\right)^4}v=3\left(1+x\right)$
which is in fact,
$\left(\frac{1}{\left(1+x\right)^3}v\right)'=3\left(1+x\right)$
Therefore,
$v=\left(1+x\right)^2\left(C+Cx-3\right)$
and substitute back to get
$y=\frac{\mathrm{sgn}\left(C+Cx-3\right)}{\sqrt[3]{\left(1+x\right)^2\left|C+Cx-3\right|}}$

Integral Curves

Even with the previous cases, there are not very much equations that we can solve. Now, we generalize the solution of a differential by not requiring it to be a explicit function. That is, we can solve (1) if we can found a function $\phi\left(x,y\right)$ , such that (1) can be transformed into the form

\begin{equation}\frac{\mathrm{d}}{\mathrm{d}x}\phi\left(x,y\right)=0\end{equation}

Then, we integrate it with respect to $x$ and get a general solution

\phi\left(x,y\right)=c

for some constant $c$ . This kind of solution is know as the integral curve.

Now, we discuss when can we write an equation into (2). By chain rule of partial differentiation, we have

\frac{\mathrm{d}}{\mathrm{d}x}\phi \left( x,y\left( x \right) \right) =\frac{\partial \phi}{\partial x}+\frac{\partial \phi}{\partial y}\frac{\mathrm{d}y}{\mathrm{d}x}

compare this with equation (1), we know that for such a nice function $\phi$ to exist, we require

\begin{align*} M\left( x,y \right) =\frac{\partial \phi}{\partial x}&& N\left( x,y \right) =\frac{\partial \phi}{\partial y}

Guanyu Xu

Sitemap

Pages

Posts

Preliminaries

Linear Least Squares

Cholesky Solver

QR Solver

Non-linear Least Squares

Gauss-Newton Method

Gradient Descent

Levenberg-Marquardt Method

Example

Problem Statement

Mapping Algorithm

Bayes Inference using Conjugate Prior

Bernoulli Likelihood and Beta Prior

Matrix Lie Groups

Special Orthogonal Group, SO(n)\mathrm{SO}(n)SO(n)﻿

Special Euclidean Group, SE(n)\mathrm{SE}(n)SE(n)﻿

Lie Algebra

Exponential Map

Mathematical Formulation of Bayes Filter

Kalman Filter

Extended Kalman Filter (EKF)

Unscented Kalman Filter (UKF)

Reference

Wave Theory of Sound

Wave Equation

Sinusoidal Plane-wave

Acoustical Two-ports

Continuity of Pressure

Continuity of Volume Velocity

Acoustic Impedance

Reflection and Transmission

Experiment

Heat Equation

Wave Equation

Properties of Power Series

Series Solution

Euler’s Equation

Two Different Real Roots, (α−1)2−4β>0\left(\alpha-1\right)^2-4\beta >0(α−1)2−4β>0﻿

Two Equal Roots, (α−1)2−4β=0\left(\alpha-1\right)^2-4\beta =0(α−1)2−4β=0﻿

Two Complex Solutions, (α−1)2−4β<0\left(\alpha-1\right)^2-4\beta <0(α−1)2−4β<0﻿

Method of Frobenius

Two Complex Roots

Equal Roots

Roots Differing by a Positive Integer

Holomorphic Functions

Cauchy-Riemann Equations

Cauchy’s Theorem

Cauchy’s Integral Formula

Residue Calculus

The Laplace Transform

Properties

Laplace Transform of Common Functions

Inverse Laplace Transform

Reference

Second Order Linear Equations

Algebraic Structure of Linear Equation

Solution to Linear System of Equations

Homogeneous, Constant Coefficients

Inhomogeneous, Constant Coefficients

Variable Coefficients

First Order ODEs

Separable Equation

Linear Equation

Homogeneous

Inhomogeneous

Method of substitution

Equation: y′=f(ax+by+c)y'=f\left(ax+by+c\right)y′=f(ax+by+c)﻿

Homogeneous Equation

Bernoulli’s Equation

Integral Curves

portfolio

research

talks

teaching

Special Orthogonal Group, $\mathrm{SO}(n)$

Special Euclidean Group, $\mathrm{SE}(n)$

Two Different Real Roots, $\left(\alpha-1\right)^2-4\beta >0$

Two Equal Roots, $\left(\alpha-1\right)^2-4\beta =0$

Two Complex Solutions, $\left(\alpha-1\right)^2-4\beta <0$

Equation: $y'=f\left(ax+by+c\right)$