In this article, we present two challenges related to the recursive least square algorithm [1]. We will first explain what the recursive least square algorithm is.

# The Recursive Least Square Algorithm

## 1-Dimensional Case

In the one-dimensional case, let’s say we have a linear relation of real numbers of the form

$y=ax+e$,

where $x$ is a constant real number but unknown and $y$ is known. The real number $e$ is a “small” number that adds up to the value $y$. We would like to find an estimate of $x$ that would make $ax$ as close as possible to $y$; roughly speaking, this would amount to finding an estimate of $x$ that would make $e$ as small as possible. The reason is that $e$ is the difference of $y$ and $ax$ or

$e=y-ax$.

To formalize the notion of closeness, we define a function that we call a **cost function** defined as

$C\left(x,y\right)=\left|y-ax\right|,$

so our problem is to find an estimate of $x$ that minimizes the cost function $C$.

We have not yet told you what the recursive least square algorithm is, as we will need to work in dimensions higher than $1$, but the gist of that algorithm is to find such an estimate of $x$ that minimizes the cost function $C$.

## Higher-dimensional Case

In a higher-dimensional space, the numbers in the first case will be matrices of real numbers, which are arrays of numbers. We will slightly change the notation we have used earlier to emphasize we are working in a more general space, which also covers the $1$-dimensional case.

Let’s say we’re now working in a space of dimension $r$, where $r$ is an integer that is least $1$. The linear relation now becomes a system of $r$ linear equations in variables $x_{1},\ldots,x_{n}$ written as

$\begin{array}{cccccc} y_{1} & = & a_{11}x_{1}+ & \cdots & +a_{1n}x_{n} & +e_{1}\\ \vdots & & \vdots & & \vdots & \vdots\\ y_{r} & = & a_{r1}x_{1}+ & \cdots & +a_{rn}x_{n} & +e_{r} \end{array}$.

Working with these $r$ linear equations can be cumbersome, so we will use the more compact matricial notation to express the above linear relation. Let $\mathbf{y}$ be the vector or the $r$ by $1$ matrix

$\left[\begin{array}{c} y_{1}\\ \vdots\\ y_{r} \end{array}\right],$

$\mathbf{x}$ be the vector or $n$ by $1$ matrix

$\left[\begin{array}{c} x_{1}\\\vdots\\x_{n}\end{array}\right],$

and $\mathbf{e}$ the vector or the $r$ by $1$ matrix

$\left[\begin{array}{c} e_{1}\\ \vdots\\e_{r}\end{array}\right].$

Let $\mathbf{A}$ be the $r$ by $n$ matrix

$\left[\begin{array}{ccc} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{r1} & \cdots & a_{rn}\end{array}\right].$

The above linear relation of $r$ equations now becomes this linear matricial relation:

$\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{e},$

where addition and multiplication are for matrices.

As we have mentioned a notion of closeness in the $1$-dimensional case, we also need a similar notion in the $r$-dimensional case. For any two elements or vectors

$\mathbf{u}=\left[\begin{array}{c} u_{1}\\ \vdots\\u_{r}\end{array}\right]$

and

$\mathbf{v}=\left[\begin{array}{c} v_{1}\\ \vdots\\ v_{r}\end{array}\right]$

in that $r$-dimensional space, we can define a notion of distance beween these two vectors as

$\left\Vert \mathbf{u}-\mathbf{v}\right\Vert =\sqrt{\left(u_{1}-v_{1}\right)^{2}+\cdots+\left(u_{r}-v_{r}\right)^{2}}.$

Then, we define the cost function $C$ for the distance between $\mathbf{y}$ and $\mathbf{Ax}$ as

$C\left(\mathbf{x},\mathbf{y}\right)=\left\Vert \mathbf{y}-\mathbf{Ax}\right\Vert .$

Our problem in the $r$-dimensional case is to find an estimate of the vector $\mathbf{x}$ that makes $\mathbf{Ax}$ as close as possible to $\mathbf{y}$ or that minimizes the cost function $C$.

## Where’s the recursion?

So far, there’s no need for recursion. Here’s where recursion comes into play. Let’s say matrices $\mathbf{y},\mathbf{A},\mathbf{e}$ are incrementally given, meaning we know their terms up to their first $k$ terms for $\mathbf{y},\mathbf{e}$ and $k\times n$ terms for $\mathbf{A}$ with $k$ less than $r$ and we find estimates of $x_{1},\ldots,x_{n}$ so that the cost function is minimized at stage $k$. When a new term $y_{k+1}$ is given at stage $k+1$, a new term is appended to the vector $\mathbf{e}$ and the matrix $\mathbf{A}$ is augmented by a new row. The recursive least square algorithm uses the estimate of $\mathbf{x}$ from stage $k$ to find an estimate of $\mathbf{x}$ that minimizes the cost function at stage $k+1$.

## Computer Implementation of the Recursive Least Square Algorithm

An algorithm is a method or number of steps to do something or solve a problem in our case. While we can run an algorithm in our head when we need to, we can also program a computer (or electronic brain) to run that algorithm. The recursive least square algorithm has been implemented in Python [2,3,4], and MathWorks implements it for its computing software [5].

# The Challenges

Before we present the challenges to you, we’ll explain the setup.

## 1-Dimensional Case

In the $1$-dimensional case we have mentioned earlier, the relation between $y$ and $x$ is linear. Now, let us consider a quadratic relation between $y$ and $x$ given by

$y=ax^{2}+e,$

where $x$ is a constant but unknown. In general, for any integer $m$ that is at least $1$, let us consider the relation given by

$y=ax^{m}+e.$

The cost function $C$ would now be given by

$C\left(x,y\right)=\left|y-ax^{m}\right|,$

and the goal would be to find an estimate of $x$ that minimizes $C$.

That’s just a warm-up. Let us move to a higher dimension.

## Higher-dimensional Case

To cover more dimensions, say $r$ dimensions, we now have a system of $r$ equations with $n$ unknowns given by

$\begin{array}{cccccc} y_{1} & = & a_{11}x_{1}^{m}+ & \cdots & +a_{1n}x_{n}^{m} & +e_{1}\\\vdots & & \vdots & & \vdots & \vdots\\y_{r} & = & a_{r1}x_{1}^{m}+ & \cdots & +a_{rn}x_{n}^{m} & +e_{r}\end{array}.$

In matricial form, this system of equations is given as

$\left[\begin{array}{c} y_{1}\\\vdots\\y_{r}\end{array}\right]=\left[\begin{array}{ccc} a_{11} & \cdots & a_{1n}\\\vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{rn} \end{array}\right]\left[\begin{array}{c} x_{1}^{m}\\\vdots\\ x_{n}^{m} \end{array}\right]+\left[\begin{array}{c} e_{1}\\ \vdots\\e_{r}\end{array}\right].$

We will use the notation $\mathbf{x}^{m}$ for the vector $\left[\begin{array}{c} x_{1}^{m}\\\vdots\\ x_{n}^{m}\end{array}\right]$, so we can write the matricial equation in this more compact way:

$\mathbf{y}=\mathbf{A}\mathbf{x}^{m}+\mathbf{e}.$

The cost function $C$ is defined as before, that is,

$C\left(\mathbf{x},\mathbf{y}\right)=\left\Vert \mathbf{y}-\mathbf{A}\mathbf{x}^{m}\right\Vert .$

## Challenge #1

As matrices $\mathbf{y},\mathbf{A},\mathbf{e}$ are incrementally given, as we have explained before, and given the best estimate of $\mathbf{x}$ has been found at stage $k$ and $y_{k+1}$ is given, is there an algorithm that uses the best estimate of $\mathbf{x}$ at stage $k$ to compute an estimate of $\mathbf{x}$ that minimizes the cost function $C$ for each integer $m$? If such an algorithm exists, build a computer program in Python (or any language of your choice) that implements your algorithm for each integer $m$.

## The Distance function

Before we tell you what the next challenge is, we need to again explain to you the setup.

Recall we have defined a notion of a distance between any two vectors $\mathbf{u}$ and $\mathbf{v}$ as $\left\Vert \mathbf{u}-\mathbf{v} \right\Vert$, which we have used to define the cost function $C$.

This distance function on pairs of vectors is called the **Euclidean distance or metric**, and it coincides with the usual notion of a distance between two things in the real world (such as the distance between a bird and someone watching it) and the distance between two points in plane geometry (such as the distance between the two corners that define the hypothenuse of a right triangle).

You can define other distance functions for pairs of vectors. One such a distance function is called the **Minkowski’s distance of order **$p$ given by

$\left(\left|u_{1}-v_{1}\right|^{p}+\cdots+\left|u_{r}-v_{r}\right|^{p}\right)^{\frac{1}{p}},$

where $p$ is a nonzero integer. We will use the notation $d_{p}\left(\mathbf{u},\mathbf{v}\right)$ for the Minkowski distance between vectors $\mathbf{u}$ and $\mathbf{v}$.

When $p$ goes to infinity, you get another distance function called the **Chebyshev distance** or the **chessboard distance** given by

$\max\left(\left|u_{1}-v_{1}\right|,\ldots,\left|u_{r}-v_{r}\right|\right).$

We will use the notation $d_{\infty}\left(\mathbf{u},\mathbf{v}\right)$ for the chessboard distance between vectors $\mathbf{u}$ and $\mathbf{v}$.

The same way we can talk about the minimal value of the cost function $C$ when that function is defined with the Euclidean distance, we can also talk about the minimal value of any function defined with Minkowski distance or the chessboard distance.

## Challenge #2

Let the cost function $C_{p}$ be defined as

$C_{p}\left(\mathbf{x},\mathbf{y}\right)=d_{p}\left(\mathbf{y},\mathbf{A}\mathbf{x}^{m}\right)$

and the cost function $C_{\infty}$ be defined as

$C_{\infty}\left(\mathbf{x},\mathbf{y}\right)=d_{\infty}\left(\mathbf{y},\mathbf{A}\mathbf{x}^{m}\right),$

where $\mathbf{x},\mathbf{y},\mathbf{A},m$ are as defined before.

As matrices $\mathbf{y},\mathbf{A},\mathbf{e}$ are incrementally given, as we’ve explained before, and given the best estimate of $\mathbf{x}$ at stage $k$ and given $y_{k+1}$,

- is there an algorithm that computes at stage $k+1$ an estimate of $\mathbf{x}$ that minimizes the cost function $C_{p}$ for each nonzero integer $m$ and for each nonzero integer $p$? Note that the algorithm will depend on $m$ and $p$. If such a family of algorithms exists, build a computer program in Python (or any language of your choice) that implements your algorithm for each integer $m$ and each integer $p$.
- is there an algorithm that computes at stage $k+1$ an estimate of $\mathbf{x}$ that minimizes the cost function $C_{\infty}$ for each nonzero integer $m$? If such a family of algorithms exists, build a computer program in Python (or any language of your choice) that implements your algorithm for each integer $m$.

# References

[1] *Recursive least squares filter*. Wikipedia: https://en.wikipedia.org/wiki/Recursive\_least\_squares\_filter

[2] Recursive Least Squares (Python). GitHub: https://github.com/adamDhalla/recursive-least-squares/blob/main/recursiveLeastSquares.py

[3] Recursive Least Squares (Python). GitHub: https://github.com/matousc89/padasip/blob/master/padasip/filters/rls.py

[4] Recursive least squares (Python). Statsmodels: https://www.statsmodels.org/v0.10.2/examples/notebooks/generated/recursive\_ls.html

[5] *Recursive Least Squares Estimator*. MathWorks: https://www.mathworks.com/help/ident/ref/recursiveleastsquaresestimator.html$