white houses with shingle roofs in neighborhood - k nearest neighbor algorithm

Playing with Mathematics and Programming – The k-Nearest Neighbor Algorithm – Two Challenges

Share.

In this article, we will give you two challenges related to the $k$-nearest neighbor ($k$-NN) algorithm. Before we do so, we will explain to you what the $k$-nearest neighbor algorithm is.

What is the k-Nearest Neighbor Algorithm?

A simple case

Let’s say there are points in some space, where some of them are colored yellow and others colored green. We need to assign a color (yellow or green) to a new point in that space (see the image below). To do so, we care only about points “nearest” to that uncolored point.

k nearest neighbor algorithm simple example

By “nearest” we mean points that fall within some fixed radius of the uncolored point. We then count the number of green points and the number of yellow points, and we assign to the new point the most popular color, which is green in our case (see the image). The $k$-nearest neighbor algorithm find those points that fall within this radius.

Another way to understand the $k$-nearest neighbor algorithm is that it computes the distance between the uncolored point and all other points and check the color of the $k$ points nearest to the uncolored point. Then it assigns to the new point the most popular color. In our case, $k=4$ (see the image).

A more general case

The $k$-nearest neighbor algorithm usually is used in more general settings, but the idea of the algorithm is the same. These general settings are metric spaces.

A metric space is a collection of points along with a distance function or metric that says “how far” any two points are from each other. The notion of “how far” has been formalized as a function from that space to the nonnegative real numbers and that function satisfies these four properties:

  • the distance between any point and itself is $0$;
  • the reference point for the distance between two points is irrelevant;
  • the distance between any two distinct points is positive;
  • for any three points, the distance between two of those points is at most the sum of the two distances between those two points and the third.

Let us now say what the $k$-nearest neighbor algorithm does in the setting of a metric space. Let the collection $x_{1},\ldots,x_{n}$ be points in some space with a metric $d$. Let $L=\left\{ \ell_{1},\ldots,\ell_{k}\right\} $ be the collection of labels, which corresponds to the colors in thesimple case. Let $F$ be a function that maps the points to the labels, and let $r$ be a fixed positive real number.

For an unlabeled point $z$, the $k$-nearest neighbor algorithm finds all the points in that space that fall within $r$-radius of the point $z$ and assigns to $z$ the most popular label, or the algorithm computes the distance between $z$ and all other points in the space, takes the $k$ nearest points to $z$, and assigns to $z$ the most popular label. In the image below, the label for $z$ is $\ell_{1}$.

k nearest neighbor algorithm metric space

Let me explain to you why these two statements of the algorithm are equivalent. If the $k$-nearest neighbor algorithm finds all points in that space that fall within $r$-radius of the point $z$, then set $k$ to be the number of points that fall within $r$-radius of the point $z$.

If it computes the distance between $z$ and all other points in the space and takes the $k$ nearest points to $z$, then set $r$ to be the maximum distance between $z$ and some of the $k$ points.

Computer implementation of the $k$-nearest neighbor algorithm

The $k$-nearest neighbor algorithm has been implemented for computer use.  It is used as a classification algorithm in machine learning.

For example, the machine learning libray Scikit-learn adds it as a function to its toolkit [1]. The computer vision library OpenCV also implements the $k$-nearest neighbor algorithm [2].

Both of those machine learning frameworks use Python for their implementations.

The $k$-nearest neighbor algorithm has been implemented in the R programming language [3]. The language C++ has also been used to implement the algorithm [4]. Also, MathWorks implements it for its software as a classification algorithm [5].

The Challenges

Before we tell you what the first challenge is, we need to tell you what groups and subgroups are.

Groups and subgroups

Groups are abstraction of the integers $\left\{ \ldots-3,-2,-1,0,1,2,3,\ldots\right\} $ with the operation of addition. With the integers, addition has some algebraic properties:

  • $0$ added to any integer is that integer (existence of an identity);
  • any integer can be added to another integer that results to $0$ (existence of inverses);
  • the order of operation between any three integers is irrelevant (associativity of addition).

Formally, a group is a collection of things (or elements), along with a binary operation that combines any two things into one thing, that has the three above algebraic properties.

There are many examples of groups. Some of them are

  • the group of fixed-dimensional square matrices with addition,
  • the group of nonzero rational numbers with multiplication,
  • the group of strictly increasing continuous functions from the positive real numbers to the positive real numbers with function composition,
  • the group of $n$ nonnegative integers $\left\{ 0,\ldots,n-1\right\} $ with modular addition, and
  • the symmetric group on $n$ elements with function composition.

When a subcollection of a group is also a group under the same binary operation, that subcollection is called a subgroup of the group.

The Symmetric group on a finite collection

The symmetric group is the one we will use in the first challenge, so let us tell you what it is. For a simple case, imagine you have a square whose sides are labeled with $1,2,3,4$ and a pole across from each side; the poles are also labeled with $1,2,3,4$.

labeled square dots symmetric group

The label for each side of the square corresponds to the same label for the pole across from that side, that is, $1$ goes to $1$, $2$ goes to $2$, $3$ goes to $3$, and $4$ goes to $4$. We call this correspondence the identity map since it sends each number to the same number.

Now, rotate that square clockwise by $90^{\circ}$. Then the correspondence between the labels of the sides and the labels of the poles has changed: $4$ goes to $1$, $1$ goes to $2$, $2$ goes to $3$, and $3$ goes to $4$.

labeled rotated square dots symmetric group

If we rotate counterclockwise that square by another $90^{\circ}$, we return to the initial correspondence (identity map). The operation of applying two rotations is a composition and the counterclockwise rotation is the inverse of the clockwise rotation.

If we continue rotating the square in one direction, we will have other correspondences and they can be composed with other correspondences to give the identity map.

The collection of those rotations (or correspondences) with composition is the symmetric group on $4$ elements.

Another way to see the elements of that group is as permutations. The identity map is the string $\left(1234\right)$, meaning $1$ is in the $1$st position, $2$ in the $2$nd position, $3$ in the $3$rd position, and $4$ in the $4$th position.

The $90^{\circ}$-clockwise rotation is the string $\left(4123\right)$.

In general, the symmetric group on $n$ elements is the collection of permutations on $1,\ldots,n$ with composition as the binary operation.

Challenge #1

We have an $n$-dimensional space with the Minkowski metric of order $p$ defined as, for any two points $\mathbf{u}=\left(u_{1},\ldots,u_{n} \right)$ and $\mathbf{v}=\left(v_{1},\ldots,v_{n}\right)$,

$\left(\left|u_{1}-v_{1}\right|^{p}+\cdots+\left|u_{n}-v_{n}\right|^{p}\right)^{\frac{1}{p}},$

where $p$ is some nonzero integer. That space has $k$ points, say $\mathbf{x}_{1},\ldots,\mathbf{x}_{k}$. There are $m$ labels $\ell_{1} \ldots,\ell_{m}$ and a function $F$ that maps the points to the labels. Let $\mathbf{z}$ be an unlabeled point added to that space and $r$ a positive real number. When $r$ equals $1$, let $S_{1,i}$ be the subcollection of the $k$ points within $1$-radius of $\mathbf{z}$ with the label $\ell_{i}$. We denote the number of points in $S_{1,i}$ as $\left|S_{1,i}\right|.$

We make these three assumptions:

  1. For each radius $r$ and each $i=1,\ldots,m$, the subcollection $S_{r,i}$ is nonempty, meaning there are points within $1$-radius of $\mathbf{z}$ with the label $i$;
  2. Each labeled point has exactly one label;
  3. $\left|S_{1,1}\right|<\left|S_{1,2}\right|<\cdots<\left|S_{1,m-1}\right|<\left|S_{1,m}\right|$.

The third assumption gives a rule to construct an element of the symmetric group on $m$ elements: the label $i$ for each $S_{1,i}$ corresponds to the position of the $S_{1,i}$ in that chain of strict inequalities; for example, label $\ell_{1}$ goes to the $1$st position, label $\ell_{2}$ goes to the $2$nd position, and so on. Therefore, from the third assumption, the radius $r=1$ corresponds to the permutation $\left(1,\ldots,m\right)$, which is the identity element of the symmetric group on $m$ elements.

For each radius $r=2,3,4,\ldots$, the number of points in $S_{r,i}$ changes according to this rule:

  • for each $i=1,\ldots,m$, flip a fair coin. If the result is head, $\left|S_{r,i}\right|=\left|S_{r-1,i}\right|+r^{i}$. If the result is tail, $\left|S_{r,i}\right|=\left|S_{r-1,i}\right|$. For each $r=2,3,4,\ldots$, the probability of having a head is $B_{r}=\frac{1}{2^{r-1}}$; additionally, $\frac{m}{2}$ flips are head if $m$ is even and $\frac{m+1}{2}$ flips are heads if $m$ is odd. This sequence of flips is called a Bernoulli process [6].
  • after the flipping for $i=m$, arrange the $\left|S_{r,i}\right|$’s in increasing order. If some of them are equal, arrange them by increasing order in the index $i$ for the labels. For example, if $\left|S_{r,3}\right|=\left|S_{r,7}\right|=\left|S_{r,4}\right|$, arrange them as $\left|S_{r,3}\right|,\left|S_{r,4}\right|,\left|S_{r,7}\right|$. Define an artificial order $\prec$ for the sizes as follows: if there is a subcollection $\left\{ a_{1},\ldots,a_{p}\right\} $ of $\left\{ 1,\ldots,m\right\} $ such that $\left|S_{r,a_{1}}\right|<\cdots<\left|S_{r,a_{p}}\right|$ then $\left|S_{r,a_{1}}\right|\prec\cdots\prec\left|S_{r,a_{p}}\right|$. If there is a subcollection $\left\{ a_{1},\ldots,a_{p}\right\} $ of $\left\{ 1,\ldots,m\right\} $ such that $\left|S_{r,a_{1}}\right|=\cdots=\left|S_{r,a_{p}}\right|$ and $a_{1}<\cdots<a_{p}$ then $\left|S_{r,a_{1}}\right|\prec\cdots\prec\left|S_{r,a_{p}}\right|$.

Therefore, for each radius $r$, there is a unique permutation $\left(a_{1},\ldots,a_{m}\right)$ of $1,\ldots,m$ such that $\left|S_{r,a_{1}}\right|\prec\cdots\prec\left|S_{r,a_{m}}\right|$.

Let $G$ be the function that sends each radius $r$ to an element of the symmetric group on $m$ elements.

Now we are ready for the first challenge:

  1. For each integer $p$ and each nontrivial subgroup of the symmetric group on $m$ elements, meaning a subgroup with more than the identity element, for each element $h$ in that subgroup, is there an algorithm that can find the mininal radius $r$ such that $G\left(r\right)$ equals $h$? If such an algorithm exists, build a Python program (or in any language of your choice) that implements your algorithm. Notethat the integer $p$ is for the Minkowski metric of order $p$.
  2. When $p$ goes to infinity, the sequence of Minkowski metrics yields the chessboard metric given by $\max\left(\left|u_{1}-v_{1}\right|,\ldots,\left|u_{n}-v_{n}\right|\right)$ for any two points $\mathbf{u}$ and $\mathbf{v}$. For each nontrivial subgroup of the symmetric group on $m$ elements, meaning a subgroup with more than the identity element, for each element $h$ in that subgroup, is there an algorithm that can find the mininal radius $r$ such that $G\left(r\right)$ equals $h$? If such an algorithm exists, build a Python program (or in any language of your choice) that implements your algorithm.

Sequences

Before we tell you what the next challenge is, we need to tell you what a sequence is. A sequence in a space is a list of elements (or terms) in that space. That list has a $1$st element, a $2$nd element, a $3$rd element, and so on. If that list has a last element, then we say the sequence is finite;

finite sequence math green dots

if for each term in that list there is a next term, then we say that sequence is infinite.

infinite sequence math green dots

Formally, a finite sequence in a space is a function from the collection $1,\ldots,N$ to that space, for some positive integer $N$, and an infinite sequence in that space is a function from the infinite collection $1,2,3,\ldots$ to that space.

We will refer the positive integers as the positions of the terms in the sequence.

A periodic sequence is an infinite sequence whose terms eventually repeat after a finite number of steps from each other. Formally, an infinite sequence is periodic if there is a positive integer $N$ and a positive integer $P$ such that all terms with positions at least $N$ and that are $P$ steps away from each other are equal.

periodic infinite sequence math green dots

Challenge #2

The artificial order $\prec$ we have defined for the sizes $\left|S_{r,i}\right|$’s does not always guarantee strict inequality among the sizes, but we need a strict inequality among the sizes to decide which label to assign to the unlabeled point $\mathbf{z}$.

To do that we will add a new rule. Let $\left|S_{r,i^{*}}\right|$ be the size such that $\left|S_{r,i^{*}}\right|\prec\left|S_{r,j}\right|$ for all $j$ not equal to $i^{*}$, so $\left|S_{r,i^{*}}\right|$ is the minimal size with the order $\prec$, and let $\left|S_{r,j}\right|\prec\left|S_{r,i^{\#}}\right|$ for all $j$ not equal to $i^{\#}$, so $\left|S_{r,i^{\#}}\right|$ is the maximal size with the order $\prec$.

Note that the minimal and maximal sizes always exist for each $r$ since there is a unique permutation $\left(a_{1},\ldots,a_{m}\right)$ of $1,\ldots,m$ such that $\left|S_{r,a_{1}}\right|\prec\cdots\prec\left|S_{r,a_{m}}\right|$ for each $r$. Now, let $q_{r}=\left|S_{r,i^{\#}}\right|-\left|S_{r,i^{*}}\right|$. For each $i=1,\ldots,m$, we add $q_{r}+\left(i-1\right)$ points with label $\ell_{a_{i}}$ to the space within $r$-radius to the unlabeled point $\mathbf{z}$. Let us call $S’_{r,a_{i}}$ this new collection of points with label $a_{i}$ from adding these new points, so $\left|S’_{r,a_{i}}\right|=\left|S_{r,a_{i}}\right|+\left[q_{r}+\left(i-1\right)\right]$ and $\left|S’_{r,a_{1}}\right|<\cdots<\left|S’_{r,a_{m}}\right|$.

We are now ready to tell you what the second challenge is. For each $r=1,2,3,\ldots$, assign to the unlabeled point $\mathbf{z}$ the label of the points with the maximal size among the sizes $\left|S’_{r,a_{i}}\right|$’s with the usual strict order. This assignment of radii to labels form a sequence of labels, with the radii being the positions. Is this sequence of labels periodic? If it is, build a Python program (or in any language of your choice) that computes the positive integers $N$ and $P$ (as explained above), and use your program to output the first million terms with positions equal to at least $N$.

References

[1] sklearn.neighbors.KNeighborsClassifier. Scikit-learn: https://scikit-learn.org/stable/modules/generated sklearn.neighbors.KNeighborsClassifier.html

[2] cv::ml::KNearest Class Reference. OpenCV: https://docs.opencv.org/3.4/dd/de1/classcv\_1\_1ml\_1\_1KNearest.html

[3] Knn R, K-nearest neighbor classifier implementation in R programming from scratch. Dataaspirant: https://dataaspirant.com/k-nearest-neighbor-classifier-implementation-r-scratch/

[4] k-Nearest Neighbors. GitHub: https://github.com/mddragnev/k-nearest-neighbors/blob/master/KNN/KNN/main.cpp

[5] ClassificationKNN. MathWorks: https://www.mathworks.com/help/stats/classificationknn.html

[6] Bernoulli process. Wikipedia: https://en.wikipedia.org/wiki/Bernoulli\_process