Banach’s microscope to find a fixed point

Originating author is Christiane Rousseau.
In this vignette, we will show how we start from a small game to discover one of the most powerful theorems of mathematics, namely the Banach fixed point theorem. This theorem has fantastic applications inside and outside mathematics. In Section 3 we will discuss the fascinating application to image compression.

But, let us start with our game and look at the famous lid of a box of The Laughing Cow.

The right earring of the Cow is again a Laughing Cow. To each point of the lid, we associate the corresponding point on the right earring. This is of course a function from the lid to itself, which we will call $F$. For instance, to the tip of the chin of the cow we associate the tip of the chin of the small cow on the right earring. To the center of the right eye of the cow we associate the center of the right eye of the small cow on the right earring, etc. Now here is the quiz: is there is a point that is sent to itself by this process? Such a point, if it exists, will be called a fixed point. If a fixed point exists, then it is not any of the points we have listed above. Also, if a fixed point exists, then it should lie in the right earring. But this right earring is sent to the right earring of the small cow, etc. Visually, we see that these nested right earrings seem to converge to a point, which we call $A$, and $A$ is a candidate for our solution.

Now, let us start with any point, for instance the tip of the chin and call it $P_0$. Then $P_0$ is sent to $P_1=F(P_0)$, which is the tip of the chin of the small cow on the right earring. Then $P_1$ is sent to $P_2=F(P_1)$, which is the tip of the chin of the cow appearing on the right earring of the small cow. Etc. We remark three things:

We could continue the process an infinite number of times and generate a sequence $\{P_n\}$, where $P_{n+1}=F(P_n)$.
Visually, only a finite number of points of the sequence look distinct, and all the others cannot be distinguished. Of course, we can zoom and we will see more points. But, whatever a zoom we choose, we will still distinguish only a finite number of points and the others cannot be distinguished.
This sequence seems to converge to the same point $A$ as before.

Had we taken another point $Q_0$ and constructed the sequence $\{Q_n\}$, where $Q_{n+1}=F(Q_n)$, we would have seen that the sequence $\{Q_n\}$ would have seemed to converge to the same point $A$. In fact, we can remark that this follows from the fact that the singleton $\{A\}$ is the intersection of the nested right earrings whose diameter tends to $0$.

What will Banach fixed point theorem tell us? It will tell us that indeed the function $F$ has a unique fixed point, i.e. there exists a unique point $A$ of the plane such that $F(A)=A$. Moreover, it will also assert that if we take any point $P_0$ and we construct the sequence $\{P_n\}$, where $P_{n+1}=F(P_n)$, then the sequence $\{P_n\}$ will converge to $A$.

But why? Would have that occurred for any function $F$? Of course not. For instance, a translation on the plane has no fixed point. And the map $G(x,y)=(x+(x^2-1),y)$ has the two fixed points $(\pm 1,0)$. The map $F$ of our game has a special property. It is a contraction. Indeed, the image is much smaller than the domain. If two points $P$ and $Q$ are some distance apart, then their images $F(P)$ and $F(Q)$ are closer one to the other than $P$ and $Q$ are. This statement makes sense because in ${\mathbb R}^2$ we can measure the distance between two elements. This is because ${\mathbb R}^2$ is a metric space. It is the property that $F$ is a contraction which guarantees that, when we construct the sequence $\{P_n\}$, then, whatever the threshold of precision we take, (whether we look at the sequence from far, or close to it, or through a microscope, or through an electronic microscope), then after some $N$ which depends on our threshold, all elements $P_n$, $n>N$, of the sequence become indistinguishable. Two elements are distinguishable if the distance between them is above a certain threshold. We will recall in the next section that a sequence having this property is called a Cauchy sequence. In ${\mathbb R}^2$, we have the property that every Cauchy sequence has a limit. We say that ${\mathbb R}^2$ is a complete metric space.

1. The Banach fixed point theorem

We now have all the ingredients for the general case and we can state the theorem.

Theorem (Banach fixed point theorem). Let $\mathcal{K}$ be a complete metric space in which the distance between two points $P$ and $Q$ is denoted $d(P,Q)$. And let $F:\mathcal{K}\rightarrow \mathcal{K}$ be a contraction, i.e. there exists $c\in(0,1)$ such that for all $P,Q\in \mathcal{K}$, then $$d(F(P),F(Q))\leq c \;d(P,Q).$$ Then $F$ has a unique fixed point, i.e. there exists a unique $A\in \mathcal{K}$ such that $F(A)=A$.

We will define all terms appearing in this statement. This part is more formal and can be skipped if you prefer to concentrate on the fascinating applications.
We know what is the distance of two points $P$ and $Q$ in ${\mathbb R}^2$. How do we generalize to a set $\mathcal{K}$?

Definition. A distance on a set $\mathcal{K}$ is a function $d:\mathcal{K}\times \mathcal{K}\rightarrow {\mathbb R}$ satisfying

For all $P,Q\in \mathcal{K}$, $d(P,Q)\geq0$;
$d(P,Q)=0$ if and only if $P=Q$;
For all $P,Q,R\in \mathcal{K}$, $d(P,Q)\leq d(P,R)+d(R,Q);$ (triangular inequality.)

We know that these properties are satisfied for the usual Euclidian distance in ${\mathbb R}^2$.

We now recall the definition of Cauchy sequence, which is the formalization of the concept of a sequence for which, whatever threshold of precision we take, after a finite number of elements, all other elements become indistinguishable. We also recall the definition of convergent sequence.

Definition.

A sequence $\{P_n\}$ of elements of a metric space $\mathcal{K}$ is a Cauchy sequence if for all $\epsilon>0$, there exists $N\in {\mathbb N}$ such that for all $n,m>N$, then $$d(P_n,P_m)<\epsilon.$$
A sequence $\{P_n\}$ of elements of a metric space $\mathcal{K}$ converges to a limit $A\in \mathcal{K}$ if for all $\epsilon>0$, there exists $N\in {\mathbb N}$ such that for all $n>N$, then $$d(P_n,A)<\epsilon.$$

Definition. A metric space $\mathcal{K}$ is a complete metric space if any Cauchy sequence $\{P_n\}$ of elements of $\mathcal{K}$ converges to an element $A$ of $\mathcal{K}$.

How do we prove Banach fixed point theorem? The uniqueness of the fixed point is easy. Indeed, suppose that $A$ and $B$ are two fixed points. Then, $F(A)=A$ and $F(B)=B$. Moreover, since $F$ is a contraction, then $$d(F(A),F(B))\leq c\;d(A,B),$$ and hence, $d(A,B)\leq c\;d(A,B)$. The only solution is $d(A,B)=0$, which yields $A=B$.

As for the existence, the idea of the proof is also simple: we had already found it in our game with the Laughing Cow! We take any point $P_0\in \mathcal{K}$ and we construct (as before) the sequence $\{P_n\}$, where $P_{n+1}=F(P_n)$. Then, this sequence is a Cauchy sequence and its limit is a fixed point. (Of course, showing these two assertions require some work, but we will skip the technical details. What is important is that the proof is the same in the general case of a complicated complete metric space as in the simple case $\mathcal{K}={\mathbb R}$.)

The idea of the proof is not only simple and intuitive, but also very powerful. It provides us with a way of numerically constructing the fixed point $A$. This explains that the many applications of this theorem can be found both on the theoretical side and on the applied side.

2. The applications in analysis

One of the very important theoretical application of Banach fixed point theorem is the proof of existence and uniqueness of solutions of differential equations sufficiently regular. In this application, the complete metric space $\mathcal{K}$ is a set of functions, and the map $F$ transforms a function into another function (we often say that $F$ is an operator). The trick is to show that a solution of the differential equation, if its exists, is a fixed point of the operator $F$.

You may have studied simple differential equations and have learnt tricks to find formulas for the solutions. Well, these differential equations are the exception and, for most differential equations, no formula exists for the solution. Hence, the importance of a theorem asserting the existence of a solution. You should not be surprised that no formula can be given for the solutions of most differential equations. Indeed, consider the simple differential equation $$y’= e^{-x^2}.$$ Its solution is given by $$y=\int e^{-x^2}dx.$$ You may remember being told in your probability or statistics course that there exists no formula for the primitive of the function $e^{-x^2}$, hence explaining why we need to work with tables when we study the normal law.

3. Application to image compression

The best way to store an image in memory is to store the color of each pixel. There are two problems with this method:

It requires an enormous quantity of memory.
If we try to enlarge the image, for instance for using it in a large poster, then the pixels will become larger squares and we will be missing information on how to fill the details in these squares.

What is the principle of image compression? It is to encode less information than the original image, but to do it in a clever way, so that the eye cannot see that the image observed is deteriorated. Internet has increased the need for good image compression. Indeed, images slow down navigation on the web quite significantly. So, for internet navigation, it is good to have images encoded in files as small as possible. When you look at the image on your computer screen you cannot see that it has been deteriorated. But, if you try to enlarge it or to use it for a poster, then you immediately see that the quality is poor.

There are several principles of image compression and one of the most familiar is JPEG, which has become the standard for digital photos. The encoding of an image in JPEG format is also a mathematical algorithm.

In this vignette, we will concentrate on another method, which has remained more experimental. This method, introduced by Barnsley, has been called iterated function systems. The underlying idea of the method is to approximate an image by geometric objects. To have sufficiently many objects available, we will not limit ourselves to the usual geometric objects which are lines and smooth curves, but allow complicated fractal objects like the fern and the Sierpiński carpet (see figures on the left).

We will explain the idea of the compression process on the Sierpiński carpet on the left. It looks a priori a complicated object. How to store it in computer memory, in an economical way? The best is to store a program to reconstruct it when needed.

And to build this program, we need to understand what characterizes this geometric object. Let us observe our Sierpiński carpet: it is the union of three Sierpiński carpets (i.e. 3 copies of itself), that are half its size (in width and height). Indeed, starting from the Sierpiński carpet, we can construct a second one with the following procedure:

We shrink the Sierpiński carpet to its half size from the lower left vertex.
We make a second copy of this half Sierpiński carpet and glue it on the right.
We make a third copy of this half Sierpiński carpet and glue it on the top.

The second figure we have built is identical to our initial Sierpiński carpet. So the Sierpiński carpet is the fixed point for this process.

Let us now put that in mathematical terms. Note that the length of the base of the Sierpiński carpet is equal to its height. Hence, we can take axes with the origin at the lower left corner of the Sierpiński carpet, and units so that the base and height have both length 1. We also take the following affine transformations defined on ${\mathbb R}^2$:
\begin{align*}
T_1(x,y)&=\left(\frac{x}2,\frac{y}2\right),\\
T_2(x,y)&=\left(\frac{x}2+\frac12,\frac{y}2\right),\\
T_3(x,y)&=\left(\frac{x}2+\frac14,\frac{y}2+\frac12\right).\end{align*}
If $S$ is the Sierpiński carpet, then we have
$$S=T_1(S)\cup T_2(S)\cup T_3(S).$$
Are there other subsets $B$ of the plane which have the same property, namely \begin{equation}B=T_1(B)\cup T_2(B)\cup T_3(B)?\label{ifs}\end{equation} We will experiment and see that there are none! So we have characterized our Sierpiński carpet as the only subset $B$ of the plane such that (??) is satisfied. What have we done? We have constructed a function which, to a subset $B$ of the plane associates the subset $T_1(B)\cup T_2(B)\cup T_3(B)$. Let us call this function $W$. It is defined by \begin{equation}B\mapsto W(B)=T_1(B)\cup T_2(B)\cup T_3(B),\label{def_W}\end{equation} and we have observed that $S=W(S)$, i.e. $S$ is a fixed point of this function.

We have announced that we would experiment that the Sierpiński carpet is the only fixed point of this function. Let us try with a square $C_0$ as in Figure 1(a). Its image is $C_1$ in Figure 1(b). We apply the same process to $C_1$ and get $C_2$, $C_3$-$C_5$. (Figure 1 (c)-(f)).

We observe three things:

None of the sets $C_0$, …, $C_5$ is a fixed point of $W$.
We could have continued the process indefinitely, thus producing an infinite sequence of sets $\{C_n\}$, where $C_{n+1}=W(C_n)$.
The sequence $\{C_n\}$ seems to converge rapidly to the Sierpiński carpet. Indeed, our eye cannot distinguish $C_{10}$ from $S$. So, instead of $S$ on our image, the program reconstructing our image can just produce $C_{10}$. And, if a better resolution were needed, then we would use the same program and ask it to stop at $C_{20}$, or $C_{30}$. So the same very small program can reconstruct $S$ to any precision!

Not only that, but we can experiment that it works with any initial set! A second example iterating a pentagon appears in Figure 2. The same remarks (i), (ii) and (iii) as above apply to this example.

We have seen that the Banach fixed point theorem applies to contractions on complete metric spaces. We have defined the function $W$ in (??) on subsets of the place. For the metric space, we will take $\mathcal{K}$ as the set of (closed) bounded subsets of the place. We will introduce a distance on $\mathcal{K}$ called the Hausdorff distance. The definition of the Hausdorff distance $d_H(B_1,B_2)$, between two subsets $B_1$ and $B_2$ is a complicated and obscure formula, so we we explain the notion in another way. We start by explaining what meaning we want to give to the statement $$d_H(B_1,B_2)\leq \epsilon$$ (even if we have not defined what $d_H(B_1,B_2)$ is!) It would just mean that if our eye has a precision of $\epsilon$, then it cannot distinguish $B_1$ from $B_2$. In mathematical terms, then $d_H(B_1,B_2) \leq \epsilon $ should mean \begin{equation}\forall P\in B_1\;\exists Q\in B_2 \: d(P,Q)\leq \epsilon \quad\text{and}\quad \forall P’\in B_2\;\exists Q’\in B_1 \: d(P’,Q’)\leq \epsilon. \label{def:d_H}\end{equation} (Here $d$ is the usual Euclidean distance on $\mathbb R^2$.) This allows to give the indirect definition.

Definition. The Hausdorff distance between two closed bounded sets $B_1$ and $B_2$ is the minimum of all $\epsilon\geq0$ such that (??) is satisfied.

We can now convince ourselves that the function $W$ is a contraction. Indeed, suppose that $d_H(B_1,B_2)\leq \epsilon$. Then we can show that $d_H(W(B_1),W(B_2))\leq \frac{\epsilon}{2}$. Let $P\in W(B_1)$. Then there exist $i\in \{1,2,3\}$ and $P_1\in B_1$ such that $P=T_i(P_1)$. Since $d_H(B_1,B_2)\leq \epsilon$, there exists $Q_1\in B_2$ such that $d(P_1,Q_1)<\epsilon$. Let $Q=T_i(Q_1)$. Then $Q\in W(B_2)$ and $d(P,Q)= d(T_i(P_1),T_i(Q_1))=\frac12 d(P_1,Q_1)\leq\frac{\epsilon}{2}$. Similarly, if we start with $P'\in W(B_2)$ then there exists $Q'\in W(B_1)$ such that $d(P',Q')\leq\frac{\epsilon}{2}$. This process has been adapted to the compression of real images (see [K] or [RS]). The method produces high quality images when the image has some fractal character. However the rate of compression is not as good and not as flexible as the JPEG format. Also, the encoding process (transforming the image in a program to reconstruct it) is still too tedious to be of practical interest. However, the simplicity of its idea together with its power remains quite striking. 4. A surprising application: the PageRank algorithm

Google’s success as a search engine comes from its algorithm: the PageRank algorithm. In this algorithm, one computes a fixed point of a linear operator on ${\mathbb R}^n$ which is a contraction, and this fixed point (which is a vector) yields the ordering of the pages. In practice, the fixed point (which is an eigenvector of the eigenvalue $1$) is calculated approximately as $P_n$ for $n$ sufficiently large. We invite the interested reader to look at the details in the vignette on How Google works?.

5. Conclusion

What we have experienced inside that vignette is how, starting from a simple game, we can discover very powerful ideas that may lead to mathematical and technological breakthroughs. When we are looking for a unique solution of a problem, it has now become a standard method in many domains of mathematics to try to see if the solution of the problem can be characterized as the unique fixed point of an operator especially constructed for that purpose.

We have seen that the advantage of this approach is that the theorem provides an efficient practical method to construct the solution object as the limit of a sequence since the convergence is fast.

Analysis is the study of functions. Functions are usually defined on numbers. In multivariate calculus, we generalize the notion of functions to vectors which are elements of ${\mathbb R}^n$. But, why stop with elements of ${\mathbb R}^n$? We have also experienced that mathematicians like to generalize the notion of functions, and allow them to be defined, for instance, on sets of sets, and on sets of functions, etc. Doing analysis on sets of functions has now become an important chapter of modern analysis, called functional analysis, which is standard in graduate studies.

You are invited to make the link with some of the iterative processes you have encountered. For instance, it could have been the iterative processes in dimension 1 associated to the H\’eron sequences to obtain square roots. The rapid geometric convergence can also be understood from the point of view presented here.

Bibliography

[B] M. F. Barnsley, Fractals everywhere, San Diego, Academic Press, 1988.

[K] J. Kominek, Advances in fractal compression for
multimedia applications, Multimedia Systems Journal, vol. 5,
n$^{o}$ 4, 1997, 255–270.

[R] C. Rousseau, Point fixe de Banach (in French), Accromath 5, hiver-printemps 2010 (www.accromath.ca).

[R2] C. Rousseau, How Google works? Klein vignette (www.kleinproject.org).

[RS] C. Rousseau and Y. Saint-Aubin, Mathematics and
technology, SUMAT Series, Springer-Verlag, 2008 (A French version of the book, Mathéematiques et technologie, exists, published in the same series.)

Banach’s microscope to find a fixed point

Related

Leave a ReplyCancel Reply

Share this:

Related

Leave a ReplyCancel Reply