Originating author is Christiane Rousseau.
From its very beginning, Google became “the” search engine. This comes from the supremacy of its ranking algorithm: the PageRank algorithm. Indeed, with the enormous quantity of pages on the World-Wide-Web, many searches end up with thousands or millions of results. If these are not properly ordered, then the search may not be of any help, since no one can explore millions of entries.
How does the PageRank algorithm work?
http://www.mathunion.org/icmi/other-activities/klein-project/introduction/
rather than
http://www.kleinproject.org/
will appear first in a search for Klein project.
To explain the algorithm, we model the web as an oriented graph. The vertices are the pages, and the oriented edges are the links between pages. As we just explained, each page corresponds to a different url. Hence, a website may contain many pages. The model makes no difference between the individual pages of a website and its front page. But, most likely, the algorithm will rank better the front page of an important website.
A simple example
Let us look at the simple web on the left with five pages named
,
,
,
, and
. This web has few links. If we are on page
, then there is only one link to page
, while, if we are on page
, we find three links, and we can choose to move to either page
, or
, or
. Notice that there is at least one link from each page.
We play a game, which is simply a random walk on the oriented graph. Starting from a page, at each step we choose at random a link from the page where we are, and we follow it. For instance, in our example, if we start on page , then we can go to
or to
with probability
for each case while, if we start on
, then we necessarily go to
with probability
. We iterate the game.


Let us remark that the sum of entries of any column of is equal to
and that all entries are greater or equal to zero. Matrices having these two properties are very special: each such matrix is the matrix of a Markov chain process, also called Markov transition matrix. It always has
as an eigenvalue and there exists an eigenvector with eigenvalue
, all the components of which are less than or equal to
and greater than or equal to
, with sum equal to
. But, before recalling the definitions of eigenvalue and eigenvector, let us explore the advantage of the matrix representation of the web graph.
We consider a random variable with values in the set of pages
, which contains
pages (here
).
represents the page where we are after
steps of the random walk. Hence, if we call
the entry of the matrix
located on the
-th row and
-th column, then
is the conditional probability that we are on the
-th page at step
, given that we were on the
-th page at step
:
Note that this probability is independent of ! We say that a Markov chain process has no memory of the past. It is not difficult to figure that the probabilities after two steps can be summarized in the matrix
.
Let us prove it (you can skip the proof if you prefer). The theorem of total probabilities yields
By definition of conditional probability
We use a familiar trick: multiply and divide by the same quantity:
The first quotient is equal to
since a Markov chain process has no memory past the previous
step. Hence,
In our example
Iterating this idea, it is clear that the entry , of the matrix
describes the probability
. For instance,
All columns of are identical if we choose precision to 3 decimals, and the same as the columns of
when
. If we choose a higher precision, we also find stabilization, but for
larger than
. Hence, after
steps, where
is sufficiently large, the probability of being on a page is independent from where we started!
Moreover, let us consider the vector
( is a vertical vector, and its transpose
is a horizontal vector). It is easily checked that
. If we consider the
-th coordinate of the vector
as the probability of being on page
at a given time
, and hence
as the probability distribution of pages at time
, then it is also the probability distribution at time
. For this reason, the vector
is called the stationary distribution. This stationary distribution allows to order the pages. In our example, we order the pages as
,
,
,
,
, and we declare
the most important page.
The general case
The general case can be treated exactly as our example. We represent the web as an oriented graph in which the vertices represent the
pages of the web, and the oriented edges represent the links between pages. We summarize the graph in an
matrix,
, with the
-column representing the
-th departing page and the
-row, the
th arrival page. In our example we found a vector
satisfying
. This vector is an eigenvector of the eigenvalue
. Let us recall the definition of eigenvalue and eigenvector.
Definition. Let be an
matrix. A number
is an eigenvalue of
if there exists a nonzero vector
such that
. Any such vector
is called an eigenvector of
.
We also recall the method to find the eigenvalues and eigenvectors.
Proposition Let be an
matrix. The eigenvalues of
are the roots of the characteristic polynomial
, where
is the
identity matrix The eigenvectors of an eigenvalue
are the nonzero solutions of the linear homogeneous system
.
The following deep theorem of Frobenius guarantees that for the matrix associated to a web graph, we will always find a stationary solution.
Theorem (Frobenius theorem) We consider an Markov transition matrix
(i.e.
for all
, and the sum of entries on each column is equal to
, namely
). Then
is one eigenvalue of
.
- Any eigenvalue
of
satisfies
.
- There exists an eigenvector
of the eigenvalue
, all the coordinates of which are greater
or equal than zero. Without loss of generality we can suppose that
the sum of its coordinates be.
It is now time to admire the power of this theorem. For this purpose, we will make the simplifying hypothesis that the matrix has a basis of eigenvectors
and we suppose that
is the vector
of Frobenius theorem. For each
, there exists
such that
. Let us take any nonzero vector
with
, where
and
. We decompose
in the basis
:
A technical proof that we skip allows to prove that . Now, let us calculate
:
since is an eigenvector of the eigenvalue
. And if we iterate we get
If all for
satisfy
, then
This is exactly what happens in our example!
Notice however that the theorem does not guarantee that any matrix satisfying the hypotheses of the theorem will have this property. Let us describe the possible pathologies and the remedy.
Possible pathologies.
- The eigenvalue
may be a multiple root of the characteristic polynomial
.
- The matrix
may have other eigenvalues
than
, with modulus equal to
.
What do we do in this case?
We call a Markov transition matrix with no pathology a regular Markov transition matrix, namely
Definition. A Markov transition matrix is regular if
- The eigenvalue
is a simple root of the characteristic polynomial
.
- All other eigenvalues
of
than
have modulus less than
.Remark that most matrices
are regular. So, if the Markov transtion matrix of a web is not regular, then the strategy is to deform it slightly to a regular one.
Remedy. We consider the matrix
, with
for all
. We replace the matrix
of the web by the matrix
(1)
for some small . (The value
has been used by Google.) Notice that the matrix
still has nonnegative entries and that the sum of the entries of each column is still equal to
, so it is still a Markov transition matrix. The following theorem guarantees that there exists some small
for which we have fixed the pathologies.
Theorem Given any Markov transition matrix , there always exists a positive
, as small as we wish, such that the matrix
is regular. Let
be the eigenvector of
for the matrix
, normalized so that the sum of its coordinates be
. For the matrix
, given any nonzero vector
, where
with
and
, then
(2)
Link with the Banach fixed point theorem
Another (coming) vignette treats the Banach fixed point theorem. The theorem above can be seen as a particular application of it. You can skip this section if you have not read the other vignette. Indeed,
Theorem Let be a regular Markov transition matrix. We consider
, with an adequate distance
between points (this distance depends on
). On
, we consider the linear operator
defined by
. The operator
is a contraction on
, namely there exists
such that for all
,
Then, there exists a unique vector such that
.
Moreover, given any , we can define the sequence
by induction, where
. Then,
.
Definition of the distance . The definition of the distance
is a bit subtle and can be skipped. We include it for the sake of completeness for the reader who wants to push far the details. We limit ourselves to the case where the matrix
is diagonalizable. Let
be a basis of eigenvectors. Vectors
can be written in the basis
:
where . Then we define
With this distance, is a complete metric space, i.e. all Cauchy sequences converge.
This theorem not only guarantees the existence of , but gives a method to construct it, as the limit of the sequence
. We have seen an illustration of this convergence in our example. Indeed, the
-th column of
is the vector
, where
is the
-th vector of the canonical basis. Of course, in our example, we could also have found directly the vector
by solving the system
with matrix
We would have found that all solutions are of the form for
. The solution whose sum of coordinates is
is hence
, where
Practical calculation of the stationary distribution
We have identified the simple idea underlying the algorithm. However finding the stationary distribution , i.e. an eigenvector of the eigenvalue
for the matrix
in (1), is not at all a trivial task when the matrix has billions of lines and columns: both the time of the computation and the memory space to achieve it represent real challenges. The usual method of Gauss elimination is useless in this case, both because of the size of the computation, and because it requires to divide by small coefficients. A more efficient algorithm makes use of the property (2) (see [LM]). This makes the link with the coming vignette on the Banach fixed point theorem which will enlighten that the proof of Banach fixed point theorem provides an algorithm to construct the fixed point.
Indeed, we start with such that
and we need to calculate . Usually,
for some
between
and
gives a pretty good approximation of
. By induction, we calculate
. Even such calculations are quite long. Indeed, due to its construction, the matrix
in (1) has no zero entries. On the other hand, most of the entries of the matrix
are zero. So, we must decompose the computation to take advantage of this fact, namely
Because of the special form of , it is easy to verify that if
is a vector the sum of entries of which is
, then
. Hence, it suffices to calculate the sequence
Conclusion
We have presented the public part of Google’s PageRank algorithm. Already you can experiment with simple webs and find the tricks to improve the ranking of your personal page by adding internal and external links in an optimal way. Some private, more sophisticated parts continue to be developed. Some of them consist in replacing the ” neutral” matrix of (1) by matrices reflecting the taste of the surfer on the web. Others ensure that the ranking is not too sensitive to the manipulations of those who try to improve the ranking of their pages.
As general conclusion, what have we observed? A simple, clever idea has led to a immense breakthrough in the efficiency of search engines, and to the birth of a commercial empire. Even if the implementation is in itself a computational exploit, the initial idea required ” elementary” mathematics, namely linear algebra and probability theory. These relatively standard mathematical tools, in particular the diagonalization of matrices, have given their full power when they have been used outside of their “normal” context. Also, we have highlighted the unifying ideas inside science, with Banach fixed point theorem having applications so remote from its origin.
Bibliography
[E] M. Eisermann, Comment Google classe les pages webb, http://images.math.cnrs.fr/Comment-Google-classe-les-pages.html, 2009.
[LM] A. N. Langville and C. D. Meyer, A Survey of Eigenvector Methods
for Web Information Retrieval, SIAM Review, Volume 47, Issue 1,
(2005), pp. 135–161.
[RS] C. Rousseau and Y. Saint-Aubin, Mathematics and
technology, SUMAT Series, Springer-Verlag, 2008 (A French version of the book exists, published in the same series.)
Thanks for this piece, but I am still lacking the connection between google’s algorithm and the analysis above. Does the stationary distribution define the web-page’s ranking in a search?
How markov chain apply in pagerank ?
The next challenge is to expose high school students to this and other articles on Klein project blog website. A study dealing with such a challenge is on going in Israel in the past 2 years based upon a preliminary presentation at ESU5 (Movshovitz-Hadar 2007), and an action research presented at ESU6 (Amit and Movshovitz-Hadar 2011). We develop short, single-topic PowerPoint presentations named Math-News-Snapshots to be interwoven in the curriculum once a fortnight so that progress in the mandatory curriculum is not harmed yet students get the sense of what’s happening in contemporary mathematics. We will be glad to share more information about the study. Please send personal e-mail to: nitsa at technion dot ac dot il
After I initially commented I seem to have clicked the -Notify me when new comments are added- checkbox annd now ewch
time a comment is added I recieve 4 emails with the same comment.
Is there a waay you aare able to remove me from that service?
Appreciate it!
Your mode of describing all in this artile is actually good, all can without difficulty understand it, Thanks a lot.