Originating author is Christiane Rousseau.
From its very beginning, Google became “the” search engine. This comes from the supremacy of its ranking algorithm: the PageRank algorithm. Indeed, with the enormous quantity of pages on the World-Wide-Web, many searches end up with thousands or millions of results. If these are not properly ordered, then the search may not be of any help, since no one can explore millions of entries.
How does the PageRank algorithm work?
http://www.mathunion.org/icmi/other-activities/klein-project/introduction/
rather than
http://www.kleinproject.org/
will appear first in a search for Klein project.
To explain the algorithm, we model the web as an oriented graph. The vertices are the pages, and the oriented edges are the links between pages. As we just explained, each page corresponds to a different url. Hence, a website may contain many pages. The model makes no difference between the individual pages of a website and its front page. But, most likely, the algorithm will rank better the front page of an important website.
A simple example
Let us look at the simple web on the left with five pages named , , , , and . This web has few links. If we are on page , then there is only one link to page , while, if we are on page , we find three links, and we can choose to move to either page , or , or . Notice that there is at least one link from each page.
We play a game, which is simply a random walk on the oriented graph. Starting from a page, at each step we choose at random a link from the page where we are, and we follow it. For instance, in our example, if we start on page , then we can go to or to with probability for each case while, if we start on , then we necessarily go to with probability . We iterate the game.
Let us remark that the sum of entries of any column of is equal to and that all entries are greater or equal to zero. Matrices having these two properties are very special: each such matrix is the matrix of a Markov chain process, also called Markov transition matrix. It always has as an eigenvalue and there exists an eigenvector with eigenvalue , all the components of which are less than or equal to and greater than or equal to , with sum equal to . But, before recalling the definitions of eigenvalue and eigenvector, let us explore the advantage of the matrix representation of the web graph.
We consider a random variable with values in the set of pages , which contains pages (here ). represents the page where we are after steps of the random walk. Hence, if we call the entry of the matrix located on the -th row and -th column, then is the conditional probability that we are on the -th page at step , given that we were on the -th page at step :
Note that this probability is independent of ! We say that a Markov chain process has no memory of the past. It is not difficult to figure that the probabilities after two steps can be summarized in the matrix .
Let us prove it (you can skip the proof if you prefer). The theorem of total probabilities yields
By definition of conditional probability
We use a familiar trick: multiply and divide by the same quantity:
The first quotient is equal to
since a Markov chain process has no memory past the previous
step. Hence,
In our example
Iterating this idea, it is clear that the entry , of the matrix describes the probability . For instance,
All columns of are identical if we choose precision to 3 decimals, and the same as the columns of when . If we choose a higher precision, we also find stabilization, but for larger than . Hence, after steps, where is sufficiently large, the probability of being on a page is independent from where we started!
Moreover, let us consider the vector
( is a vertical vector, and its transpose is a horizontal vector). It is easily checked that . If we consider the -th coordinate of the vector as the probability of being on page at a given time , and hence as the probability distribution of pages at time , then it is also the probability distribution at time . For this reason, the vector is called the stationary distribution. This stationary distribution allows to order the pages. In our example, we order the pages as , , , , , and we declare the most important page.
The general case
The general case can be treated exactly as our example. We represent the web as an oriented graph in which the vertices represent the pages of the web, and the oriented edges represent the links between pages. We summarize the graph in an matrix, , with the -column representing the -th departing page and the -row, the th arrival page. In our example we found a vector satisfying . This vector is an eigenvector of the eigenvalue . Let us recall the definition of eigenvalue and eigenvector.
Definition. Let be an matrix. A number is an eigenvalue of if there exists a nonzero vector such that . Any such vector is called an eigenvector of .
We also recall the method to find the eigenvalues and eigenvectors.
Proposition Let be an matrix. The eigenvalues of are the roots of the characteristic polynomial , where is the identity matrix The eigenvectors of an eigenvalue are the nonzero solutions of the linear homogeneous system .
The following deep theorem of Frobenius guarantees that for the matrix associated to a web graph, we will always find a stationary solution.
Theorem (Frobenius theorem) We consider an Markov transition matrix (i.e. for all , and the sum of entries on each column is equal to , namely ). Then
- is one eigenvalue of .
- Any eigenvalue of satisfies .
- There exists an eigenvector of the eigenvalue , all the coordinates of which are greater
or equal than zero. Without loss of generality we can suppose that
the sum of its coordinates be .
It is now time to admire the power of this theorem. For this purpose, we will make the simplifying hypothesis that the matrix has a basis of eigenvectors and we suppose that is the vector of Frobenius theorem. For each , there exists such that . Let us take any nonzero vector with , where and . We decompose in the basis :
A technical proof that we skip allows to prove that . Now, let us calculate :
since is an eigenvector of the eigenvalue . And if we iterate we get
If all for satisfy , then
This is exactly what happens in our example!
Notice however that the theorem does not guarantee that any matrix satisfying the hypotheses of the theorem will have this property. Let us describe the possible pathologies and the remedy.
Possible pathologies.
- The eigenvalue may be a multiple root of the characteristic polynomial .
- The matrix may have other eigenvalues than , with modulus equal to .
What do we do in this case?
We call a Markov transition matrix with no pathology a regular Markov transition matrix, namely
Definition. A Markov transition matrix is regular if
- The eigenvalue is a simple root of the characteristic polynomial .
- All other eigenvalues of than have modulus less than .Remark that most matrices are regular. So, if the Markov transtion matrix of a web is not regular, then the strategy is to deform it slightly to a regular one.
Remedy. We consider the matrix , with for all . We replace the matrix of the web by the matrix
(1)
for some small . (The value has been used by Google.) Notice that the matrix still has nonnegative entries and that the sum of the entries of each column is still equal to , so it is still a Markov transition matrix. The following theorem guarantees that there exists some small for which we have fixed the pathologies.
Theorem Given any Markov transition matrix , there always exists a positive , as small as we wish, such that the matrix is regular. Let be the eigenvector of for the matrix , normalized so that the sum of its coordinates be . For the matrix , given any nonzero vector , where with and , then
(2)
Link with the Banach fixed point theorem
Another (coming) vignette treats the Banach fixed point theorem. The theorem above can be seen as a particular application of it. You can skip this section if you have not read the other vignette. Indeed,
Theorem Let be a regular Markov transition matrix. We consider , with an adequate distance between points (this distance depends on ). On , we consider the linear operator defined by . The operator is a contraction on , namely there exists such that for all ,
Then, there exists a unique vector such that .
Moreover, given any , we can define the sequence by induction, where . Then, .
Definition of the distance . The definition of the distance is a bit subtle and can be skipped. We include it for the sake of completeness for the reader who wants to push far the details. We limit ourselves to the case where the matrix is diagonalizable. Let be a basis of eigenvectors. Vectors can be written in the basis :
where . Then we define
With this distance, is a complete metric space, i.e. all Cauchy sequences converge.
This theorem not only guarantees the existence of , but gives a method to construct it, as the limit of the sequence . We have seen an illustration of this convergence in our example. Indeed, the -th column of is the vector , where is the -th vector of the canonical basis. Of course, in our example, we could also have found directly the vector by solving the system with matrix
We would have found that all solutions are of the form for . The solution whose sum of coordinates is is hence , where
Practical calculation of the stationary distribution
We have identified the simple idea underlying the algorithm. However finding the stationary distribution , i.e. an eigenvector of the eigenvalue for the matrix in (1), is not at all a trivial task when the matrix has billions of lines and columns: both the time of the computation and the memory space to achieve it represent real challenges. The usual method of Gauss elimination is useless in this case, both because of the size of the computation, and because it requires to divide by small coefficients. A more efficient algorithm makes use of the property (2) (see [LM]). This makes the link with the coming vignette on the Banach fixed point theorem which will enlighten that the proof of Banach fixed point theorem provides an algorithm to construct the fixed point.
Indeed, we start with such that
and we need to calculate . Usually, for some between and gives a pretty good approximation of . By induction, we calculate . Even such calculations are quite long. Indeed, due to its construction, the matrix in (1) has no zero entries. On the other hand, most of the entries of the matrix are zero. So, we must decompose the computation to take advantage of this fact, namely
Because of the special form of , it is easy to verify that if is a vector the sum of entries of which is , then . Hence, it suffices to calculate the sequence
Conclusion
We have presented the public part of Google’s PageRank algorithm. Already you can experiment with simple webs and find the tricks to improve the ranking of your personal page by adding internal and external links in an optimal way. Some private, more sophisticated parts continue to be developed. Some of them consist in replacing the ” neutral” matrix of (1) by matrices reflecting the taste of the surfer on the web. Others ensure that the ranking is not too sensitive to the manipulations of those who try to improve the ranking of their pages.
As general conclusion, what have we observed? A simple, clever idea has led to a immense breakthrough in the efficiency of search engines, and to the birth of a commercial empire. Even if the implementation is in itself a computational exploit, the initial idea required ” elementary” mathematics, namely linear algebra and probability theory. These relatively standard mathematical tools, in particular the diagonalization of matrices, have given their full power when they have been used outside of their “normal” context. Also, we have highlighted the unifying ideas inside science, with Banach fixed point theorem having applications so remote from its origin.
Bibliography
[E] M. Eisermann, Comment Google classe les pages webb, http://images.math.cnrs.fr/Comment-Google-classe-les-pages.html, 2009.
[LM] A. N. Langville and C. D. Meyer, A Survey of Eigenvector Methods
for Web Information Retrieval, SIAM Review, Volume 47, Issue 1,
(2005), pp. 135–161.
[RS] C. Rousseau and Y. Saint-Aubin, Mathematics and
technology, SUMAT Series, Springer-Verlag, 2008 (A French version of the book exists, published in the same series.)
This post is also available in: French, German, Italian, Spanish, Arabic, Khmer
Thanks for this piece, but I am still lacking the connection between google’s algorithm and the analysis above. Does the stationary distribution define the web-page’s ranking in a search?
How markov chain apply in pagerank ?
The next challenge is to expose high school students to this and other articles on Klein project blog website. A study dealing with such a challenge is on going in Israel in the past 2 years based upon a preliminary presentation at ESU5 (Movshovitz-Hadar 2007), and an action research presented at ESU6 (Amit and Movshovitz-Hadar 2011). We develop short, single-topic PowerPoint presentations named Math-News-Snapshots to be interwoven in the curriculum once a fortnight so that progress in the mandatory curriculum is not harmed yet students get the sense of what’s happening in contemporary mathematics. We will be glad to share more information about the study. Please send personal e-mail to: nitsa at technion dot ac dot il
After I initially commented I seem to have clicked the -Notify me when new comments are added- checkbox annd now ewch
time a comment is added I recieve 4 emails with the same comment.
Is there a waay you aare able to remove me from that service?
Appreciate it!
Your mode of describing all in this artile is actually good, all can without difficulty understand it, Thanks a lot.