NotesFAQContact Us
Collection
Advanced
Search Tips
Back to results
ERIC Number: ED516370
Record Type: Non-Journal
Publication Date: 2010
Pages: 114
Abstractor: As Provided
ISBN: ISBN-978-1-1097-7837-3
ISSN: N/A
EISSN: N/A
Computing Principal Eigenvectors of Large Web Graphs: Algorithms and Accelerations Related to PageRank and HITS
Nagasinghe, Iranga
ProQuest LLC, Ph.D. Dissertation, Southern Methodist University
This thesis investigates and develops a few acceleration techniques for the search engine algorithms used in PageRank and HITS computations. PageRank and HITS methods are two highly successful applications of modern Linear Algebra in computer science and engineering. They constitute the essential technologies accounted for the immense growth and success of popular search engines present today, especially Google. PageRank is a ranking method for the web pages based on the hyperlink structure of the world wide web. It is modelled as an eigenvalue problem, where the matrix related to the world wide web is huge, real, and nonsymmetric. Iterative procedures are employed to compute the principal eigenvector of the eigenproblem. The output upon convergence is a vector carrying the ranks of the web pages. The standard Power method is used for the iteration by the founders of PageRank. As is well-known, the Power method suffers from slow convergence if the gap-ratio of the eigenproblem is close to 1. To accelerate the convergence when the Power method converges slowly, we investigate a Rayleigh-Ritz based method, which we call the Rayleigh-SVD method, and an extrapolation method. For the Rayleigh-SVD method, we compute a singular vector related to the Rayleigh quotient that can contribute to faster convergence, instead of performing an eigen decomposition as in the standard Rayleigh-Ritz method. For the extrapolation based method, we truncate the terms that hinder the convergence of the Power method as a means for acceleration. A novel technique we develop is the combination of the Rayleigh-SVD method and the extrapolation method. Numerical experiments are presented to illustrate the speed up in convergence for the Rayleigh-SVD method, the extrapolation method, and a combination of the two. HITS is another web ranking method similar to the idea of the PageRank model. It utilizes two vectors, the hub vector and the authority vector, instead of just one PageRank vector. Once again the Power method is the standard iterative procedure used in the computation of the HITS vectors. To accelerate the potentially slow convergence of the Power method, we develop a filtering method based on Chebyshev polynomials. The two matrices involved in the HITS model are real and symmetric, therefore real Chebyshev polynomials of the first kind are particularly useful and effective for the acceleration. We present numerical results that show the advantage of the filtered method over the Power method. Several realistic web graphs by web crawlers (UbiCrawler) are used as test matrices for our algorithms. For the PageRank method, the nonsymmetric matrices are directly used to construct the PageRank model, while for the HITS method, symmetric matrices derived from the web graphs are used. Although the acceleration algorithms developed in this thesis are applied to search engine related models, they are applicable in many other fields where efficient computation of eigenvalues and eigenvectors of large matrices is necessary. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A