Font Size: a A A

Study Of The Hyperlink Analysis Algorithms Based On Web Information Retrieval

Posted on:2009-02-05Degree:MasterType:Thesis
Country:ChinaCandidate:K Y DengFull Text:PDF
GTID:2178360245453596Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet, Web has become an important way to find information. Since the online information keeps increasing rapidly, how to improve the effectiveness and efficiency of information retrieval has become the most challenging problem for search engines.The World Wide Web serves as a huge, widely distributed, global information service center, and expanding in a rapid speed. The emergence of WWW introduced new challenges to the traditional information retrieval technologies. There is the least evolvement in traditional information retrieval technology. From the characteristics of the Web data, hyperlinks among the Web pages can be used to mine more useful information. Searching with the hyperlinks can create more effective Web information retrieval model which can help us search the useful information. Hyperlink structure playes an important role in a lot of research fields of WWW. This paper will introduce the application of hyperlink structure on Web information retrieval.In recent years, researchers discover that rich and import information is contained among hyperlinks, and develop a lot of algorithms using hyperlink to improve the quantity and relevance of the results which search engine returned. Based on the idea of hyperlink, the PageRank algorithm was proposed by Sergey Brin and Lawrence Page in 1998[1]. Google search engine keeps on top by its PageRank and convergence algorithm. But the convergence algorithm is more important. The convergence algorithm decides the time and space cost in the procedure of evaluating PageRank vector straightly. A good convergence algorithm can make the system use the least time and space cost evaluating the final PageRank vector. So that the total performance of the search engine will be improved. Under the current circumstances, the PageRank matrix is huge and traditional matrix theory can not be directly applied to the calculation of PageRank. Therefore, it is an urgent need to research the method of improving the speed of convergence and PageRank computational efficiency according to the particularities of PageRank transfer matrix eigenvalue.This dissertation works on accelerating the PageRank computation. At first, the model of PageRank and the power method are briefly introduced. Based on the existing research for the computation of PageRank, we further derive the general formula for the acceleration of PageRank computations. And we also discuss the method for generating high dimension stochastic matrix, being characterized the Web graph. And at the same time we present two novel algorithms for accelerating the PageRank convergence: General Extrapolation and Acceleration Extrapolation. Then we give the results of the numerical experiments performed on four datasets. The numerical results confirm the effectiveness of our theoretical analysis and numerical algorithms. By comparison with the original method, the results demonstrate that the new algorithms provide a solid speedup.
Keywords/Search Tags:Web Information Retrieval, PageRank, Hyperlink Analysis, Dominant Eigenvector
PDF Full Text Request
Related items