Web Page Sorting Algorithms Based On The Analysis Of The Linking Structure

Posted on: 2011-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Zhang
Full Text: PDF
GTID: 2178330332488031
Subject: Computer software and theory
Abstract/Summary:
With the rapid spread and development of the Internet and Web technology, the amount of information on the Internet keeps growing. Retrieving the relevant information from this huge volume and sorting it according to the user's requirements has become increasingly important, and search engine technology emerged to meet this need. Web page sorting algorithms based on the analysis of the Web linking structure are among its most important techniques. The PageRank algorithm, the most widely used page sorting algorithm, is based on this kind of analysis. From the viewpoint of the mathematical model, PageRank can be viewed as a Markov random surfing model, in which the transition probabilities between pages are computed from the linking structure of the current page. The final sorting value is given by the unique stationary probability distribution of the Markov chain.

A study of the classical Web structure mining algorithms HITS and PageRank shows that classical PageRank can suffer from topic drift, because it assigns the same weight to every outlink. Inspired by experiments with the PageRank algorithm and by the hub concept of the HITS algorithm, the sorting function of PageRank is redefined so that the weight of a link depends on the indegree and outdegree of the in-link pages. The resulting Improved PageRank algorithm avoids assigning the same weight to every outlink. Simulation results show that the Improved PageRank algorithm performs better than the classical PageRank algorithm: its evaluation indices p@10 and p@50 are almost consistently higher than those of the classical algorithm.

The idea behind PageRank, in both the Improved and the classical algorithm, shows that the PageRank value is influenced by three factors.
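The random surfing model of classical PageRank can be illustrated with a short power-iteration sketch. This is a minimal illustration, not the thesis's implementation: the damping factor of 0.85, the uniform redistribution of rank from pages without outlinks, the convergence tolerance, and the four-page example graph are all assumptions that do not appear in the abstract. The name `pagerank` is hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Classical PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    damping: probability of following a link (assumed value, not from the thesis).
    Returns a dict mapping each page to its stationary probability.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Random jump: every page receives the same baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Classical PageRank: every outlink gets the same weight.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # Assumption: a page without outlinks spreads its rank uniformly.
                dangling += damping * rank[p]
        for p in pages:
            new_rank[p] += dangling / n

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph used only for illustration.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
example_ranks = pagerank(example_links)
for page in sorted(example_ranks, key=example_ranks.get, reverse=True):
    print(page, round(example_ranks[page], 4))
```

The line that computes `share` is where the classical algorithm divides a page's rank equally among its outlinks; a weighted variant would replace that equal split with per-link weights.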
For each link, an indegree coefficient and an outdegree coefficient are defined, both of which are evaluated from the three factors mentioned above. The Hybrid PageRank algorithm, which combines the advantages of the classical PageRank algorithm and the Improved PageRank algorithm, uses an adjustable threshold to compute PageRank. Simulations verify the effectiveness of the algorithm.
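The p@10 and p@50 indices reported in the abstract are read here as precision at rank k, the fraction of the top k returned pages that are relevant. That reading is an assumption, since the abstract does not define the indices. A minimal sketch of such a measure follows; the function name `precision_at_k`, the choice to divide by k even when fewer than k pages are returned, and the sample data are all hypothetical.

```python
def precision_at_k(ranked_pages, relevant_pages, k):
    """Fraction of the top-k ranked pages that are relevant.

    ranked_pages: list of page identifiers, best first.
    relevant_pages: set of page identifiers judged relevant.
    k: cutoff rank, e.g. 10 for p@10 or 50 for p@50.
    Assumption: the denominator is always k, so short result lists are penalized.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_pages[:k]
    hits = sum(1 for page in top_k if page in relevant_pages)
    return hits / k


# Hypothetical ranking and relevance judgments, for illustration only.
ranking = ["p3", "p1", "p7", "p2", "p9"]
relevant = {"p1", "p2", "p8"}
print(precision_at_k(ranking, relevant, 2))
print(precision_at_k(ranking, relevant, 5))
```

Comparing two sorting algorithms with this measure means computing it for each algorithm's ranking of the same query against the same relevance judgments.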
Keywords/Search Tags:Hyperlink analysis, Page sorting algorithm, PageRank algorithm, Improved PageRank algorithm, Hybrid PageRank algorithm