
Study Of Optimizing Algorithm For PageRanking Based On Temporal-Link-Analyse

Posted on: 2009-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
Full Text: PDF
GTID: 2178360242497766
Subject: Computer application technology
Abstract/Summary:
Web structure mining seeks to discover knowledge from the link structure of the World Wide Web: its hyperlinks, its documents, and the interlinkage between documents. At present there are mainly two mining algorithms based on the Web's link structure and interlinkage, the most typical of which is PageRank, designed by Larry Page et al. An analysis of link-based algorithms for ranking search results, such as PageRank, HITS and TimedPageRank, reveals one defect: traditional ranking algorithms favour old pages, so some old pages are listed at the top of the search results. This thesis therefore introduces temporal link analysis (Temporal-Link-Analyse), using the Last-Modified timestamp returned in the HTTP response when the spider crawls the Web as the timestamp of pages and links. The improved algorithm, WTPR, raises the rank of new pages in the results, while high-quality old pages still receive higher rank values than ordinary old pages.

The main contributions of the thesis are as follows.

First, Web structure mining is briefly introduced, and the principles and related definitions of Web link analysis are presented in detail. The state of the art of Web link analysis and its major contributions are then reviewed, which lays the foundation for introducing page-ranking algorithms based on link analysis.

Second, to address the limitations of PageRank, temporal link analysis is introduced to improve it. The last-modified time of each page, returned in the HTTP response when the WebSPHINX spider crawls the Web, is used as the age of the page, and on this basis the Web link structure, link quality and time series are mined. Based on page age, the page-ranking algorithm Age-WPR is designed and verified experimentally. Furthermore, because a static page age cannot capture the dynamic changes of the Web and the uncertainty of Web pages, the thesis proposes the definition of an interest interval and gives a detailed definition of the timestamps of nodes and links in a dynamic Web environment. On this basis, a novelty value is proposed to differentiate old pages from new ones. Combined with link quality, the WTPR algorithm is then designed, which overcomes this deficiency of current link analysis.

Lastly, the design and implementation of the page-ranking system in Java is introduced, and the general computation steps are given. The improved page-ranking algorithm is verified by testing on Web page snapshots, and the weight factors of WTPR are determined accordingly. The experiments show that the optimizing strategy used by WTPR makes old pages fall and new pages rise in the ranking results, while high-quality old pages obtain higher rank values than ordinary old pages.
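The abstract does not state the Age-WPR or WTPR formulas, so the following is only a rough sketch of the general idea, not the thesis's algorithm. It assumes an exponential freshness decay with a hypothetical `half_life_days` parameter, and it biases both the teleport vector and each page's outgoing link shares toward fresher pages. The names `novelty` and `time_weighted_pagerank` are illustrative and do not come from the thesis.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def novelty(last_modified, now, half_life_days=365.0):
    """Map an HTTP Last-Modified header value to a freshness score in (0, 1].

    Assumption: freshness halves every `half_life_days`; the thesis may
    define its novelty value differently.
    """
    modified = parsedate_to_datetime(last_modified)
    age_days = max((now - modified).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def time_weighted_pagerank(links, last_modified, now, damping=0.85, iterations=50):
    """Power-iteration PageRank biased toward recently modified pages.

    links: dict mapping a page to the list of pages it links to.
    last_modified: dict mapping every page to its Last-Modified header string.
    Every link target must also appear as a key of `last_modified`.
    """
    pages = sorted(last_modified)
    fresh = {p: novelty(last_modified[p], now) for p in pages}
    total = sum(fresh.values())
    # Teleport distribution proportional to freshness (sums to 1).
    teleport = {p: fresh[p] / total for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * teleport[p] for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if not targets:
                # Dangling page: spread its rank over the teleport distribution.
                for p in pages:
                    nxt[p] += damping * rank[src] * teleport[p]
                continue
            # Split the source's rank among its targets in proportion to freshness.
            weight_sum = sum(fresh[t] for t in targets)
            for t in targets:
                nxt[t] += damping * rank[src] * fresh[t] / weight_sum
        rank = nxt
    return rank


if __name__ == "__main__":
    now = datetime(2009, 11, 16, tzinfo=timezone.utc)
    modified = {
        "hub": "Mon, 02 Nov 2009 00:00:00 GMT",
        "old": "Sat, 16 Nov 2002 00:00:00 GMT",
        "new": "Thu, 01 Oct 2009 00:00:00 GMT",
    }
    graph = {"hub": ["old", "new"], "old": ["hub"], "new": ["hub"]}
    for page, score in sorted(time_weighted_pagerank(graph, modified, now).items()):
        print(page, round(score, 4))
```

In this toy graph the "old" and "new" pages have identical link structure, so any difference in their scores comes only from the timestamps. A real system would also need a policy for pages whose servers omit or misreport Last-Modified, which this sketch does not handle.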
Keywords/Search Tags:Web Data Mining, Web Link-Analyse, PageRank Algorithm, Interest Value, WTPR Algorithm