
Research on the PageRank Algorithm in Web Structure Mining

Posted on: 2010-11-03 | Degree: Master | Type: Thesis
Country: China | Candidate: C X Fan | Full Text: PDF
GTID: 2178360275459250 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of web information technology, users can conveniently acquire all kinds of information. At the same time, they face the problem of how to find relevant and useful information on the web. Traditional search engines such as Google, Baidu, and Lycos greatly reduce the obstacle of useless information, but their results are sometimes incomplete or irrelevant. Web data mining can address this information overload by analyzing the web's hyperlink structure to provide users with more accurate and relevant data, and it has gradually become a hot research topic.

This thesis proposes DisRank, an algorithm that combines the distance between hyperlinked pages with reinforcement learning. It builds on a thorough study of PageRank, a typical web structure mining algorithm. PageRank considers only the link relationships among web pages and ignores their textual content. As a result, it assigns high weights to high-authority pages and low authority to new pages, which causes the problem of "theme drift". DisRank computes page ranks and sorts pages using reinforcement learning, with the distance between pages treated as a "punishment".

The experiments proceed in three steps. First, a web crawling algorithm collects a set of pages on selected topics as training samples. Second, the pages are stored in a database. Finally, PageRank and DisRank are each run on the samples, and the comparison demonstrates the validity of the improved algorithm. The evaluation covers the throughput of relevant pages crawled with DisRank, throughput under different values of β, precision, convergence speed, and time complexity. The thesis concludes by summarizing the work, identifying where the improved algorithm still needs refinement, and giving directions for further research.
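For readers unfamiliar with the baseline, the following is a minimal sketch of standard PageRank computed by power iteration. It is not the thesis's DisRank: the abstract does not specify how the distance penalty, the reinforcement learning update, or β enter the computation, so none of these are reproduced here. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the four-page example graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    damping, tol and max_iter are assumed defaults, not values from the thesis.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes an equal share of its damped rank to its targets.
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share

        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph: only link structure is used, never page text.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this example page D, which has no in-links, receives only the baseline share of rank. This illustrates the bias against new pages that the abstract describes, since the score depends on link structure alone.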
Keywords/Search Tags:Data Mining, Web Data Mining, Hyperlink analysis, PageRank, HITS