
Algorithm Research For WEB Structure Mining Based On Hyperlink

Posted on: 2007-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Jiang
Full Text: PDF
GTID: 2178360182977857
Subject: Computer software and theory
Abstract/Summary:
The rapid spread and development of the Internet and Web technology has given people access to abundant information. However, the sheer volume, complexity, and dynamic nature of information on the Internet also makes it very difficult to mine Web resources. Web data mining, which combines traditional data mining technology with the Web, is therefore an important approach.

We study the classical Web structure mining algorithms HITS and PageRank. HITS considers only the hyperlinks among Web pages and ignores page content, which leads to the drawback known as topic drift. To address this, we propose an improved HITS algorithm that combines hyperlink analysis with content analysis: it analyzes the content of the pages and assigns different weights to the hyperlinks accordingly. Experiments show that the new algorithm is effective.

Finally, because HITS and PageRank treat a page's authority separately from its hub value, or ignore the hub value altogether, we discuss the personalized PageRank vector and HubRank, an algorithm based on PageRank. Experiments show that HubRank is effective in addressing this problem.
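The idea of weighting hyperlinks inside HITS can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose weighting scheme the abstract does not specify. It is plain HITS with one added assumption: each link carries a non-negative weight that a content analysis step would supply. The function name `weighted_hits`, the `weights` dictionary, and the toy graph are all hypothetical.

```python
import math


def weighted_hits(links, weights=None, iterations=50):
    """Run HITS with optional per-link weights (illustrative sketch only).

    links:   dict mapping a page to the list of pages it links to.
    weights: dict mapping (source, target) to a non-negative weight;
             links without an entry default to 1.0 (plain HITS).
    Returns (authority, hub) dicts, each L2-normalized.
    """
    weights = weights or {}
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of in-linking pages.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += weights.get((src, dst), 1.0) * hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub update: weighted sum of authority scores of linked-to pages.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += weights.get((src, dst), 1.0) * auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return auth, hub


# Toy graph: page "y" has more in-links than "x" but is assumed off-topic.
links = {"a": ["x", "y"], "b": ["x", "y"], "c": ["y"]}

# Plain HITS: every link counts equally.
auth_plain, hub_plain = weighted_hits(links)

# Hypothetical content-based weights that down-weight links into "y".
off_topic = {("a", "y"): 0.1, ("b", "y"): 0.1, ("c", "y"): 0.1}
auth_weighted, hub_weighted = weighted_hits(links, off_topic)

print(auth_plain["x"] < auth_plain["y"])        # link count alone favors "y"
print(auth_weighted["x"] > auth_weighted["y"])  # weighting favors "x"
```

In this toy graph, unweighted HITS ranks the heavily linked page "y" above "x", which is the kind of outcome associated with topic drift. Once the links into "y" are down-weighted, "x" becomes the top authority.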
Keywords/Search Tags: Web structure mining, hyperlink, HITS, PageRank