Research On Web Structure Mining

Posted on: 2010-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
Full Text: PDF
GTID: 2178330332988623
Subject: Computer software and theory
Abstract/Summary:
With its rapid development, the Internet has become the largest repository of information in the world. How to obtain required information or useful knowledge from such a huge amount of data has become an urgent problem. Web mining, which combines traditional data mining techniques with the characteristics of web data, is an efficient way to address it. Web mining has three main branches: content mining, structure mining and usage mining. Web structure mining is an important direction in web data mining. Researchers have discovered that the link structure of web pages carries a great deal of information, and hyperlink analysis has been used successfully to identify important pages from the hyperlink data of web pages.

We first compare two classical web structure mining algorithms, HITS and PageRank, and study their characteristics in detail. Then, for the PageRank algorithm used by Google, we analyze its idea and calculation method and propose optimization strategies for links within a website, in-links and out-links. In the PageRank algorithm, every web page is treated as equally important and is assigned the same weight, which represents the authority or importance of the page. This is unreasonable, because web pages differ from one another and differ in importance. To overcome this shortcoming, weights are assigned to pages according to their numbers of in-links: pages with a higher in-degree are considered more important, are assigned larger weights, and therefore receive more attention in the ranking. Finally, a network graph is built by simulation, and the results demonstrate that the improved PageRank algorithm (IPR) is effective.
Keywords/Search Tags: Web structure mining, link analysis, PageRank algorithm
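
The abstract does not give the formula behind the in-degree weighting, so the following Python sketch is only one possible reading of the idea, not the thesis's IPR method. It runs the standard PageRank power iteration and, as an assumption, optionally replaces the uniform per-page weight with one proportional to in-degree plus one. The function name `pagerank`, the `weight_by_in_degree` flag, the damping factor 0.85 and the example graph are all illustrative choices that do not come from the thesis.

```python
def pagerank(links, damping=0.85, weight_by_in_degree=False,
             tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    With weight_by_in_degree=True, the per-page base weight is proportional
    to (in-degree + 1) instead of uniform. This weighting is an assumption
    made for illustration, not the formula used in the thesis.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    out = {p: list(links.get(p, [])) for p in pages}

    if weight_by_in_degree:
        in_deg = {p: 0 for p in pages}
        for targets in out.values():
            for q in targets:
                in_deg[q] += 1
        # The +1 keeps pages without in-links from getting zero weight.
        total = sum(in_deg.values()) + n
        base = {p: (in_deg[p] + 1) / total for p in pages}
    else:
        base = {p: 1.0 / n for p in pages}

    rank = dict(base)
    for _ in range(max_iter):
        # Rank held by pages without out-links is redistributed via base.
        dangling = sum(rank[p] for p in pages if not out[p])
        new = {p: (1 - damping) * base[p] + damping * dangling * base[p]
               for p in pages}
        for p in pages:
            if out[p]:
                share = damping * rank[p] / len(out[p])
                for q in out[p]:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph: C has the most in-links, D has none.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for label, flag in (("uniform", False), ("in-degree weighted", True)):
    scores = pagerank(example_links, weight_by_in_degree=flag)
    print(label, {p: round(s, 4) for p, s in scores.items()})
```

In this small graph, the in-degree weighting shifts rank toward the page with the most in-links and away from the page with none, which matches the direction of the change described in the abstract. Whether the thesis applies the weights to the initial values, to the random-jump term, or to the link contributions is not stated here.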