Research On Web Structure Mining

Posted on:2010-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:J Liu

Full Text:PDF

GTID:2178330332988623

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, it has become the largest information repository in the world. How to obtain required information or useful knowledge from such a huge amount of information has become an urgent problem. Web mining, which combines traditional data mining techniques with the characteristics of web data, is an efficient way to address it. Web mining has three main branches: content mining, structure mining, and usage mining. Web structure mining is an important direction in web data mining. Researchers have found that the link structure of web pages contains much information, and hyperlink analysis has been used successfully to extract important pages from the hyperlink data of web pages.

We first compare two classical web structure mining algorithms, HITS and PageRank, and study their characteristics in detail. Then, for the PageRank algorithm used by Google, we analyze its idea and calculation method and propose optimization strategies for links within a website, in-links, and out-links. In the PageRank algorithm, every web page is considered equally important and is assigned an equal weight representing its authority or importance. This is unreasonable, because web pages differ in importance. To overcome this shortcoming, weights are assigned to pages based on their numbers of in-links: pages with higher in-degree are considered more important and receive larger weights, so more attention is paid to them. Finally, a network graph is built by simulation, and the results demonstrate that the improved PageRank algorithm (IPR) is effective.
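
The idea of weighting pages by in-degree can be illustrated with a short Python sketch. The abstract does not give the thesis's actual formula, so this is only one plausible reading, stated as an assumption: in the plain variant a page splits its rank evenly among the pages it links to, while in the weighted variant each linked page receives a share proportional to its in-degree. The function name `pagerank`, its parameters, the damping factor of 0.85, and the four-page graph `example_links` are all hypothetical choices made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100, weighted=False):
    """Iterative PageRank over a dict mapping each page to the pages it links to.

    weighted=False: a page splits its rank evenly among its out-links.
    weighted=True:  (assumed variant) each out-link target receives a share
                    proportional to that target's in-degree.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)

    # Deduplicate out-links while preserving order.
    out = {p: list(dict.fromkeys(links.get(p, []))) for p in pages}

    # In-degree: number of distinct pages linking to each page.
    in_degree = {p: 0 for p in pages}
    for targets in out.values():
        for q in targets:
            in_degree[q] += 1

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = out[p]
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
                continue
            if weighted:
                # Every target has in-degree >= 1 because p links to it.
                total = sum(in_degree[q] for q in targets)
                shares = {q: in_degree[q] / total for q in targets}
            else:
                shares = {q: 1.0 / len(targets) for q in targets}
            for q, share in shares.items():
                new_rank[q] += damping * rank[p] * share
        rank = new_rank
    return rank


# Hypothetical four-page graph: page C has the most in-links.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for label, flag in (("plain", False), ("in-degree weighted", True)):
        scores = pagerank(example_links, weighted=flag)
        print(label, {p: round(s, 4) for p, s in scores.items()})
```

On this small graph, page A links to both B (one in-link) and C (three in-links). Under the weighted variant A passes three quarters of its rank to C instead of half, so C's score rises relative to the plain computation. Whether this matches the weighting used in the thesis cannot be determined from the abstract alone.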

Keywords/Search Tags:

Web structure mining, link analysis, PageRank algorithm

Related items

1	Research Of The PageRank Algorithm In Web Structure Mining
2	Research On The Algorithms Of Web Structure Mining
3	Research On Search Engine Ranking Algorithm Based On Link Analysis
4	Research Of PageRank Algorithm In Web Structure Mining
5	Research And Improved Of PageRank Algorithm In Web Data Mining
6	Email Network Centricity Analysis Based On The Link Mining
7	Research On Link Structure Based Chinese Page Ranking Algorithm
8	Research Of WEB Structure Mining Technologies Based On Link Similarity Analysis
9	Web Page Sorting Algorithms Based On The Analysis Of The Linking Structure
10	Study Of Optimizing Algorithm For PageRanking Based On Temporal-Link-Analyse