
Research On HITS Algorithm In Web Structure Mining

Posted on: 2009-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Lu
GTID: 2178360245989614
Subject: Computer applications
Abstract/Summary:
Since the 1990s, with the development of network technology and especially the widespread adoption of the Internet, the WWW has become a vast global information service center. Helping users efficiently find the information and resources they want on the Internet has become an urgent problem, and Web data mining emerged against this background.

The links between Web pages carry a large amount of latent human judgment, including content relevance, quality, and structural information, all of which reflect the importance and authority of a page. The link structure can therefore be used to find authority pages, and this is exactly how the HITS algorithm mines Web data.

This thesis is mainly a study of the HITS algorithm. Among the algorithms that perform link analysis and community extraction, HITS (Hyperlink-Induced Topic Search) is the most widely used post-hoc analysis algorithm, and it already has many applications in Web structure mining systems. The thesis first introduces the background of Web data mining, with particular attention to the theory of Web structure mining, then analyzes the HITS algorithm and examines its advantages and disadvantages in depth.

It then analyzes a variant of HITS: Spatial Vector Projection HITS. The theoretical basis of this variant is full trust in the authority of the root set. Whereas HITS computes only the principal eigenvector, the Spatial Vector Projection algorithm computes every eigenvector, projects all of them onto the root-set space, and revises the result by comparing the projections.

The thesis proposes an improved Web structure mining algorithm: VSM Spatial Projection HITS. By using the vector space model (VSM) to extract text content and combining Web content with Web links, it obtains a more reasonable and more trustworthy space vector.

Finally, a series of experiments was carried out on the three algorithms above. The experiments indicate that VSM Spatial Projection HITS suppresses topic drift more effectively than both HITS and Spatial Vector Projection HITS.
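For readers unfamiliar with the baseline, the following is a minimal sketch of the basic HITS iteration only, not of the Spatial Vector Projection or VSM variants studied in the thesis. The function names `hits` and `normalize`, the edge-list input, and the toy graph are assumptions made for illustration. Each page receives an authority score (the sum of the hub scores of pages linking to it) and a hub score (the sum of the authority scores of pages it links to). Repeating these updates with normalization converges toward the principal eigenvector mentioned in the abstract.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (guarding against all zeros)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(edges, iterations=50):
    """Basic HITS iteration over a list of (source, target) links.

    Hypothetical helper for illustration; returns (authority, hub) dicts.
    """
    pages = sorted({page for edge in edges for page in edge})
    auth = {page: 1.0 for page in pages}
    hub = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auth = {page: 0.0 for page in pages}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of pages linked to.
        new_hub = {page: 0.0 for page in pages}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Toy graph (assumed): a links to c and d, b links to c.
edges = [("a", "c"), ("b", "c"), ("a", "d")]
auth, hub = hits(edges)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
print(best_authority, best_hub)
```

In this toy graph, page c is linked from both a and b, so it ends up as the top authority, while a links to both authorities and ends up as the top hub. In practice HITS is run on a query-specific base set expanded from a root set, a step this sketch omits.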
Keywords/Search Tags: Web Structure Mining, Hyperlink-Induced Topic Search (HITS), Vector Space Model (VSM)