
Research on the HITS Algorithm in Web Data Mining

Posted on: 2005-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Huang
Full Text: PDF
GTID: 2208360122497265
Subject: Signal and Information Processing
Abstract/Summary:
The Internet is a huge, widely distributed, global information service center that provides many kinds of information services. Obtaining the required information or useful knowledge from the vast amount of information it offers has therefore become an urgent problem. Web data mining, which combines traditional data mining technology with the Web, is an important approach to this problem.
This thesis first gives an overview of data mining technology as applied to the Web, covering its classification, techniques, development, outlook, research directions, and application in search engines, as well as the changes and opportunities that XML brings to Web mining.
In Web structure mining, hyperlink analysis has been used successfully to analyze the hyperlink data of web pages and extract authoritative information sources. Among the various hyperlink analysis methods, the HITS (Hyperlink-Induced Topic Search) algorithm is the most widely used. The thesis then discusses the HITS algorithm and, based on experiments, analyzes its topic drift problem. The root-set eigenvector projection method and the base-set downsizing method are implemented to improve the HITS algorithm. Building on the projection method, this thesis also proposes a weighted root-set eigenvector projection method and a weighted base-set eigenvector projection method as further improvements, so that authoritative web pages can be found more effectively.
Experiments comparing these improved algorithms with the traditional HITS algorithm show that the root-set eigenvector projection method effectively avoids topic drift; the base-set downsizing method greatly reduces the computational cost; and the weighted root-set and weighted base-set eigenvector projection methods not only make the extracted authoritative pages more reasonable but also improve the agility of the HITS algorithm.
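For readers unfamiliar with the baseline, the following is a minimal sketch of the standard HITS iteration, in which hub and authority scores reinforce each other. It is not the thesis's own implementation, and it does not reproduce any of the improved methods named in the abstract, since the abstract does not specify them. The function names `hits` and `normalize`, the `links` dictionary format, the toy graph, and the iteration count are all illustrative assumptions.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Run a fixed number of standard HITS updates.

    links: dict mapping each page to the pages it links to (assumed format).
    Returns (hub, auth), two dicts of scores keyed by page.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score of a page: sum of the authority scores of pages it links to.
        new_hub = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical toy graph: "a" and "b" both link to "c"; "a" also links to "d".
toy_links = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
hub, auth = hits(toy_links)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

In this toy graph the page with the most incoming links from good hubs ends up with the highest authority score, and the page linking to the most authorities ends up as the strongest hub. In practice HITS is run on a query-specific base set expanded from a root set of search results, which is the setting in which the topic drift problem discussed above arises.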
Keywords/Search Tags: Web data mining, Authorities, Hubs, HITS algorithm, Root-set eigenvector projection method, Base-set downsizing method, Weighted root-set eigenvector projection method, Weighted base-set eigenvector projection method