
Research And Improvement On Link Analysis Based On HITS Algorithm

Posted on: 2008-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
Full Text: PDF
GTID: 2178360242967327
Subject: Software engineering
Abstract/Summary:
In recent years, as technology has developed, people have gained access to abundant resources on the Internet. However, the Internet's huge volume of data, its complexity, its extremely dynamic nature, and the diversity of its users make these resources difficult to exploit. Locating valuable information on the Web has therefore become an important issue in Web data mining. Traditional information browsing methods are mature; against this background, we mine the Web's vast link resources according to their attributes and build a Web information retrieval model to find the information we need.

Current methods for locating the right webpage are based on hyperlink ranking algorithms. However, such methods can suffer from the topic drift problem: the returned pages have high link density but are often irrelevant to the search topic.

HITS considers only the hyperlinks between pages and neglects page content. To address this weakness, the paper proposes an improved algorithm called I-HITS, based on topic relevance and page popularity. I-HITS uses the relevance between pages and the search topic to distinguish the importance of each link, which avoids topic drift. A new matrix is built from these weights, and a new iterative formula is then used to calculate the hub and authority values.

The paper also analyzes other improved algorithms based on HITS, such as ARC and SALSA. Compared with traditional HITS, ARC, and SALSA, I-HITS finds more highly relevant pages, and the accuracy of the results improves by 30%-50%. The method therefore reduces topic drift and improves the efficiency and quality of search.

The paper gives a new method for constructing the adjacency matrix, using relevance and popularity to assess the importance of each link from a more objective angle. This helps locate the right pages, which gives the work both theoretical and practical significance.
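
The abstract does not state the I-HITS formulas, so the following is only a minimal sketch of the general idea: classical HITS power iteration run on a link matrix whose entries are weighted rather than 0/1. The weighting rule here (scaling each link by the topic relevance of its target page) is a hypothetical stand-in, not the thesis's method, and the popularity component is left out. The function names, the toy graph, and the relevance scores are all made up for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def hits(weights, iterations=50):
    """HITS power iteration on a square, non-negative link-weight matrix.

    weights[i][j] > 0 means page i links to page j with that weight.
    A 0/1 matrix gives classical HITS. Returns (hub, authority) scores.
    """
    n = len(weights)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iterations):
        # Authority of page j: weighted sum of hub scores of pages linking to j.
        auth = _normalize(
            [sum(weights[i][j] * hub[i] for i in range(n)) for j in range(n)]
        )
        # Hub of page i: weighted sum of authority scores of pages i links to.
        hub = _normalize(
            [sum(weights[i][j] * auth[j] for j in range(n)) for i in range(n)]
        )
    return hub, auth


def relevance_weighted(adjacency, relevance):
    """Hypothetical weighting: scale link i -> j by the topic relevance of page j."""
    n = len(adjacency)
    return [[adjacency[i][j] * relevance[j] for j in range(n)] for i in range(n)]


# Toy graph: pages 0 and 1 both link to pages 2 and 3; page 3 is off-topic.
adjacency = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
relevance = [1.0, 1.0, 0.9, 0.1]  # made-up relevance scores in [0, 1]

_, plain_auth = hits(adjacency)
_, weighted_auth = hits(relevance_weighted(adjacency, relevance))
```

On this toy graph, plain HITS gives pages 2 and 3 the same authority score because their in-links are identical, while the relevance-weighted matrix ranks the on-topic page 2 above the off-topic page 3. That is the kind of topic drift correction the abstract describes, though the actual I-HITS weighting may differ.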
Keywords/Search Tags: Web Data Mining, Link Analysis, HITS, Relevance, Popularity