Improvements Of HITS Algorithm Based On Triadic Closure

Posted on: 2015-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: X X Tian
Full Text: PDF
GTID: 2298330431494348
Subject: Applied Mathematics
Abstract/Summary:
In the 21st century, the Internet and mobile terminals have become the fastest-growing technology products. As they have grown in scale and popularity, the Internet and intelligent terminals have fundamentally changed the way people live, work, relax, and communicate. The Internet has become the largest repository of information and offers a vast range of network resources. However, the data on the Internet is large, complex, and dynamic, and these features make web data mining considerably more difficult. Helping users find the resources they are interested in among this vast body of online material has therefore become an urgent problem.

Hyperlinks are an important part of network information, since they link the whole network together. The latent semantics carried by a hyperlink are concise: a hyperlink is the author's way of hinting at the subject of the linked document. In addition, hyperlinks provide a great deal of information about the quality, structure, and relevance of content, which makes them an important resource for web mining. The HITS algorithm uses the link structure of the web to identify authoritative web pages.

This paper studies the HITS algorithm and related improved algorithms in depth. The HITS algorithm considers only the hyperlink structure and completely ignores the content of web pages. Moreover, it ignores the fact that links may differ in importance. As a result, the algorithm is prone to topic drift. To solve this problem, we propose three improved HITS algorithms based on the theory of triadic closure, the vector space model (VSM), and the TrustRank algorithm: the PCHITS, PAHITS, and PCTHITS algorithms. First, three new concepts are introduced: page topic similarity, common reference degree, and trust-degree. Then, the relevance between any two pages is computed from these concepts. Finally, the relevance is used to construct a new adjacency matrix, from which authorities and hubs are calculated iteratively.

This paper thus gives a new method of constructing the adjacency matrix: links are weighted by page topic similarity, common reference degree, and trust-degree, so that the importance of each link is measured more objectively. This method helps ensure that pages relevant to a particular topic are discovered and ranked. The work therefore has both theoretical and practical significance.
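
The general idea of iterating authorities and hubs over a weighted adjacency matrix can be sketched as follows. This is only a minimal illustration, not the PCHITS, PAHITS, or PCTHITS algorithm itself: the abstract does not state how page topic similarity, common reference degree, and trust-degree are combined, so the `relevance` matrix below is hypothetical sample data, and the function names `weight_links` and `weighted_hits` are assumptions made for this example.

```python
import math


def weight_links(adjacency, relevance):
    """Scale each existing link i -> j by a relevance score.

    Assumption: relevance[i][j] stands in for whatever combination of
    page topic similarity, common reference degree, and trust-degree
    the thesis uses; the abstract does not give that formula.
    """
    n = len(adjacency)
    return [[adjacency[i][j] * relevance[i][j] for j in range(n)]
            for i in range(n)]


def _normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def weighted_hits(weights, max_iter=100, tol=1e-10):
    """Standard HITS iteration on a non-negative weighted matrix.

    weights[i][j] is the weight of the link from page i to page j
    (0 means no link). Returns (authorities, hubs).
    """
    n = len(weights)
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(max_iter):
        # Authority of j: weighted sum of the hub scores of pages linking to j.
        new_auth = _normalize(
            [sum(weights[i][j] * hub[i] for i in range(n)) for j in range(n)])
        # Hub of i: weighted sum of the authority scores of pages i links to.
        new_hub = _normalize(
            [sum(weights[i][j] * new_auth[j] for j in range(n)) for i in range(n)])
        delta = max(abs(a - b) for a, b in zip(new_auth + new_hub, auth + hub))
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return auth, hub


# Hypothetical 4-page graph: links 0 -> 2, 1 -> 2, and 0 -> 3.
adjacency = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical relevance scores: the link 0 -> 3 is judged off-topic.
relevance = [
    [1.0, 1.0, 1.0, 0.1],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]

W = weight_links(adjacency, relevance)
auth, hub = weighted_hits(W)
print("authorities:", auth)
print("hubs:", hub)
```

In this toy graph the low-relevance link contributes little to the authority of page 3, which is the kind of damping of off-topic links that weighting is meant to achieve against topic drift.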
Keywords/Search Tags: HITS algorithm, triadic closure, trust-degree, VSM