Research Of Link Analysis Algorithm Based On Gravitation Model

Posted on: 2008-07-29 | Degree: Master | Type: Thesis
Country: China | Candidate: L G Zhang
GTID: 2178360242467271 | Subject: Software engineering
Abstract:
The World Wide Web is a hypertext body of approximately 800 million pages that continues to grow by roughly seven million pages every day. Faced with such a constantly changing information resource, finding and making use of useful information on the web has become tremendously challenging, and existing search engine technologies are far from satisfying users. The web is fundamentally a fractal structure with self-organized and semi-structured characteristics, which poses difficulties for traditional information retrieval technologies. Link analysis can significantly improve the relevance of search results in the field of Web information retrieval.
Kleinberg's HITS algorithm is one of the classical link analysis algorithms. Most existing link analysis algorithms use links as the main factor in determining the importance of a web page, which cannot reflect the complicated relationships among web pages. Algorithms based on link analysis outperform traditional text-based algorithms, but because they ignore the content of web pages, they lead to several problems. For example, the HITS algorithm tends to converge to a tightly knit community (TKC), a small but highly interconnected set of pages; this is known as the TKC effect. The pages in the TKC may not be authoritative on the topic, or may pertain to just one aspect of it. Topic drift is a special case of this effect.
This paper interprets link analysis from the perspective of physics. It views the "endorsement" conveyed when one page links to another as an "attractive force". On this basis, it proposes G-HITS (Gravitation-Based HITS), which models web pages, content similarity, and link relationships using concepts from physics. Specifically, web pages are modeled as particles, the content similarity of a web page to the query topic is modeled as the distance between them, and the relationship in which one page links to another is modeled as an attractive force.
Each element of the adjacency matrix derived from the query-specific link graph is the value of the corresponding attractive force, computed according to Newton's law of universal gravitation. The iteration is then performed on this matrix to obtain authorities and hubs. Experimental results show that G-HITS provides a more reasonable interpretation of link analysis problems; it also identifies higher-quality authorities, improves the authority results by 30%, and is more resistant to the TKC effect than other typical link analysis algorithms.
Keywords: Link Analysis, Gravitation Model, Information Retrieval, G-HITS Algorithm
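The idea of weighting the HITS adjacency matrix by gravitational "forces" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how particle masses are chosen or how content similarity is converted into a distance. The sketch therefore assumes unit masses, a hypothetical distance r_j = 1 - s_j + EPS for the target page j of each link (with s_j in [0, 1] its similarity to the query), and a link weight F_ij = G * m_i * m_j / r_j^2 for the link i -> j. The names `gravitation_weights`, `hits`, and `EPS` are illustrative only.

```python
import math

EPS = 0.05  # hypothetical floor so a page identical to the query still has a finite distance


def gravitation_weights(edges, similarity, G=1.0, mass=None):
    """Build a weighted adjacency matrix W, where W[i][j] is the 'attractive force'
    of the link i -> j. ASSUMPTIONS (not from the thesis): unit masses by default,
    distance of the target page j from the query r_j = 1 - s_j + EPS, and
    F_ij = G * m_i * m_j / r_j**2 (Newton's law of gravitation)."""
    n = len(similarity)
    m = mass if mass is not None else [1.0] * n
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        r = 1.0 - similarity[j] + EPS  # closer to the query -> smaller distance
        W[i][j] = G * m[i] * m[j] / r ** 2
    return W


def _normalize(v):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def hits(W, iterations=100, tol=1e-10):
    """Standard HITS power iteration on a (possibly weighted) adjacency matrix:
    authorities a = W^T h, hubs h = W a, both L2-normalized every round."""
    n = len(W)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iterations):
        new_a = _normalize([sum(W[i][j] * h[i] for i in range(n)) for j in range(n)])
        new_h = _normalize([sum(W[i][j] * new_a[j] for j in range(n)) for i in range(n)])
        delta = max(abs(x - y) for x, y in zip(new_a + new_h, a + h))
        a, h = new_a, new_h
        if delta < tol:
            break
    return a, h


if __name__ == "__main__":
    # Toy graph: hubs 1 and 2 point to page 0; pages 3, 4, 5 form a tightly knit community.
    edges = [(1, 0), (2, 0), (3, 4), (3, 5), (4, 3), (4, 5), (5, 3), (5, 4)]
    plain_a, _ = hits(gravitation_weights(edges, [0.0] * 6))  # uniform weights = plain HITS
    grav_a, _ = hits(gravitation_weights(edges, [0.9, 0.5, 0.5, 0.1, 0.1, 0.1]))
    print("plain HITS authorities:", [round(x, 3) for x in plain_a])
    print("gravitation-weighted authorities:", [round(x, 3) for x in grav_a])
```

When every page has the same similarity, all link weights are equal, so the normalized iteration is exactly plain HITS; on this toy graph it converges onto the tightly knit pages 3, 4 and 5 and drives page 0's authority score toward zero. Giving page 0 a high similarity to the query shortens its assumed distance, strengthens the links pointing to it, and makes it the top authority, which illustrates the kind of TKC resistance the abstract reports.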