Research On Web Hyperlink Analysis And Its Application In Search Engine

Posted on: 2009-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: K Q Lv
Full Text: PDF
GTID: 2178360245499997
Subject: Computer application technology
Abstract/Summary:
With the rapid growth and spread of the Internet, the amount of information available on the Web is expanding explosively. Because people cannot browse the entire Web, they rely on Web search engines to find what they need. Since Web hyperlink analysis can improve search engine precision, it quickly became a hot topic in research on network applications and information retrieval.

First, we introduce the classification of search engines, their history, working principles, and performance evaluation. We then summarize the important role hyperlink analysis has played in the history of search engines. After describing the two best-known hyperlink analysis algorithms, PageRank and HITS, we discuss their principles and problems and find that topic drift is the most serious problem. Compared with HITS, PageRank is more stable and better suited to large-scale search engines, so we set out to improve PageRank. After studying and summarizing the improvements already proposed in China and abroad, we propose two new methods that improve PageRank from different aspects.

Analyzing the functions of hyperlinks and the motivations for creating them, we find that hyperlinks differ greatly from one another. Inspired by text classification, we introduce the concept of hyperlink classification. Based on the principle that different classes of links receive different weights, we propose a new improvement to PageRank based on hyperlink classification (HC-PageRank). To validate HC-PageRank, we implement a program on Nutch and develop a new link analysis tool. The experimental results show that the precision of HC-PageRank is higher than that of traditional PageRank.

Studying the computation of PageRank, we find that the PageRank value carries no semantic meaning. Following the online clustering approach of HITS, we adjust the PageRank value according to hyperlink anchor text and propose an online clustering method to improve PageRank. Finally, we develop an online clustering plug-in for Nutch. The experimental results show that the precision of the online clustering method is higher than that of traditional PageRank.
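
The principle that different classes of hyperlink receive different weights can be illustrated with a minimal Python sketch of a class-weighted PageRank. This is not the thesis's HC-PageRank implementation (which was built on Nutch); the link classes, their weights, the damping factor, and the function name `weighted_pagerank` are all assumptions chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical link classes and weights; the abstract does not state
# which classes or values HC-PageRank actually uses.
CLASS_WEIGHTS = {"reference": 1.0, "navigation": 0.3, "advertisement": 0.1}


def weighted_pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank where each link carries a class weight.

    links: list of (source, target, link_class) tuples.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    pages = sorted({p for s, t, _ in links for p in (s, t)})
    out = defaultdict(list)
    for source, target, link_class in links:
        out[source].append((target, CLASS_WEIGHTS[link_class]))

    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the uniform "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            total = sum(w for _, w in out[p])
            if total == 0:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
            else:
                # Split rank among targets in proportion to link weight.
                for target, w in out[p]:
                    new_rank[target] += damping * rank[p] * w / total
        rank = new_rank
    return rank


links = [
    ("a", "b", "reference"),
    ("a", "c", "advertisement"),
    ("b", "a", "reference"),
    ("c", "a", "reference"),
]
ranks = weighted_pagerank(links)
```

In this toy graph, page `a` links to `b` with a high-weight class and to `c` with a low-weight class, so `b` receives most of the rank that `a` passes on. Setting all weights equal recovers the usual uniform split of traditional PageRank.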
Keywords/Search Tags:Search Engine, Hyperlink Analysis, PageRank, Hyperlink Classification, Hyperlink Content