
Design and Implementation of a Focused Crawler System Based on Incremental Feedback and an Adaptive Mechanism

Posted on: 2006-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2208360155459028
Subject: Computer application technology
Abstract/Summary:
The enormous growth of the World Wide Web in recent years has made efficient resource discovery important. Several new techniques have been proposed in response. A key one is focused crawling, which crawls specific topical portions of the Web quickly without having to explore every page. It is increasingly applied in topic-specific search engines, site structure analysis, and related fields.

The major research work and contributions of this dissertation are as follows:
(1) The basic theory and construction of focused crawlers are investigated. Based on these investigations, the thesis explores the related techniques and proposes a structural design model for a focused crawler, named HJSpider.
(2) To judge the relevance of page content to the topic, we apply the term-based vector space model (VSM), which is widely used in the field of text classification.
(3) To judge the relevance of a URL to the topic, we develop a new algorithm based on page content, web structure, and the hyperlink analysis method HITS.
(4) We summarize the rules governing the distribution of topics on the Web, and describe how to select the topic and how to analyze hyperlinks based on HTML syntax.
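
The two standard building blocks named in the abstract, the vector space model and HITS, can be illustrated with a minimal Python sketch. This is not the HJSpider implementation, and it does not reproduce the thesis's own URL relevance algorithm, whose details the abstract does not give. The function names (`term_vector`, `cosine_relevance`, `hits`), the simple tokenizer, the use of raw term frequencies without weighting, and the fixed iteration count are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split it into word tokens, and count term frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_relevance(page_text, topic_text):
    """Cosine similarity between raw term-frequency vectors, in [0.0, 1.0]."""
    page, topic = term_vector(page_text), term_vector(topic_text)
    dot = sum(page[t] * topic[t] for t in page.keys() & topic.keys())
    norm = math.sqrt(sum(v * v for v in page.values())) * math.sqrt(
        sum(v * v for v in topic.values())
    )
    return dot / norm if norm else 0.0


def hits(links, iterations=50):
    """Textbook HITS. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages that link in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize both score sets so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth
```

A crawler built along these lines might call `cosine_relevance(page_text, topic_description)` to decide whether a fetched page is on topic, and use the authority scores from `hits` over the locally crawled link graph as one signal when ordering unvisited URLs. How such signals are actually combined in the thesis is not stated in the abstract.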
Keywords/Search Tags: Focused Crawling, HITS, VSM, Hyperlink Analysis