Research Of Focused Website Crawler

Posted on: 2007-06-20 | Degree: Master | Type: Thesis
Country: China | Candidate: J Q Liu | Full Text: PDF
GTID: 2178360212958764 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the World Wide Web in recent years, the amount of information on the Internet has grown exponentially. This growth poses unprecedented scaling challenges for general-purpose crawlers and search engines, while more and more people want to obtain the information they need quickly and effectively. A focused crawler is a subject-oriented information retrieval system: it automatically retrieves from the web the information that is relevant to specific subjects, and so meets this need.

A traditional focused crawler targets web pages that are relevant to specific topics. Some applications, however, such as web directories, provide users with relevant websites rather than individual pages. As the web grows, manually maintained web directories are becoming increasingly inefficient and impractical. The focused website crawler, which retrieves only relevant websites, emerged as a way to build a web directory that can be maintained automatically.

A focused website crawler builds on a traditional focused crawler by adding website selection and classification strategies. Starting from the seed websites, the crawler picks the best candidate site according to the best-first rule and then starts a crawling process for it. In this thesis we design and implement a focused website crawler oriented toward Chinese websites and describe the function and principle of its main components. Experiments demonstrate that the focused website crawler retrieves relevant websites effectively and provides a solution for maintaining a web directory automatically.

Unlike the traditional focused website crawler, the one introduced in this thesis uses a modified external crawl strategy, which chooses candidate websites based not only on the average weight of their transversal links but also on the number of those links. Experiments demonstrate that, with the modified external crawl strategy, the focused website crawler visits authoritative websites first and retrieves relevant websites with higher accuracy.
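
The modified external crawl strategy can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: transversal links are taken to be links that point from one website to another, each carries a relevance weight between 0 and 1, and a candidate site's priority is the average weight multiplied by log(1 + number of links). The names `CandidateSite`, `score`, and `crawl_order` are hypothetical and are not taken from the thesis.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class CandidateSite:
    """A website not yet crawled, with the weights of the transversal
    links that point to it (hypothetical structure for illustration)."""
    host: str
    link_weights: list = field(default_factory=list)

    def score(self) -> float:
        # Assumed combination, not the thesis formula: average link
        # weight, boosted by the (log-damped) number of links.
        if not self.link_weights:
            return 0.0
        average = sum(self.link_weights) / len(self.link_weights)
        return average * math.log(1 + len(self.link_weights))


def crawl_order(candidates):
    """Return host names in best-first order (highest score first)."""
    # heapq is a min-heap, so scores are negated to pop the best site first.
    heap = [(-site.score(), site.host) for site in candidates]
    heapq.heapify(heap)
    order = []
    while heap:
        _, host = heapq.heappop(heap)
        order.append(host)
    return order


if __name__ == "__main__":
    sites = [
        CandidateSite("a.example", [0.9]),        # one strong link
        CandidateSite("b.example", [0.6] * 5),    # many moderate links
        CandidateSite("c.example"),               # no links seen yet
    ]
    print(crawl_order(sites))
```

Under these assumptions, a site reached by many moderately weighted links (`b.example`) is visited before a site reached by a single highly weighted link (`a.example`), whereas ranking by average weight alone would reverse that order. This is one way the link count could favor sites that many others point to.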
Keywords/Search Tags: Focused Website Crawler, Website Classification, Web Mining