Research On Topic-Specific Search Engines

Posted on: 2006-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: D S Li
Full Text: PDF
GTID: 2168360155968524
Subject: Computer application technology
Abstract/Summary:
This thesis summarizes the origin and development of the Web, identifies the problems facing Web searchers, and surveys the main search engines currently in use. Because the Web is now so large and changes so quickly, general-purpose search engines can no longer provide comprehensive, up-to-date coverage of it. In contrast to general-purpose search engines, which attempt to index the whole Web, topic-specific search engines can cover specialized topics in more depth and keep their crawls fresher. The design and implementation of topic-specific search engines is going through a highly creative phase, and a great deal of machine learning work is being applied to the task.

Based on a survey of the state of the art in topic-specific search engines, this thesis examines how to design a good crawling strategy and how to score target pages for relevance to a specific topic, and proposes a Topic-Specific Web Searcher (TSWS). It then presents the design and implementation of TSWS, which comprises an HTML Parser, a Topic-Specific Web Crawler, and a Web Classifier. The HTML Parser implements a fast index table for hyperlinks and is designed for robustness. The Topic-Specific Web Crawler uses heuristics to predict the relevance of URLs waiting to be fetched, together with relevance scoring of pages already fetched, which increases the rate and efficiency of gathering. The Web Classifier uses the Vector Space Model to represent Web text, which improves the performance of the Bayes classifier. The thesis also introduces the development of link-based retrieval techniques for Web information retrieval.
Keywords/Search Tags: Topic-Specific search engines, machine learning, Parser, Crawler, Bayes Classifier
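
The abstract does not say which heuristics TSWS uses to predict URL relevance, so the following is only an illustrative sketch of the general idea: a best-first crawl frontier that ranks uncrawled URLs by a predicted relevance score. Everything named here is an assumption made for illustration, not taken from the thesis: the `Frontier` class, the use of anchor text as the prediction signal, cosine similarity over term-frequency vectors (one simple use of the Vector Space Model), and the 0.7/0.3 weighting between anchor-text similarity and the score of the parent page.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Frontier:
    """Best-first URL queue: the URL with the highest predicted relevance is popped first."""

    def __init__(self, topic_text):
        self.topic = term_vector(topic_text)
        self.heap = []      # entries are (-predicted_score, url); heapq is a min-heap
        self.seen = set()   # URLs already queued, to avoid duplicates

    def add(self, url, anchor_text, parent_score=0.0):
        if url in self.seen:
            return
        self.seen.add(url)
        # Hypothetical heuristic: blend anchor-text similarity to the topic
        # with the relevance score of the page the link was found on.
        similarity = cosine(term_vector(anchor_text), self.topic)
        predicted = 0.7 * similarity + 0.3 * parent_score
        heapq.heappush(self.heap, (-predicted, url))

    def pop(self):
        neg_score, url = heapq.heappop(self.heap)
        return url, -neg_score


# Usage: links whose anchor text resembles the topic are fetched first.
frontier = Frontier("topic specific search engine crawler")
frontier.add("http://example.com/a", "cooking recipes")
frontier.add("http://example.com/b", "search engine crawler design")
next_url, predicted_score = frontier.pop()
```

In a full crawler, each fetched page would itself be scored (for example by a classifier), and that score would be passed as `parent_score` when the page's outgoing links are added, so links found on relevant pages are favored.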