
Design and Research of a Topic-Specific Search Engine

Posted on: 2008-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Q G Liu
GTID: 2208360212975416
Subject: Computer software and theory
Abstract/Summary:
The WWW is now very large and changes constantly, so general-purpose search engines can no longer offer comprehensive, up-to-date search over the whole Web. General-purpose engines attempt to index the entire Web. Topic-specific search engines instead cover specialized topics in greater depth and keep their crawled content fresher. Research on topic-specific search engines is therefore becoming an active area.

This thesis first gives a general introduction to the history, classification, development, and trends of search engines. It then studies the system architecture and working principles of a search engine. The technologies behind the crawler, the HTML parser, and Chinese word segmentation are discussed and analyzed in turn. The thesis also analyzes page links and the characteristics of how topic pages are distributed.

The thesis designs and implements a system that judges the relevance between page content and a topic. The system architecture is built on the open-source software Nutch, which makes it robust and easy to maintain. The core idea is to assign weights to the keywords and then decide relevance with the page-topic relevance judging system. The vector space model (VSM) and the keywords play a central role in this judgment.

A novel concept is proposed: the Ω-distance between keywords and a page. Figuratively, the Ω-distance acts like a resistance in a circuit. It separates, or pulls apart, pieces of WWW information from one another, so that the true intent expressed by the keywords fails to match the relevant page content. This concept has distinctive practical value for topic-specific search and can effectively make information matching more intelligent, but it still requires further research.

This thesis presents meaningful research and exploratory work. The groundwork it lays can support further research in this field.
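The abstract describes relevance judgment as weighting topic keywords and comparing pages to the topic under the vector space model. The following is a minimal sketch of that idea and is not the thesis's actual implementation. It makes several assumptions:

- It scores a page by the cosine similarity between the page's term-frequency vector and a weighted topic-keyword vector.
- The keyword set, the weights, the 0.3 threshold, and the names `TOPIC_KEYWORDS`, `RELEVANCE_THRESHOLD`, `page_vector`, `cosine`, and `is_on_topic` are all hypothetical.
- The input is assumed to be already tokenized. For Chinese pages the tokens would come from the word segmentation step, but English tokens are used here for readability.
- The Ω-distance is not defined in the abstract, so it is not modelled.

```python
import math
from collections import Counter

# Hypothetical topic definition: keyword -> weight. The thesis's real
# keyword set and weighting scheme are not given in the abstract.
TOPIC_KEYWORDS = {
    "search": 3.0,
    "engine": 3.0,
    "crawler": 2.0,
    "index": 1.5,
}

# Hypothetical relevance cut-off; a real system would tune this value.
RELEVANCE_THRESHOLD = 0.3


def page_vector(tokens, vocabulary):
    """Term-frequency vector of a page, restricted to the topic vocabulary."""
    counts = Counter(tokens)
    return {term: float(counts[term]) for term in vocabulary}


def cosine(u, v):
    """Cosine similarity of two sparse vectors that share the same keys."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # page contains none of the topic keywords
    return dot / (norm_u * norm_v)


def is_on_topic(tokens, topic=TOPIC_KEYWORDS, threshold=RELEVANCE_THRESHOLD):
    """Return (score, decision) for an already-segmented page."""
    score = cosine(page_vector(tokens, topic), topic)
    return score, score >= threshold


if __name__ == "__main__":
    on_topic = "the search engine uses a crawler to build an index".split()
    off_topic = "recipes for cooking rice".split()
    print(is_on_topic(on_topic))   # high score, judged relevant
    print(is_on_topic(off_topic))  # (0.0, False)
```

Cosine similarity is used here because it normalizes for page length. A long page therefore does not look relevant merely because it contains many words. Raising a keyword's weight pulls the topic vector toward pages that use that keyword, which is one plausible reading of "give the weight on the keywords".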
Keywords/Search Tags: Topic-Specific search, VSM, topic judgement, keywords set