
The Crawler Research Of Agent-based Topic-Specific Search Engine

Posted on: 2008-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Xu
Full Text: PDF
GTID: 2178360242988981
Subject: Computer application technology
Abstract/Summary:
With the widespread adoption of the World Wide Web, traditional search engines face serious challenges: low recall, low retrieval accuracy, and indexes that are not updated in a timely manner. They capture user needs poorly, and their results contain a large amount of information unrelated to the topic. At the same time, as users come from an increasing range of fields, specialized search engines that offer more efficient retrieval within a specific industry are needed. By combining site-specific acquisition, topic-specific acquisition, and mining of web structure, a Topic-Specific Search Engine can improve recall and retrieval accuracy, better guarantee timeliness and professionalism, and provide better personalized services. It can therefore be highly effective at identifying information in a specific domain and at providing specialized retrieval services. The design of the web crawler is thus the core of such a search engine.

This thesis studies the crawler of an Agent-based Topic-Specific Search Engine and the related key technologies. The main work is as follows:

1. Based on an analysis of search engine technology, adaptive agent technology, and machine learning, a Crawler Frame of Agent-based Topic-Specific Search is presented.
2. A Chinese word segmentation algorithm that combines a word dictionary with statistics is presented. Building on an improved VSM (vector space model) and improved web mining and content mining methods, an automatic text classification algorithm based on the support vector machine is designed.
3. A search strategy based on the Q-learning algorithm is presented. The algorithm combines web page evaluation with web link structure analysis. With the help of an adaptive agent, it reduces the greediness of the search to some extent and selects web links more effectively, which helps avoid the local-optimum subspace traps that traditional heuristic search easily falls into.
4. CFATSS (Crawler Frame of Agent-based Topic-Specific Search) is implemented in the object-oriented Java language. Chinese segmentation is tested on the Peking University Corpus, and the website classification module and the Q-learning-based search strategy are also tested. The results show that CFATSS is more efficient in Chinese segmentation, text classification, overall recall, and retrieval accuracy.
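To make the idea behind the Q-learning search strategy in item 3 concrete, the following is a minimal sketch of tabular Q-learning applied to link selection. It is not the thesis's algorithm or its Java implementation: the class `LinkQLearner`, the function `train`, the toy graph `LINKS`, and the scores in `RELEVANCE` are all hypothetical, and the relevance scores simply stand in for whatever the page evaluation step would produce.

```python
import random
from collections import defaultdict

# Hypothetical toy link graph (page -> outgoing links); not from the thesis.
LINKS = {
    "seed": ["a", "b"],
    "a": [],      # mildly on-topic page, but a dead end
    "b": ["c"],   # off-topic page that leads to a highly relevant one
    "c": [],
}
# Hypothetical topic-relevance scores in [0, 1], standing in for page evaluation.
RELEVANCE = {"seed": 0.0, "a": 0.3, "b": 0.0, "c": 1.0}


class LinkQLearner:
    """Tabular Q-learning where a state is a page and an action is a link to follow."""

    def __init__(self, alpha=0.5, gamma=0.8, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount applied to rewards further down the path
        self.epsilon = epsilon  # probability of exploring a random link
        self.q = defaultdict(float)  # (page, link) -> estimated long-term value
        self.rng = random.Random(seed)

    def choose(self, page, links):
        """Pick a link epsilon-greedily; return None if the page has no links."""
        if not links:
            return None
        if self.rng.random() < self.epsilon:
            return self.rng.choice(links)
        return max(links, key=lambda link: self.q[(page, link)])

    def update(self, page, link, reward, next_links):
        """Standard Q-learning update after following `link` from `page`."""
        best_next = max((self.q[(link, nxt)] for nxt in next_links), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(page, link)] += self.alpha * (target - self.q[(page, link)])


def train(learner, links, relevance, sweeps=50):
    """Repeatedly sweep every edge so the Q-values settle (deterministic)."""
    for _ in range(sweeps):
        for page, outgoing in links.items():
            for link in outgoing:
                learner.update(page, link, relevance[link], links[link])
```

In this toy graph, a crawler that ranks links only by the immediate relevance of the target page would follow `seed -> a` (score 0.3) and miss page `c`. After `train(learner, LINKS, RELEVANCE)`, the learned value of `seed -> b` is higher than that of `seed -> a`, because the discounted reward of reaching `c` propagates back through `b`. This is one way to read the abstract's claim that the strategy reduces the greediness of the search.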
Keywords/Search Tags: Agent, Chinese segmentation, text classification, Q-learning, Topic-Specific Search