
The Design And Application Of Chinese Intelligent Search Engine

Posted on: 2001-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Gao
Full Text: PDF
GTID: 2168360002952428
Subject: Computer applications
Abstract/Summary:
With the development and popularization of the Internet, the search engine has become an essential tool for Internet users. This paper analyses the features and current research status of search engines in China and abroad, and points out the necessity and importance of research on Chinese intelligent search engines.

This paper systematically introduces the design and development of the "China Info" search engine and explains how a search engine works.

"China Info" is a multipurpose and adjustable Chinese intelligent search engine. Built on a Browser/Server architecture, it improves its intelligence through cooperation between client and server. It also improves search performance by applying natural language processing to the contents of web pages.

The "China Info" search engine consists of a distributed parallel spider, a full-text search database, an intelligent information processing model, CGI, a smart browser, and other components. It supports full-text search, concept search based on a language database, and concept search based on a knowledge database.

The design and development of the "China Info" spider is the focus of this paper.

The spider is the data source of an Internet search engine. It determines how rich the indexed content is and how promptly information is updated. The "China Info" spider is a distributed parallel system with a Client/Server architecture. It is composed of the Task Manager (TM), the server program, and the Gather Agent (GA), the client program.

TM is a TCP/IP-based program developed with Visual C++. It has three goals: 1) communicate with GAs via TCP/IP sockets and communication primitives, and maintain the GA host list; 2) dispatch search tasks, sending each task to a GA that is lightly loaded or idle; 3) control the search strategy and communicate with users.

GA is implemented with multithreading. Its main functions are to: 1) communicate with TM via TCP sockets and communication primitives, and report its status; 2) receive search tasks from TM, namely the ROOT-URL list; 3) use a breadth-first strategy to retrieve web page information; 4) gather web pages and save them in the database in a suitable form.
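The division of work between TM and GA can be illustrated with a minimal sketch. It is not the thesis implementation, which was written in Visual C++ with sockets and multiple threads; it is a single-process Python approximation under stated assumptions. The names `pick_agent`, `crawl_breadth_first`, `TOY_WEB`, and the `ga-N` identifiers are hypothetical, the load of a GA is assumed to be a simple count of active tasks, and an in-memory link table stands in for real page fetching.

```python
from collections import deque

# Hypothetical in-memory "web": page URL -> outgoing links.
# A real Gather Agent would fetch pages over HTTP and parse the links.
TOY_WEB = {
    "http://a.example/": ["http://a.example/1", "http://a.example/2"],
    "http://a.example/1": ["http://a.example/3", "http://a.example/"],
    "http://a.example/2": ["http://a.example/3"],
    "http://a.example/3": [],
}


def pick_agent(agent_loads):
    """TM side (assumed policy): choose the GA with the fewest active tasks."""
    return min(agent_loads, key=agent_loads.get)


def crawl_breadth_first(root_urls, fetch_links, max_pages=100):
    """GA side: visit pages level by level, starting from the ROOT-URL list."""
    roots = list(dict.fromkeys(root_urls))  # drop duplicate roots, keep order
    queue = deque(roots)
    seen = set(roots)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()          # FIFO order gives breadth-first traversal
        visited.append(url)            # a real GA would store the page here
        for link in fetch_links(url):
            if link not in seen:       # never enqueue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


loads = {"ga-1": 3, "ga-2": 0, "ga-3": 1}
agent = pick_agent(loads)
order = crawl_breadth_first(["http://a.example/"], lambda u: TOY_WEB.get(u, []))
print(agent, order)
```

The `seen` set is what keeps the crawl from looping on pages that link back to each other, and the `max_pages` cap is an added safeguard that the abstract does not mention.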
Keywords/Search Tags: WWW technology, Search engine, Artificial intelligence, Spider, ODBC, Multithread