
Design and Implementation of a Spider in a Vertical Search Engine

Posted on: 2008-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: J C Xue
Full Text: PDF
GTID: 2178360212484233
Subject: Detection Technology and Automation
Abstract/Summary:
With the rapid development of the Internet, the web has become the largest database in the world and an ideal place for sharing and communicating information. However, the sheer volume of web resources and their dynamic nature require search systems to be updated continually and to search with greater efficiency, relevance, and accuracy. Various specialized search engines have been developed in response. How to reach useful information on the web more quickly and more accurately is one of the problems web users face, and search engine technology, which consists of a spider, an indexer, a searcher, and a user interface, is the key to solving it. The spider is software that automatically traverses the web, selects useful information, and builds a local index database that serves user queries. A vertical search engine is a type of search engine that classifies information in a particular field from websites, extracts the required data string by string in one direction, analyzes the data, and returns them to the user. The major difference between a vertical search engine and a traditional one is that the vertical engine extracts information from websites in a structured way, classifying the information as it is collected so as to better satisfy search requirements.

This thesis analyzes and discusses in detail the research and development of WWW search engine technology, its current state, and its future trends in China and abroad. It also describes the working principles of a search engine and the main function of each component. The thesis first emphasizes two key steps: evaluating the topical relevance of a web page and designing an efficient search strategy.
It then describes a topic-specific search engine for the book domain, which is the core of the vertical search engine. The main part of the thesis covers the whole process of designing the engine. Based on general concepts of HTML parsing, combined with analysis of the hyperlinks between web pages (the HITS algorithm) and the requirements of the search engine, the thesis designs a web spider with a depth-first search strategy suited to collecting information from small and medium-sized websites. The spider's search algorithm is presented, and it runs with the aid of C++ Builder tools. In addition, to avoid duplicate data, a dedicated duplicate-checking program was designed to ensure data accuracy. On this basis, the search engine is built with the indexing and search tool Lucene, which assembles the search results so as to provide accurate information and better satisfy users' requirements. In general, this approach can serve as guidance for building other specialized search engine systems.

The results of functional software testing show that the spider's algorithm is accurate and stable, without the risk of exhausting local resources. It supports searching within a fixed site or within a given range of URLs, and it can also search and download automatically according to the given information.
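The abstract does not include the spider's code. As a rough illustration only, the ideas it names (a depth-first strategy, restriction to a fixed site, and a duplicate check) might be sketched in Python as follows. This is not the thesis's C++ Builder implementation. The names `LinkParser`, `crawl_depth_first`, `fetch`, and `max_depth` are hypothetical, and using a content hash for the duplicate check is an assumption, since the abstract does not say how duplicates are detected.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_depth_first(start_url, fetch, max_depth=3):
    """Depth-first crawl restricted to the host of start_url.

    fetch(url) is supplied by the caller and returns HTML text or None.
    Returns the URLs of the pages that were stored, in visit order.
    """
    host = urlparse(start_url).netloc
    seen_urls = set()     # URLs already visited
    seen_hashes = set()   # content digests, assumed duplicate check
    stored = []
    stack = [(start_url, 0)]  # an explicit stack gives depth-first order

    while stack:
        url, depth = stack.pop()
        if url in seen_urls:
            continue
        seen_urls.add(url)

        html = fetch(url)
        if html is None:
            continue

        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # same content already stored under another URL
        seen_hashes.add(digest)
        stored.append(url)

        if depth >= max_depth:
            continue

        parser = LinkParser()
        parser.feed(html)
        # Push in reverse so the first link on the page is explored first.
        for href in reversed(parser.links):
            target, _fragment = urldefrag(urljoin(url, href))
            if urlparse(target).netloc == host and target not in seen_urls:
                stack.append((target, depth + 1))

    return stored
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP library; in practice it could wrap a real downloader, and the stored pages could then be handed to an indexer such as Lucene.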
Keywords/Search Tags: search engine, web spider, information collection, search strategy