
Research And Implementation Of Search Algorithm And Search Engine

Posted on: 2008-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: L Gao
Full Text: PDF
GTID: 2178360215973767
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and information technology, acquiring the knowledge we need from the sea of information has become an urgent problem. Search engine technology emerged to meet this demand for information indexing. Several search engines already exist, such as Google and Baidu, but the technology is still young and many problems remain unsolved. One problem is how to raise the recall of a search engine: the Hidden Web and dynamic web pages are becoming an important part of today's Internet, and obtaining this kind of information is a new topic for the search engine's network robot (WebCrawler). Another problem is the precision of a search engine, which depends mainly on the accuracy of word segmentation (Word Split). Recall and precision are the two standards for evaluating a search engine.
This thesis starts with the basic structure of a search engine, analyzes the composition and principle of each part, and designs an extensible search engine.
It begins with the Word Split module. After studying the basic steps of the module and demonstrating the use of statistical methods in the early and later stages of word splitting, the thesis proposes introducing a graph method into the deeper word-cutting layer. Drawing on the foundations of the traditional segmentation model, it also proposes a new method that identifies unregistered (out-of-vocabulary) words before analyzing the sentence.
This structure is discussed in Chapter 3, which also introduces several basic word-splitting algorithms, including the MM (maximum matching) algorithm, the RMM (reverse maximum matching) algorithm, the best-match algorithm, and the N-gram algorithm, and proposes a new word-splitting algorithm: the 3-connection path algorithm.
Secondly, the thesis analyzes the WebCrawler in depth, studying the three modules that constitute it: the protocol module, the processing module, and the strategy module. It then studies the MD5 algorithm and the PageRank strategy and, on that basis, designs a viable WebCrawler.
Next, the thesis studies the text analysis module, one of the basic modules of a search engine, and introduces the indexer together with the text analysis module.
Finally, the thesis describes how to integrate the systems above and provides a viable framework for the research and development of search engines.
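The MM and RMM algorithms named in the abstract are not spelled out on this page. As a rough illustration only, the sketch below shows the textbook form of forward and reverse maximum matching over a word dictionary; it is not the thesis's implementation. The function names, the `max_len` parameter, and the toy dictionary are all assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Forward maximum matching (MM): scan left to right, taking the
    longest dictionary word at each position (hypothetical helper)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until it is a
        # dictionary word or a single character (the fallback).
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, dictionary, max_len=4):
    """Reverse maximum matching (RMM): the same idea, scanning from the
    end of the text toward the start (hypothetical helper)."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # pieces were collected right to left
    return tokens


# Toy dictionary (an assumption): the two scan directions disagree on "abcd".
words = {"ab", "abc", "cd", "d"}
print(forward_max_match("abcd", words))
print(reverse_max_match("abcd", words))
```

When the two directions produce different segmentations, as they do for this toy input, the span is ambiguous; that kind of disagreement is one common reason to combine dictionary matching with statistical methods such as N-grams.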
Keywords/Search Tags: Search Engine, Information Index, WebCrawler, Word Split