
The Research And Implementation Of Search Engine Based On LUCENE

Posted on: 2008-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Gao
Full Text: PDF
GTID: 2178360215474224
Subject: Computer application technology
Abstract/Summary:
With the continuing development of information technology, Internet technology is also advancing rapidly, and the tool people use most frequently on the Internet every day is the search engine. People already treat it as an essential tool for study, work, and leisure. Everybody knows that a search engine can find the material or information one is looking for, but what is a search engine? Generally, an Internet search engine is a full-text search engine that collects from several billion to 10 billion web pages, indexes every word (that is, every keyword) on those pages, and builds an index database. After the user enters keywords, all pages containing those keywords are found as search results. After being sorted by a complex algorithm, the results are presented to the user in order of their relevance to the keywords.
First, the thesis introduces the current state of search engine development. Since the 1990s, faced with vast network information resources, people have found it increasingly difficult to locate the information they need in the course of Internet-based informatization. Most people rely heavily on search engines to obtain useful information. Therefore, the development of search engine technology, as a typical web information access technology, directly affects the quality of people's access to information. Next, we introduce the characteristics and classification of search engines, discuss search engine principles and the Robot, and analyze the architecture of the Google search engine. On this basis, we describe the open source project Lucene: its history, applications, characteristics, system structure, and index format. We then study several key technologies.
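
The full-text indexing idea described above (index every word of every page, then return the pages that contain the query keywords) can be illustrated with a minimal sketch. This is not the thesis's or Lucene's implementation; the function names `build_index` and `search` and the sample pages are hypothetical, tokenization is a plain whitespace split, and relevance ranking is left out entirely.

```python
from collections import defaultdict


def build_index(pages):
    """Build a toy inverted index: word -> set of ids of pages containing it.

    `pages` is a dict mapping a page id to its text (hypothetical sample data).
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted ids of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start from the pages matching the first word, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample pages, used only for illustration.
pages = {
    "p1": "lucene is an open source search library",
    "p2": "a search engine indexes web pages",
    "p3": "lucene builds an index for full text search",
}
index = build_index(pages)
print(search(index, "lucene search"))
```

A real engine would additionally score and sort the matching pages by relevance, which this sketch does not attempt.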
Because web pages are updated frequently, many pages become obsolete or cease to exist as time passes. By analyzing the process by which the robot fetches web pages, we propose an incremental Page Change Model for the robot. Finally, we discuss common algorithms for Chinese word segmentation, the ambiguity of Chinese word segmentation, and unregistered (out-of-dictionary) words.
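
The abstract does not say which segmentation algorithms the thesis covers. As an assumption, the sketch below uses forward maximum matching, one widely used dictionary-based method, to show how segmentation ambiguity and unregistered words arise. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` greedily, always taking the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character
    tokens, which is how unregistered words end up split apart.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy dictionary, assumed only for this illustration.
toy_dictionary = {"研究", "研究生", "生命", "起源"}

# Greedy matching picks "研究生" first, so "生命" can no longer be formed.
print(forward_max_match("研究生命起源", toy_dictionary))
```

With this toy dictionary the greedy match yields 研究生 / 命 / 起源 rather than the intended 研究 / 生命 / 起源, a small instance of the segmentation ambiguity mentioned above. A word missing from the dictionary, such as 谷歌 here, falls apart into single characters.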
Keywords/Search Tags: Search Engine, Lucene, Robot, Chinese Word Segmentation