The Research And Implementation Of Full-text Retrieval System Based On Lucene

Posted on: 2011-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: X Gao
GTID: 2178360305968125
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the amount of available information is growing at an unprecedented rate. Finding the information we actually need requires an efficient search tool, which has made full-text retrieval a hot research topic; it is also highly applicable to information integration.

The main work of this thesis covers four aspects. First, most early full-text retrieval systems were built on relational databases. After analyzing the disadvantages of that approach in depth, this thesis proposes a method based on the file system instead. Second, it introduces Lucene, a full-text retrieval engine toolkit which, as distributed, handles English and German text but not Chinese. A full-text retrieval system built on Lucene therefore needs its own module for Chinese retrieval, and this thesis implements search in Chinese. Third, it uses the Levenshtein edit distance and the Jaro-Winkler distance, respectively, to compute the similarity of English words. Fourth, although Lucene's core is cleverly designed, it is limited to processing plain text. To retrieve documents in a variety of formats, a module was built that handles multiple formats such as Word, Excel, PowerPoint, and PDF.

The system platform is built on Struts. Its MVC architecture keeps the program structure clear, gives a clean division of responsibilities, and makes the system easier to maintain and extend.

On the application side, this thesis focuses on retrieving documents in different formats. Building on related work such as document data processing and information extraction, the retrieval subsystem implements the indexer, the database storage design, and the searcher. Finally, the system achieves full-text retrieval, and in terms of recall and precision it broadly meets its initial design targets.
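The abstract does not say how the Chinese module segments text. One classic technique for this, forward maximum matching, is sketched below in Python purely as an illustration; it is not necessarily the thesis's method. It assumes a word dictionary (here a tiny hypothetical one) and a maximum word length of four characters, and it falls back to single characters for text the dictionary does not cover. In a Lucene-based system a segmenter like this would typically sit inside a custom Analyzer; that wiring is not shown.

```python
def forward_max_match(text: str, dictionary: set, max_len: int = 4) -> list:
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in dictionary:
                tokens.append(word)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
DICTIONARY = {"全文", "检索", "全文检索", "系统", "中文"}

print(forward_max_match("中文全文检索系统", DICTIONARY))  # ['中文', '全文检索', '系统']
```

Greedy longest-match is simple and fast but can mis-segment ambiguous strings; production segmenters usually add backward matching or statistical disambiguation.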
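The abstract names two word-similarity measures without giving their definitions. The following is a minimal Python sketch of both, assuming the textbook forms: Levenshtein distance as the minimum number of single-character insertions, deletions, and substitutions, and Jaro-Winkler as Jaro similarity boosted by a shared prefix of up to four characters with scaling factor p = 0.1. The normalization `1 - d / max(len)` and the 0.7 boost threshold are common conventions assumed here, not details taken from the thesis, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the distance into [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        # A character matches an unused equal character within the window.
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both sequences of matched characters and count order mismatches.
    out_of_order = 0
    j = 0
    for i, c in enumerate(s1):
        if used1[i]:
            while not used2[j]:
                j += 1
            if c != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2  # transpositions = half the out-of-order matches
    m = matches
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1,
                 max_prefix: int = 4, boost_threshold: float = 0.7) -> float:
    """Jaro similarity, boosted for a common prefix when it exceeds the threshold."""
    sim = jaro(s1, s2)
    if sim <= boost_threshold:
        return sim
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(levenshtein("kitten", "sitting"))                # 3
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))      # 0.961
```

Levenshtein counts edits, so it suits typo-style misspellings; Jaro-Winkler rewards shared prefixes, which tends to favour words that agree at the start.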
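The indexer and searcher that the retrieval subsystem implements both revolve around an inverted index, which maps each term to the documents containing it. The Python sketch below models that idea in memory only; it is a conceptual illustration, not Lucene's API. It assumes documents have already been converted to text and tokenized, and the file names are hypothetical. Lucene's real index also records information such as term frequencies and positions, and it is typically written as segment files in a directory on disk, which is what makes a file-system-based design possible.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each term to the set of document ids whose token list contains it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in tokens:
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict, query_terms: list) -> set:
    """Boolean AND query: ids of documents containing every query term."""
    if not query_terms:
        return set()
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings)


# Hypothetical documents, already extracted from their formats and tokenized.
docs = {
    "report.doc": ["lucene", "full-text", "retrieval"],
    "slides.ppt": ["struts", "mvc", "framework"],
    "notes.pdf": ["lucene", "struts", "index"],
}
index = build_index(docs)
print(search_all(index, ["lucene", "struts"]))  # {'notes.pdf'}
```

Because lookup touches only the posting sets of the query terms, search cost depends on those lists rather than on scanning every document, which is the main advantage over `LIKE`-style scans in a relational database.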
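The abstract reports recall and precision only qualitatively. For reference, the standard per-query definitions are precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. The small Python helper below computes both; the document ids in the usage line are hypothetical and do not reflect the thesis's evaluation data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 relevant in the collection.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```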
Keywords/Search Tags: Lucene, Full-text Retrieval, Struts framework, document analysis