
The Research And Implementation Of Full-text Retrieval System Based On Lucene

Posted on: 2009-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: M N Liu
GTID: 2178360272991493
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer and Internet technology, the digital information industry is growing at an ever-increasing pace. To find the information they need in vast amounts of data, people need an efficient search tool, and full-text retrieval technology has therefore become increasingly popular.

Full-text retrieval is an important branch of modern information retrieval technology. It is a powerful tool for handling unstructured data and is the core technology of search engines. This paper introduces Lucene, a full-text retrieval engine toolkit that is powerful, compact, and suitable for embedding in a variety of applications. In recent years it has been widely used around the world; many companies, such as IBM, build core code on it. As open-source software, it offers an excellent opportunity to learn the core techniques of a full-text retrieval engine, and it is worthwhile to analyze previous research and then improve the engine.

This paper studies prospective applications of Lucene in the field of full-text retrieval and, through research and development, delivers an improved digital product: a full-text retrieval engine system. The main tasks were:

1. Introducing the basic concepts and principles of full-text retrieval systems, and analyzing the commonly used Lucene classes as well as the characteristics of the open-source tools DOM4J and HTML Parser.
2. Examining the three core components of a search engine: the Crawler, the Indexer, and the Searcher. The Crawler module uses the open-source web crawler Heritrix, extended to crawl the required network resources. The Indexer and Searcher use the Lucene framework to implement a Lucene Chinese word segmenter that is more effective than the self-defined Chinese word segmentation. Serialization and JavaCC are also introduced to improve the efficiency of both indexing and development.
3. Analyzing in depth and implementing Lucene-based indexing and retrieval, the search results page, priority computation, and other key techniques.
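
To make the Indexer/Searcher split concrete, the following is a minimal sketch of an inverted index in plain Java. It is not Lucene's API and not the thesis's implementation: the class name `TinyIndex`, its methods, and the simple tokenizer are all hypothetical, chosen only to illustrate the idea of mapping terms to the documents that contain them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index (hypothetical, NOT Lucene's API): illustration only.
class TinyIndex {
    // term -> sorted set of ids of the documents containing that term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Indexer side: split the text into lowercase terms and record the doc id.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Searcher side: return ids of documents that contain every query term.
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term seeds the result
            } else {
                result.retainAll(docs);         // later terms intersect (AND)
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    // Splits on anything that is not a letter or digit. This is an assumption
    // that only suits space-delimited text; Chinese needs a word segmenter.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

The tokenizer is the part a Chinese word segmenter would replace: Chinese text has no spaces between words, so splitting on non-letter characters would treat whole phrases as single terms. A real engine such as Lucene also stores term positions and frequencies and ranks the results, which this sketch omits.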
Keywords/Search Tags: Search Engine, Full-text Retrieval, Lucene