The Research And Implementation Of Full-Text Search Engine Based On Lucene

Posted on: 2017-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: R J He
Full Text: PDF
GTID: 2348330503468247
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid growth of digital information and the spread of the Internet, people can obtain vast amounts of information without leaving home. At the same time, economic development has made enterprises increasingly global, which intensifies the competition between them. In this new era of rapid technological development, large commercial search engine companies such as Google, Baidu and Yahoo! have implemented the function of searching internal company information. However, because of the particular requirements of enterprise search, popular commercial search engines are not suitable for searching sensitive enterprise data. How to use existing mainstream search engine technology to help enterprises easily build their own efficient search engines, so that users can quickly obtain the business information they need as a basis for enterprise decision-making, has therefore become a hot research topic.

Starting from the requirements of internal enterprise information retrieval, this thesis analyzes the necessity and feasibility of an enterprise search engine and adopts the Lucene architecture as its core. It takes full advantage of Lucene being open source, lightweight and efficient, and integrates text extraction and database technology to build a search engine that meets the needs of internal enterprise retrieval. Vertical search engines based on Lucene have become a hot research topic in enterprise information search and data mining. The main research work of this thesis is as follows:

1. The status and trends of enterprise search engines are analyzed. The basic components of a search engine and its working principle are introduced, and the technologies related to full-text search are summarized, including the Lucene framework, indexing technology and the search mechanism. The Lucene index is also compared with a database index.
2. To address the large amount of storage space occupied by Lucene index files, index file compression algorithms are studied further, and some research is done on improving the PFor Delta algorithm.
3. For common document formats, unstructured text files such as Word documents, PDF files and Excel files are processed and converted into the index format of the Lucene search engine, so that the system supports retrieval over many kinds of text information.
4. Every functional module of the retrieval system is analyzed and designed, and the Lucene-based enterprise search engine is implemented in the Java programming language.

Testing shows that the search engine can meet everyday unstructured data search requirements, which indicates that it is feasible and practical.
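The index compression work in point 2 builds on a simple idea: store the gaps between sorted document IDs rather than the IDs themselves, then pack each gap into a small fixed number of bits. The Java sketch below illustrates only that delta-plus-bit-packing idea. It is not the thesis's improved algorithm, it is not Lucene's own codec, and it omits the exception patching that distinguishes PFor Delta. The class name `DeltaPackSketch` and all method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration: delta encoding plus fixed-width bit packing
// for a sorted posting list of non-negative document IDs.
final class DeltaPackSketch {

    /** Replace each sorted doc ID with its gap from the previous ID. */
    public static int[] toGaps(int[] sortedDocIds) {
        int[] gaps = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            gaps[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        return gaps;
    }

    /** Rebuild the doc IDs by summing the gaps. */
    public static int[] fromGaps(int[] gaps) {
        int[] ids = new int[gaps.length];
        int running = 0;
        for (int i = 0; i < gaps.length; i++) {
            running += gaps[i];
            ids[i] = running;
        }
        return ids;
    }

    /** Smallest bit width (at least 1) that can hold every value. */
    public static int bitsNeeded(int[] values) {
        int max = 0;
        for (int v : values) {
            max = Math.max(max, v);
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
    }

    /** Pack non-negative values into 64-bit words, `bits` bits per value. */
    public static long[] pack(int[] values, int bits) {
        long[] out = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bits;
            int word = bitPos >>> 6;   // index of the 64-bit word
            int offset = bitPos & 63;  // bit offset inside that word
            out[word] |= ((long) values[i]) << offset;
            if (offset + bits > 64) {  // value straddles two words
                out[word + 1] |= ((long) values[i]) >>> (64 - offset);
            }
        }
        return out;
    }

    /** Inverse of pack: read `count` values of `bits` bits each. */
    public static int[] unpack(long[] packed, int count, int bits) {
        int[] values = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * bits;
            int word = bitPos >>> 6;
            int offset = bitPos & 63;
            long v = packed[word] >>> offset;
            if (offset + bits > 64) {
                v |= packed[word + 1] << (64 - offset);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 15, 1000};
        int[] gaps = toGaps(docIds);
        int bits = bitsNeeded(gaps);
        long[] packed = pack(gaps, bits);
        int[] restored = fromGaps(unpack(packed, gaps.length, bits));
        System.out.println("bits per gap: " + bits);
        System.out.println("restored: " + Arrays.toString(restored));
    }
}
```

In this example the single large gap before document 1000 forces every gap to use the same wide bit width. PFor Delta avoids that cost by choosing a width that fits most values and storing the few oversized ones separately as exceptions.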
Keywords/Search Tags:Search Engine, Lucene, Chinese Word Segmentation, Index, Relevancy