
Research And Application Of Full Text Retrieval Technology Based On Lucene

Posted on: 2018-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Dong
Full Text: PDF
GTID: 2348330533966293
Subject: Computer technology
Abstract/Summary:
Full-text search technology combines search engines with a number of Internet technologies and is one of the core technologies of contemporary search engines. It replaces the traditional retrieval method, in which items are matched and retrieved by their extracted features, and makes retrieval more user-friendly by letting queries be organized in natural language, which makes it easier to meet users' search requirements and return more accurate information. This paper focuses on the research and development of full-text retrieval technology and its application in a water conservancy technology standard information system. It studies the key technologies of full-text search, including Chinese and English word segmentation, text extraction, word segmentation optimization, concurrency control, weight optimization, the principles of index compression, and index security in a concurrent environment. Relational databases and existing full-text databases are analyzed and compared. The paper analyzes the basic principles, architecture, core components, workflow, indexing mechanism, and relevance ranking of Lucene, and compares common Chinese word segmentation algorithms with Lucene's extensible Chinese analyzers. A full-text search system based on the file system is proposed, which supports near-real-time search. On the basis of this theoretical and technical research, the paper carries out secondary development of the Shaanxi Province water conservancy technology standard information system, using the Lucene full-text retrieval toolkit and the text extraction tool Tika to develop a search engine system based on full-text search.
The main modules of the system are: information extraction (from PDF files, from the database, and from other data sources), word segmentation (including filtering tokenizers and Chinese and English tokenizers), indexing (including index management, index optimization, and index security), retrieval (including query analysis, weighting, and highlighting), and system management (including document management, user management, and permission management). The system builds a full-text index library of water conservancy technical standard information and implements index structure optimization and hit result processing. Users can easily retrieve all the water conservancy technical documents stored in the system. In addition, Lucene is combined with the NoSQL database HBase for big-data environments, so that the system keeps good scalability and search efficiency as the number of technical documents grows. The system gives water conservancy workers in our province a convenient and powerful search function for water resources technical documents and increases the practical value of the Shaanxi Province water conservancy technology standard information system. It also plays a positive role in using informatization to promote the modernization of water conservancy.
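The segmentation, indexing, and retrieval modules listed above can be illustrated with a small in-memory sketch. It is not the thesis's implementation and does not use Lucene: the class `TinyIndex`, the stop-word list, the field names `title` and `body`, and the field weights are all assumptions made for illustration. Chinese text is split into single characters instead of using a dictionary-based segmenter, and the score is a plain weighted term-frequency sum, not Lucene's relevance formula.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the thesis's actual filter list is not given.
STOP_WORDS = {"the", "of", "and", "a", "for"}

# An English/number run becomes one token; each CJK character becomes its own token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def tokenize(text):
    """Split text into lowercase tokens and drop stop words."""
    tokens = (m.lower() for m in TOKEN_RE.findall(text))
    return [t for t in tokens if t not in STOP_WORDS]


class TinyIndex:
    """In-memory inverted index: term -> doc_id -> field -> term frequency."""

    def __init__(self, field_weights):
        self.field_weights = field_weights  # e.g. {"title": 2.0, "body": 1.0}
        self.postings = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def add(self, doc_id, fields):
        """Index every field of one document."""
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[term][doc_id][field] += 1

    def search(self, query):
        """Return (doc_id, score) pairs, best score first, ties by doc_id."""
        scores = defaultdict(float)
        for term in tokenize(query):
            if term not in self.postings:
                continue
            for doc_id, per_field in self.postings[term].items():
                for field, tf in per_field.items():
                    scores[doc_id] += tf * self.field_weights.get(field, 1.0)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def highlight(text, query, pre="<b>", post="</b>"):
    """Wrap every token of text that matches a query term in pre/post markers."""
    terms = set(tokenize(query))

    def mark(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if word.lower() in terms else word

    return TOKEN_RE.sub(mark, text)


if __name__ == "__main__":
    index = TinyIndex({"title": 2.0, "body": 1.0})
    index.add("std-1", {"title": "Dam safety standard", "body": "Inspection of dam structures"})
    index.add("std-2", {"title": "Irrigation design", "body": "Canal and dam layout 水利"})
    print(index.search("dam"))
    print(highlight("Dam safety standard", "dam"))
```

Weighting the title field above the body is one simple way to express the "weighting" step; a production system would rely on the search library's own analyzers, scoring, and highlighter instead of this hand-rolled version.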
Keywords/Search Tags: full-text search, Lucene, water conservancy technology standard information system, information extraction, word segmentation, index