Study And Implementation On Full-text Search Engine Based On LUCENE Under The Large Amount Of Data

Posted on:2016-10-12

Degree:Master

Type:Thesis

Country:China

Candidate:J G Gao

Full Text:PDF

GTID:2308330479484743

Subject:Computer technology

Abstract/Summary:

In the era of big data, search engines have become an important tool for obtaining and managing information. Lucene is currently one of the most popular open-source search toolkits and has been applied in many fields. As data volume grows, however, Lucene runs into a number of problems: indexing large amounts of data produces large index files, which cause difficulties during index creation, index merging, and search, and also degrade real-time search. Because Lucene is only a search engine toolkit, developers must do considerable additional work in practical applications, especially when the data volume is large. This thesis addresses these problems and proposes an effective solution. Through study and practice, we found that the main factor affecting indexing and search performance on large amounts of data is the coexistence of large index files and index fragments, which makes incremental indexing, index merging, and search loading too costly. Indexing or merging large amounts of data also runs into disk I/O bottlenecks and excessive memory consumption.
We address indexing and search over large amounts of data from three aspects. First, for the single-index case, we use caching, an in-memory index directory, and reuse of the IndexWriter and IndexSearcher objects to reduce the frequency of disk I/O and speed up search loading. Second, we propose a storage strategy that combines local optimization of the main index with multi-file storage, to reduce the high cost of incremental indexing and index merging when large index files and index fragments coexist. Third, incremental index updates are handled separately and stored in internally optimized sub-index files, which reduces how often the main index must be merged and optimized and supports real-time search. Considering the real-time requirements of practical applications and the shortcomings of traditional real-time search solutions on large amounts of data, we also propose a new real-time search solution. Finally, we use the index and search optimization solution and the real-time search solution to implement a full-text information retrieval system.
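
The idea of handling incremental updates separately from the main index can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use Lucene: `TinyIndex`, `MainPlusDeltaIndex`, and `merge_threshold` are hypothetical names introduced only for illustration, and the index is a toy in-memory inverted index. It assumes new documents go into a small sub-index that is searchable immediately and is folded into the main index only once it reaches a size threshold, so the large main index is merged less often.

```python
from collections import defaultdict


class TinyIndex:
    """A toy in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.doc_count = 0

    def add(self, doc_id, text):
        # Whitespace tokenization only; a real analyzer would do much more.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.doc_count += 1

    def search(self, term):
        # .get() avoids creating empty entries for unknown terms.
        return set(self.postings.get(term.lower(), ()))

    def merge_from(self, other):
        # Fold another index's postings into this one.
        for term, ids in other.postings.items():
            self.postings[term] |= ids
        self.doc_count += other.doc_count


class MainPlusDeltaIndex:
    """Hypothetical main index plus a small sub-index for incremental updates."""

    def __init__(self, merge_threshold=1000):
        self.main = TinyIndex()    # large, rarely merged
        self.delta = TinyIndex()   # small, receives all new documents
        self.merge_threshold = merge_threshold
        self.merges = 0            # counts how often the main index was touched

    def add(self, doc_id, text):
        # New documents never touch the main index directly.
        self.delta.add(doc_id, text)
        if self.delta.doc_count >= self.merge_threshold:
            self.flush()

    def flush(self):
        # Merge the sub-index into the main index and start a fresh sub-index.
        self.main.merge_from(self.delta)
        self.delta = TinyIndex()
        self.merges += 1

    def search(self, term):
        # Query both parts so freshly added documents are visible at once.
        return self.main.search(term) | self.delta.search(term)
```

With `merge_threshold=2`, for example, adding three documents triggers a single merge into the main index, while a search issued right after each `add` already sees the new document through the sub-index. The threshold trades merge frequency against the size of the sub-index that every query must also scan.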

Keywords/Search Tags:

large amount of data, Lucene, index and search optimization, real-time search, full text retrieval system

Related items

1	Based On The Distributed Real-time Solr Full-text Retrieval System Design And Implementation
2	Design And Improvement Of Website Full-text Retrieval System Based On Lucene
3	Research And Application Of Full-Text Retrieval System Based On Lucene
4	Based On Research And Optimization Lucene Inverted Index Performance
5	Research And Application Of Full-text Search Based On Lucene
6	Research And Implementation Of Enterprise Information Fulltext Search System Based On Lucene
7	The Research And Application Of Real-Time Search Engine For Large-Scale Enterprise System
8	Research And Application Of Full Text Retrieval Technology Based On Lucene
9	The Design And Implementation Of Full Text Search Engine Based On Lucene
10	Research And Application Of Full-text Retrieval Technology Based On Lucene