
Study And Implementation On Full-text Search Engine Based On LUCENE Under The Large Amount Of Data

Posted on: 2016-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: J G Gao
Full Text: PDF
GTID: 2308330479484743
Subject: Computer technology
Abstract/Summary:
In the era of big data, search engines have become an important tool for obtaining and managing information. Lucene is currently one of the most popular open-source search toolkits and has been applied in many fields. As the amount of data grows, however, Lucene runs into a number of problems. Indexing large amounts of data produces large index files, and a large index file exposes problems in index creation, index merging, and search, and also degrades real-time search. Lucene is only a search engine toolkit, so developers must do additional work in practical applications, especially when the data volume is large. This thesis addresses these problems and proposes an effective solution. Through continued study and practice, we conclude that the main factor affecting indexing and search performance on large amounts of data is the coexistence of large index files and index fragments, which makes incremental indexing, index merging, and search loading too costly. In addition, indexing or merging large amounts of data runs into disk I/O bottlenecks and excessive memory consumption. We address these problems in three ways. First, for the single-index case, we use caching, an in-memory index directory, and reuse of the IndexWriter and IndexSearcher objects to reduce the frequency of disk I/O and speed up search loading. Second, we propose a storage strategy that combines local optimization of the main index with multi-file storage, to reduce the high cost of incremental indexing and index merging caused by the coexistence of large index files and index fragments. Third, incremental index updates are handled separately, and the sub-index files are optimized internally, which reduces how often the main index must be merged and optimized and supports real-time search. Considering the real-time requirements of practical applications and the shortcomings of traditional real-time search solutions under large data volumes, we propose a new real-time search solution. Finally, we implement a full-text information retrieval system using the index and search optimization solution and the real-time search solution.
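The idea of handling incremental updates separately from the main index can be illustrated with a minimal sketch. This is not the thesis implementation and does not use the Lucene API: it is a toy in-memory inverted index, and every name in it (`ToyIndex`, `TieredIndex`, `merge_threshold`, `flush`) is hypothetical. It assumes that new documents go into a small sub-index that is searchable immediately, that queries consult both the main index and the sub-index, and that the expensive merge into the main index happens only when the sub-index reaches a threshold.

```python
from collections import defaultdict


class ToyIndex:
    """Toy in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.doc_count = 0

    def add(self, doc_id, text):
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.doc_count += 1

    def search(self, term):
        # .get() avoids creating empty postings for unknown terms.
        return set(self.postings.get(term.lower(), ()))

    def merge_from(self, other):
        # Union the other index's postings into this one.
        for term, ids in other.postings.items():
            self.postings[term] |= ids
        self.doc_count += other.doc_count


class TieredIndex:
    """Hypothetical main index plus a small sub-index for new documents."""

    def __init__(self, merge_threshold=1000):
        self.main = ToyIndex()    # large, rarely rewritten
        self.delta = ToyIndex()   # small, absorbs incremental updates
        self.merge_threshold = merge_threshold
        self.merge_count = 0      # how often the main index was touched

    def add(self, doc_id, text):
        # New documents only touch the small sub-index.
        self.delta.add(doc_id, text)
        if self.delta.doc_count >= self.merge_threshold:
            self.flush()

    def flush(self):
        # The costly merge runs once per batch, not once per document.
        self.main.merge_from(self.delta)
        self.delta = ToyIndex()
        self.merge_count += 1

    def search(self, term):
        # Search both tiers so unmerged documents are visible at once.
        return self.main.search(term) | self.delta.search(term)
```

With `merge_threshold=3`, for example, the first two calls to `add` leave the main index untouched while both documents are already returned by `search`; only the third call triggers a merge. A real system built on Lucene would also have to deal with on-disk segments, deletions, and reader reopening, which this sketch leaves out.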
Keywords/Search Tags: large amount of data, Lucene, index and search optimization, real-time search, full text retrieval system