Research On The Full-text Indexing And Retrieval Technology Based On HBase

Posted on: 2016-08-05 | Degree: Master | Type: Thesis
Country: China | Candidate: G Q Wu
GTID: 2348330479954707 | Subject: Computer technology
Abstract/Summary:
With the advent of the big data era, full-text search over big data has become an active research topic. HBase is a NoSQL database that can store massive volumes of unstructured data and supports real-time reads and writes. However, it supports only row-key (primary key) queries and offers no full-text retrieval. This paper focuses on using HBase for full-text indexing, including incremental index creation and storage, while keeping retrieval efficient.

The paper first surveys current information retrieval technology and the state of retrieval in HBase, and then proposes an HBase-based full-text retrieval scheme. The scheme uses the MapReduce distributed model to build a full-text index and stores that index in HBase. It then introduces a retrieval scheme built on the HBase index table. Finally, it uses a Coprocessor to keep the index up to date as data is incrementally updated.

To improve full-text retrieval performance, the paper further studies three areas of optimization: Hadoop jobs, index storage, and the retrieval strategy. Analyzing the performance bottlenecks of Hadoop tasks improved task execution efficiency. Compressing the stored index effectively reduced index file size. Filtering out low-weight documents improved retrieval efficiency.

Experiments show that the scheme can effectively create and store the index, retrieve data quickly, and support incremental data updates. Future work will aim to improve retrieval accuracy.
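The indexing and retrieval pipeline described above can be illustrated with a small in-memory sketch. This is not the thesis's implementation, and it uses no Hadoop or HBase API:

- A Python dict of dicts stands in for the HBase index table. The row key is the term, the column qualifier is the document ID, and the cell value is the term frequency.
- Plain functions stand in for the MapReduce map and reduce phases.
- A hypothetical `DocTable.put` hook loosely imitates what an HBase RegionObserver coprocessor's post-put hook might do to keep the index current after a write.

Several details are assumptions, because the abstract does not say how terms are split or how document weights are computed. These are the tokenizer, the smoothed TF-IDF score, and the `min_weight` cutoff used to drop low-weight documents.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict

# In-memory stand-in (not HBase) for the index table:
# row key (term) -> {column qualifier (doc ID): cell value (term frequency)}.
IndexTable = dict[str, dict[str, int]]


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id: str, text: str):
    # Map: emit (term, (doc_id, tf)) once for each distinct term in a document.
    for term, tf in Counter(tokenize(text)).items():
        yield term, (doc_id, tf)


def reduce_phase(term: str, postings: list[tuple[str, int]], index: IndexTable) -> None:
    # Reduce: write every posting for one term into that term's row.
    row = index.setdefault(term, {})
    for doc_id, tf in postings:
        row[doc_id] = tf


def build_index(docs: dict[str, str]) -> IndexTable:
    # Simulated shuffle: group all mapper output by term, then reduce each group.
    grouped: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for doc_id, text in docs.items():
        for term, posting in map_phase(doc_id, text):
            grouped[term].append(posting)
    index: IndexTable = {}
    for term, postings in grouped.items():
        reduce_phase(term, postings, index)
    return index


class DocTable:
    """Hypothetical data table whose put() runs a post-put hook that keeps the
    index in sync, loosely imitating a RegionObserver coprocessor."""

    def __init__(self, docs: dict[str, str]):
        self.rows = dict(docs)
        self.index = build_index(self.rows)

    def put(self, doc_id: str, text: str) -> None:
        old = self.rows.get(doc_id)
        self.rows[doc_id] = text
        self._post_put(doc_id, old, text)

    def _post_put(self, doc_id: str, old: str | None, new: str) -> None:
        # Remove postings of the previous version; delete rows that become empty.
        if old is not None:
            for term, _ in map_phase(doc_id, old):
                row = self.index.get(term, {})
                row.pop(doc_id, None)
                if not row:
                    self.index.pop(term, None)
        # Add postings for the new version.
        for term, (_, tf) in map_phase(doc_id, new):
            self.index.setdefault(term, {})[doc_id] = tf


def search(index: IndexTable, query: str, n_docs: int, min_weight: float = 0.0):
    # Score documents by summed TF-IDF (assumed smoothed IDF) over query terms.
    scores: dict[str, float] = defaultdict(float)
    for term in set(tokenize(query)):
        row = index.get(term)
        if not row:
            continue
        idf = math.log(n_docs / len(row)) + 1.0
        for doc_id, tf in row.items():
            scores[doc_id] += tf * idf
    # Filter out low-weight documents, then rank by score (ties broken by doc ID).
    hits = [(doc_id, s) for doc_id, s in scores.items() if s >= min_weight]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Keeping one row per term means a query touches only the rows for its own terms, which maps onto HBase's fast row-key lookups. The trade-off is that a very common term produces a very wide row.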
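The abstract reports that compressing the stored index reduced its size, but it does not name the method. One common technique for inverted indexes is sketched below as an assumption, not as the author's codec. Each posting list is assumed to hold non-negative integer document IDs. The list is sorted, only the gaps between consecutive IDs are stored, and each gap is written as a variable-length integer. Each byte carries 7 data bits, with the high bit set when more bytes follow. The thesis may instead have relied on HBase's built-in column-family compression (for example GZ or Snappy).

```python
def encode_postings(doc_ids: list[int]) -> bytes:
    """Sort non-negative doc IDs, delta-encode them, and write each gap as a varint."""
    out = bytearray()
    prev = 0
    for doc_id in sorted(doc_ids):
        gap = doc_id - prev  # gaps stay small when IDs are dense
        prev = doc_id
        # Emit 7 bits at a time, low bits first; the high bit means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    """Reverse encode_postings: read varint gaps and accumulate them into doc IDs."""
    doc_ids: list[int] = []
    value = shift = prev = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            doc_ids.append(prev)
            value = shift = 0
    return doc_ids
```

For example, `encode_postings([3, 7, 130, 100000])` produces 6 bytes. The gaps 3, 4, 123 and 99870 take 1, 1, 1 and 3 bytes respectively, compared with 32 bytes for four fixed 64-bit integers.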
Keywords: full-text retrieval, performance optimization, MapReduce, HBase