Research On Data Processing Technology Based On HBase

Posted on:2020-01-03

Degree:Master

Type:Thesis

Country:China

Candidate:J C Sun

Full Text:PDF

GTID:2428330596468997

Subject:Public Security Technology

Abstract/Summary:

HBase is an important column-oriented database that is well suited to large-scale distributed storage, and data processing technology based on HBase has long been a research hotspot. Data compression and data retrieval are the key technologies of data processing. Data compression saves storage space, reduces data I/O, and improves data processing speed; as data is generated ever faster, higher requirements are placed on compression technology. At the same time, although HBase has strong storage advantages, its inherent design gives it poor support for data retrieval, which limits its application scenarios. To address these problems, this thesis studies HBase data compression and retrieval technology. The main work is as follows.

To solve the problems of high learning cost and low compression efficiency, a sort-based hybrid compression strategy combining column-based compression and sector-based compression was proposed. First, a method for sorting the data in each column was designed according to the characteristics of HBase, so that the data becomes more compact. Second, compression algorithms suitable for different kinds of data were selected through research, and the XGBoost algorithm, which generalizes well and supports parallel computing, was introduced as the classification algorithm of the compression strategy. Finally, depending on the characteristics of the data, either the hybrid column-based compression strategy or the hybrid sector-based compression strategy was applied to recommend a compression algorithm. Experiments on the TPC-DS benchmark data showed that the proposed strategy performed better in terms of compression ratio and compression/decompression time.

To make up for the weak full-text search performance of HBase, a joint full-text retrieval strategy based on HBase was proposed. The strategy covers three aspects: data storage, data indexing, and data retrieval. First, a data storage method was designed to import data quickly and classify it according to different retrieval requirements. Second, a data indexing method was designed to generate an inverted index through a text analyzer and store it in an index table. Finally, a data retrieval method was designed: a full-text search request is first answered by ElasticSearch, and the record IDs of the matching results are then passed to HBase to fetch the other attribute values stored under the corresponding row keys. The key issues affecting retrieval performance, such as HBase table structure design, text analyzer construction, and the volume of returned data, were discussed. The proposed strategy was verified experimentally in terms of time and space cost and query performance. The results showed that the joint retrieval strategy can greatly improve the efficiency of full-text queries at a small cost in time and space.
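
The two-stage lookup described in the abstract (search the inverted index first, then fetch full rows by row key) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: a plain dictionary stands in for the HBase table, an in-memory inverted index stands in for ElasticSearch, and the names `hbase_table`, `analyze`, `build_inverted_index`, and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for an HBase table: row key -> {column: value}.
hbase_table = {
    "row-001": {"info:author": "A", "info:body": "HBase stores data by column family"},
    "row-002": {"info:author": "B", "info:body": "Column compression reduces storage space and I/O"},
    "row-003": {"info:author": "C", "info:body": "Full-text search uses an inverted index"},
}


def analyze(text):
    """Toy text analyzer: lowercase the text and split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_inverted_index(table, field):
    """Map each term found in `field` to the set of row keys containing it."""
    index = defaultdict(set)
    for row_key, columns in table.items():
        for term in analyze(columns.get(field, "")):
            index[term].add(row_key)
    return index


def search(index, table, query, limit=10):
    """Stage 1: resolve the query to row keys using the index only.
    Stage 2: fetch the remaining attributes from the table by row key."""
    terms = analyze(query)
    if not terms:
        return []
    # AND semantics: a row must contain every query term.
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    # `limit` caps the number of rows fetched in stage 2.
    return [(row_key, table[row_key]) for row_key in sorted(matches)[:limit]]


index = build_inverted_index(hbase_table, "info:body")
for row_key, columns in search(index, hbase_table, "column compression"):
    print(row_key, columns["info:author"])
```

Only row keys travel between the two stages, so the index never needs to hold the non-searchable attributes. The `limit` parameter is an assumed way of capping the volume of returned data, which the abstract names as one of the factors affecting retrieval performance.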

Keywords/Search Tags:

Column-oriented storage, HBase, Data compression, Data retrieval

Related items

1	A Research Of Spatio-Temporal Object Query Processing Technology Oriented To Column Storage Model
2	Research And Implementation Of Data Compression Based On Column-Oriented Database System
3	Research And Implementation Of Large Collections Of RDF Data Storage And Retrieval Technology On HBase
4	Study On The Analysis And Optimization Of Column Storage Performance Based On Hive On Spark
5	Design And Implementation Of Data Dictionaries In Column Storage DWMS
6	Research On Data Compression Technology Based On HBase
7	Research Of Compression Algorithm For Sparse Data In Column-oriented Database
8	Research On Key Technologies Of Column-Oriented Database For Big Data
9	Research And Implementation Of Compression Technology In Column-Oriented Data Warehouse
10	Research And Optimization Of Multidimensional Data Warehouse Model Based On Column Storage