Design And Implementation Of Data Management System On HBase

Posted on:2017-06-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y Hu

Full Text:PDF

GTID:2348330503489877

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, applications generate more and more data, and the distributed database HBase has been widely used to manage such massive data. Many companies want to migrate data originally stored in relational databases to HBase and manage it there. A data management system based on HBase is therefore of practical significance.

After analyzing the design goals of a data management system based on HBase, this thesis gives the overall design of the system. The system provides two functions: transferring the schema and data of a relational database into HBase, and managing data in HBase through SQL statements. In the schema and data transfer module, the column information, index information, and primary and foreign key information of the relational database are stored in an HBase metadata table. The table data migration task is divided into small tasks that are distributed evenly across the cluster. At the same time, redundant data is generated from the primary and foreign key information, and index tables are created in HBase from the index information. In the SQL-based data management module, the main optimization target is the multi-table join query. Based on the features of HBase, a multi-table join query task is divided into several sub-tasks, which HBase coprocessors execute concurrently. The efficiency of each sub-task is improved by optimizing the table join order and by using the redundant data and index data produced during data transfer. The intermediate data of each sub-task is stored in hash tables and multiway trees to reduce memory consumption. Finally, the results of all sub-tasks are merged on the client.

A series of experiments was conducted on the system. The results show that the system can efficiently transfer table schemas and data, that the transferred data can be managed correctly, and that multi-table join queries perform better than in Hive.
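The join strategy described in the abstract (split a join into sub-tasks, run them concurrently, keep intermediate data in hash tables, merge on the client) can be illustrated with a minimal sketch. This is not the thesis implementation and does not use the HBase API: plain Python lists of dictionaries stand in for tables, a thread pool stands in for the coprocessors, and the names `split_by_key`, `hash_join`, and `parallel_join` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_key(rows, key, parts):
    """Partition rows into `parts` buckets by hashing the join key."""
    buckets = [[] for _ in range(parts)]
    for row in rows:
        buckets[hash(row[key]) % parts].append(row)
    return buckets


def hash_join(left, right, key):
    """Join two row lists on `key`, building the hash table on the smaller side."""
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    # Probe the hash table and merge the columns of each matching pair.
    return [{**match, **row} for row in probe for match in table.get(row[key], [])]


def parallel_join(left, right, key, parts=4):
    """Run one sub-join per partition concurrently, then merge the results."""
    left_parts = split_by_key(left, key, parts)
    right_parts = split_by_key(right, key, parts)
    # Threads stand in for the concurrent sub-tasks (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=parts) as pool:
        sub_results = pool.map(hash_join, left_parts, right_parts, [key] * parts)
        # The "client side" merge: concatenate the sub-task results.
        return [row for sub in sub_results for row in sub]
```

Because both inputs are partitioned with the same hash function, rows with equal join keys always land in the same sub-task, so the sub-joins can run independently and their results only need to be concatenated. The order of the merged rows is not specified.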

Keywords/Search Tags:

Schema and Data Migration, Multi-table Join Query, Concurrency, Redundant Data, Index

Related items

1	Oriented Data Warehouse, Multi-table Joins And Aggregation Algorithm Research
2	Multi-Join Query Algorithm Research Over Data Streams
3	Research On Query Optimization Of Distributed Database Middleware Mycat
4	Study On Multi-Tenant Data Storage And Data Migration On Basic-Table Combined With Extension-Table Schema
5	Research On Multi-Tenant Data Storage Mechanism Based On Universal Table In SaaS
6	Research Of Query Optimization Based On Join Index
7	Research On Key Techniques Of High Performance Spatial Query Processing For Large Scale Spatial Data
8	Lav In Data Integration System Query Processing
9	On Optimization Of Join Query Algorithms For Massive Data
10	Hadoop-based Geospatial Data Storage And Query Technology