
Design And Implementation Of Data Management System On HBase

Posted on: 2017-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hu
Full Text: PDF
GTID: 2348330503489877
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, applications generate more and more data. The distributed database HBase is widely used to manage such massive data, and many companies want to migrate data originally stored in relational databases to HBase and manage it there. A data management system based on HBase is therefore of practical significance.

After analyzing the design goals of such a system, this thesis presents its overall design. The system provides two functions: transferring the schema and data of a relational database into HBase, and managing the data in HBase through SQL statements.

In the schema and data transfer module, the column information, index information, and primary and foreign key information of the relational database are stored in an HBase metadata table. The table data migration task is divided into small tasks that are distributed evenly across the cluster. At the same time, redundant data is generated from the primary and foreign key information, and index tables are created in HBase from the index information.

In the SQL-based data management module, the main optimization target is the multi-table join query. Exploiting the features of HBase, a multi-table join query task is divided into several sub-tasks, which HBase coprocessors execute concurrently. The efficiency of each sub-task is improved by optimizing the table join order and by using the redundant data and index data produced during data transfer. Intermediate data produced during a sub-task is stored in hash tables and multi-way trees to reduce memory consumption. Finally, the results of all sub-tasks are merged on the client.

A series of experiments was conducted on the system. The results show that the system transfers table schemas and data efficiently, that the transferred data can be managed correctly, and that multi-table join queries perform better than in Hive.
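The divide, join concurrently, and merge pattern described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses plain Python lists instead of HBase tables, threads instead of coprocessors, and hash partitioning on the join key as a stand-in for whatever split the system actually uses. All function names (`split_by_key`, `hash_join`, `partitioned_join`) are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_key(rows, key, n_parts):
    """Assign each row to a partition by hashing its join key (assumed split rule)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def hash_join(left, right, key):
    """Join two row lists on `key` using an in-memory hash table."""
    table = defaultdict(list)
    for row in left:                      # build phase
        table[row[key]].append(row)
    joined = []
    for row in right:                     # probe phase
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def partitioned_join(left, right, key, n_parts=4):
    """Run one sub-join per partition concurrently, then merge the results."""
    left_parts = split_by_key(left, key, n_parts)
    right_parts = split_by_key(right, key, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        sub_results = pool.map(hash_join, left_parts, right_parts, [key] * n_parts)
        # Merge step: concatenate the sub-join outputs on the "client" side.
        merged = [row for sub in sub_results for row in sub]
    return merged
```

Because rows with the same key always land in the same partition, each sub-join is independent and the merge is a simple concatenation. For example, joining `[{"uid": 1, "name": "a"}]` with two orders for `uid` 1 yields two merged rows.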
Keywords/Search Tags: Schema and Data Migration, Multi-table Join Query, Concurrent, Redundant Data, Index