Nested Data Storage System Design And Implementation Based On HBase

Posted on: 2016-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: H T Ma
Full Text: PDF
GTID: 2308330470463065
Subject: Computer applications
Abstract/Summary:
With the advent of the Internet era, the amount of data is growing rapidly, and storing and using big data effectively has become particularly important. To meet the demands of varied data formats and scalability, industry has proposed NoSQL databases as a solution. NoSQL satisfies the growing demand for storage capacity with a distributed storage framework and a "schemaless" format, which makes it convenient for users to change their business logic. However, these design choices weaken the associations between data items during storage. Moreover, when a NoSQL database is used for big data analysis, reading data becomes slower and querying data becomes more complex.

To address these deficiencies of NoSQL in data analysis, this thesis presents a nested data storage system based on HBase. It adopts the nested data storage format introduced by "Dremel" to solve the problems encountered when storing and analyzing big data. The main work of this thesis is as follows:

1) The nested data storage system based on HBase uses the original HBase distributed storage architecture and inherits the scalability and high availability of HBase. It uses HMaster to manage and operate the data storage system, and HRegionServer to manage the data storage on each child node.

2) The nested data storage system based on HBase performs format conversion on the column-based storage structure. It adds a data conversion module to the HRegion class to convert the original HBase column-based storage structure into a nested data storage structure. The system uses Parquet, an implementation of the Dremel format, to persist big data.

3) The nested data storage system based on HBase implements the store and read modules. It implements the reading and writing functions and strengthens the search function in the read module.

4) This thesis verifies the performance of the nested data storage system in data analysis. Using the MapReduce computing framework to analyze big data, the results show that the HBase nested storage system outperforms the original HBase storage system by about one-third when querying by column. As the number of columns in the stored table increases, the time consumed by the nested storage system based on HBase grows more slowly.

The nested data storage system based on HBase meets the performance requirements for reading and writing big data. It reduces the overhead of reading unnecessary data during big data analysis, lowers disk and CPU cost, and accelerates big data analysis.
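The benefit of querying by column can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use HBase or Parquet; it only shows the general idea of regrouping row-oriented cells into per-column value lists so that a query touching one column does not read the others. The function names `cells_to_columns` and `scan_column`, the sample cells, and the use of `None` for missing values are assumptions made for illustration (Dremel and Parquet encode missing and repeated values with definition and repetition levels instead).

```python
from collections import defaultdict


def cells_to_columns(cells):
    """Group (row_key, column, value) cells into one value list per column.

    Rows that lack a column get None, so every list lines up by row index.
    This is a simplification of a real columnar format.
    """
    row_keys = sorted({row for row, _, _ in cells})
    index = {row: i for i, row in enumerate(row_keys)}
    columns = defaultdict(lambda: [None] * len(row_keys))
    for row, column, value in cells:
        columns[column][index[row]] = value
    return row_keys, dict(columns)


def scan_column(columns, name):
    """Return the values of a single column without touching the others."""
    return columns.get(name, [])


# Hypothetical cells in "family:qualifier" form, as an HBase table might hold.
cells = [
    ("row1", "info:name", "alice"),
    ("row1", "info:age", 30),
    ("row2", "info:name", "bob"),
    ("row3", "info:age", 25),
]

row_keys, columns = cells_to_columns(cells)
ages = scan_column(columns, "info:age")
```

In this sketch an analysis job that needs only `info:age` reads one list, whereas a row-oriented scan would have to visit every cell of every row.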
Keywords/Search Tags: Big Data Storage, HBase, Nested Data Storage System, Dremel