Research And Development Of Big Data Storage Systems Based On Hbase

Posted on: 2018-09-06; Degree: Master; Type: Thesis
Country: China; Candidate: J Wang; Full Text: PDF
GTID: 2348330533466284; Subject: Computer technology
Abstract/Summary:
With the arrival of the big data age, the amount of data stored in information system databases is exploding, and the performance demands on reading, writing, and querying that data keep rising. Traditional relational databases cannot satisfy the storage and query requirements of big data. To explore storage and query technology for massive data, we study and develop on HBase, a typical non-relational (NoSQL) database. HBase is an open-source implementation modeled on Google's Bigtable. It features high reliability, high performance, column-oriented storage, scalability, and consistency, and it supports two-level indexing. Using HBase, large-scale storage clusters can be built on inexpensive PC servers to form a big data storage system. First, we study the architecture of the big data storage system addressed in this thesis, the key technologies of NoSQL databases, and the key technologies of HBase. Second, we deploy HBase on the Spark big data platform to store the floating population database. Because HBase supports only queries based on the primary key (row key), we add a secondary indexing function to the floating population database, which greatly improves query speed. On this basis we analyze and evaluate the floating population database, and we test HBase performance with YCSB (Yahoo! Cloud Serving Benchmark), a benchmarking tool developed by Yahoo. The test object is an HBase table built from real data provided by an enterprise, with 30 million entries in total. Finally, we develop a prototype system for managing massive floating population data, based on the Spark big data platform and HBase. The system consists of data acquisition, data storage, floating population data management, statistical analysis, system management, and other modules. It stores up to 30 million records, with a total data volume of up to 12.6 GB, and it achieves efficient storage and fast querying of massive floating population data.
Keywords/Search Tags: NoSQL database, HBase, secondary index, Spark
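
The abstract does not say how the secondary index is built. As a rough illustration of the general idea only, the sketch below uses a common index-table pattern: a second sorted table whose keys are "indexed value + separator + row key", so that a lookup by a non-key column becomes a short prefix scan instead of a full table scan. It is an in-memory stand-in written in plain Python, not the thesis's design and not the HBase client API. The names `SortedTable`, `IndexedTable`, `find_by_value`, the `city` column, and the separator are all hypothetical.

```python
import bisect

SEP = "\x00"  # assumed separator; indexed values must not contain it


class SortedTable:
    """In-memory stand-in for a table whose row keys are kept in sorted order."""

    def __init__(self):
        self._keys = []  # row keys, always sorted
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        return self._rows.get(row_key)

    def delete(self, row_key):
        if row_key in self._rows:
            del self._rows[row_key]
            del self._keys[bisect.bisect_left(self._keys, row_key)]

    def scan_prefix(self, prefix):
        """Yield (key, row) for every row key starting with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


class IndexedTable:
    """Data table plus an index table on one column (hypothetical design)."""

    def __init__(self, indexed_column):
        self.data = SortedTable()
        self.index = SortedTable()
        self.indexed_column = indexed_column

    def put(self, row_key, columns):
        col = self.indexed_column
        old = self.data.get(row_key)
        # If the indexed value is being overwritten, drop the stale index entry.
        if old is not None and col in old and col in columns:
            self.index.delete(old[col] + SEP + row_key)
        self.data.put(row_key, columns)
        if col in columns:
            self.index.put(columns[col] + SEP + row_key, {"ref": row_key})

    def find_by_value(self, value):
        """Look up rows by indexed value using a prefix scan on the index."""
        hits = self.index.scan_prefix(value + SEP)
        return [self.data.get(entry["ref"]) for _, entry in hits]
```

For example, after `t = IndexedTable("city")` and a few calls such as `t.put("p001", {"name": "A", "city": "Guangzhou"})`, calling `t.find_by_value("Guangzhou")` touches only the index entries for that city. The trade-off is an extra write per row and the need to keep the two tables consistent, which a real distributed deployment would have to handle explicitly.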