Design And Implementation Of Mass Data Storage Solution Based On HBase

Posted on: 2016-03-18 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Ma | Full Text: PDF
GTID: 2308330461984157 | Subject: Communication and Information System
Abstract/Summary:
With the rapid development of the Internet and information network technology, the era of big data is arriving. As large amounts of data are created every day, the personal data produced on the Internet is expanding sharply. The shift from traditional text data to documents, audio, video and images, that is, from structured to unstructured data, places new demands on the storage and management of personal and Internet data. Traditional relational databases support only the storage and management of structured data, so massive unstructured data is difficult to handle in them. Emerging non-relational database technology therefore offers an opportunity for storing massive unstructured data.

This thesis presents a mass data storage solution designed around the needs of thin client users. A distributed storage solution based on HBase is designed and implemented, which solves the problem of unified storage for various types of mass data.

First, the network disk application is improved to meet the demands of mass data storage, and data migration from traditional databases to HBase is realized. HBase is used to assign each thin client user a private storage space in the cloud, to and from which the user can upload and download various files. Meanwhile, data is transferred to the cloud through the network disk for unified storage and management.

Second, distributed storage of user data is implemented with an HBase cluster, whose column-oriented storage and scalability make it possible to build efficient storage clusters on low-end hardware. User data is stored efficiently by mounting the network disk storage on the multimedia thin client to HBase.

Third, several improvements address drawbacks of storing massive data on an HBase cluster: the data insertion and reading mechanisms are optimized, and user data of different sizes is stored separately. When mass data is stored, a region quickly reaches its split threshold, which leads to frequent region splitting and merging; this blocks users' write operations and degrades insertion performance. Mass data is managed efficiently by storing it separately in separate columns, while the flush and compaction mechanisms of data storage are also improved. For log backup in HBase, a remote log process is introduced, which ensures the availability and durability of stored data and improves the time performance of the system.

Finally, an experiment is designed and carried out for the proposed large-scale data storage solution. It shows that the solution is feasible and that the system's write and read times improve remarkably for both small and large amounts of data.
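The idea of storing user data of different sizes separately can be illustrated with a minimal sketch. It is not the thesis's implementation: the 10 MB threshold, the family names `small` and `large`, the row key layout and the `PutRequest` type are all hypothetical, chosen only to show how a write could be routed by size before it reaches HBase.

```python
from dataclasses import dataclass

# Hypothetical values: the abstract names neither a threshold nor column names.
SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024
SMALL_FAMILY = "small"
LARGE_FAMILY = "large"


@dataclass(frozen=True)
class PutRequest:
    """Plain description of one write; no HBase client is involved here."""
    row_key: str
    column: str  # "family:qualifier"
    value: bytes


def choose_family(size_bytes: int, threshold: int = SIZE_THRESHOLD_BYTES) -> str:
    """Pick the column family for a file of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return SMALL_FAMILY if size_bytes < threshold else LARGE_FAMILY


def build_put(user_id: str, file_name: str, content: bytes,
              threshold: int = SIZE_THRESHOLD_BYTES) -> PutRequest:
    """Route a user's file to the small or large family based on its size."""
    family = choose_family(len(content), threshold)
    return PutRequest(
        row_key=f"{user_id}/{file_name}",  # assumed key layout
        column=f"{family}:content",
        value=content,
    )
```

Keeping small and large values apart in this way means that flushes and compactions triggered by large files need not rewrite the small ones, which is one plausible reading of why the separation helps.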
Keywords/Search Tags: NoSQL, HBase, data storage