
Optimization Design And Performance Analysis Of Big Data Storage Based On HBase

Posted on: 2020-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wen
GTID: 2428330575953251
Subject: Computer Science and Technology
Abstract/Summary:
HBase is an important tool for big data storage and processing. However, HBase's native language is Java, and access from third-party languages is restricted. Access to HBase from heterogeneous systems is therefore one of the main problems to be solved in big data storage. To address it, we study the Thrift storage mechanism. Furthermore, we analyze storage performance and improve the data storage structure, which improves storage efficiency. Storage performance is verified using on-orbit spacecraft data as the storage object. The main contents of this paper are as follows:

1. Storage mechanism analysis. HBase stores rows sorted lexicographically by rowkey, which causes storage hotspots in the system and affects storage performance and cluster load balancing. Because of HBase's flush mechanism, as the amount of data increases, Region write operations are blocked while data is forcibly flushed to storage, which reduces write efficiency. The Thrift interface definition language (IDL) writes data to HBase row by row. When the amount of data is large, data requests are issued frequently, which increases service call time and degrades system communication performance.

2. Storage optimization design. In view of the above problems, we optimize the design of big data storage. First, hashed rowkey storage is designed to achieve load balancing and avoid hotspots. Second, an appropriate flush configuration is set according to the effect of the flush threshold on write efficiency. Finally, a Thrift IDL communication model is designed that redefines the data transmission structure: multiple rows are bound together and stored as a block in a single RPC call. According to the new IDL model, the HBase Thrift server interface and the non-blocking client implementation are modified to improve HBase storage performance.

3. Storage performance analysis. HBase storage performance was tested and analyzed with on-orbit spacecraft data as the storage object. The theoretical analysis and experimental results show that the hashed rowkey design effectively solves the hotspot problem of the HBase database system. A larger flush threshold gives better write performance, which improves system storage performance. The optimized IDL model effectively reduces how often data operation requests are sent to the server, which improves HBase storage efficiency by a factor of 4 to 5.
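
The rowkey hashing and multi-row binding described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not name the hash function, the number of buckets, or the layout of the batched structure, so the MD5-based prefix, the `NUM_BUCKETS` value, and the helper names `salted_rowkey` and `batch_rows` are all assumptions made for illustration. No HBase or Thrift calls are made; the sketch only shows how keys and batches could be prepared on the client side.

```python
import hashlib
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumed number of hash buckets; the abstract does not give a value.
NUM_BUCKETS = 16


def salted_rowkey(key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix a rowkey with a hash-derived bucket id (hypothetical scheme).

    Keys that are adjacent in lexicographic order (for example, sequential
    timestamps) receive different prefixes, so they no longer sort next to
    each other and writes are spread over more key ranges.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = digest[0] % buckets  # first digest byte picks the bucket
    return f"{bucket:02d}|{key}"


def batch_rows(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group rows into lists of at most batch_size items.

    Each yielded list stands for the payload of one RPC call, so N rows
    need ceil(N / batch_size) calls instead of N.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch
```

In use, each record's key would be passed through `salted_rowkey` before writing, and the resulting rows would be passed through `batch_rows` so that one request carries many rows. A reader then has to compute the same prefix to look a row up, which is the usual trade-off of hashed rowkeys: point lookups stay cheap, but range scans over the original key order must fan out across all buckets.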
Keywords/Search Tags:HBase, Thrift, Remote Access, IDL, Big data