Research On Data Compression Technology Based On HBase

Posted on: 2017-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: C H Fu
Full Text: PDF
GTID: 2308330488997100
Subject: Software engineering
Abstract/Summary:
With the development of big data technology and the rapid adoption of the Hadoop platform, the volume of data generated in daily life is growing explosively, data types are becoming more complex, and storage methods are increasingly varied. Traditional storage approaches cannot reduce the cost of storing large volumes of data. At the same time, because data is accessed at different frequencies, data at different access levels should be stored in different ways. In view of this situation, this thesis studies HBase-based compression technology for mass data storage. The main innovations are as follows.

First, the thesis proposes a data classification method based on access frequency: the access frequency of each data file is obtained from the number of times the file is accessed over a period of time, and, by comparing that frequency with the relevant thresholds, files are divided into hot and cold data and assigned a specific access level. On this basis, the thesis proposes a compression strategy selection method based on the data access level. The method defines the data samples and determines the sampling method. To address a defect of the original strategy selection method, namely that its prior knowledge is not necessarily reliable, it adds an evaluation layer that adjusts the prior knowledge. Building on statistics-based reference-region and adjacent-column selection methods, it then designs an HBase compression strategy selection method that optimizes storage cost. Simulation experiments show that the proposed method not only stores large volumes of data effectively but also improves data access performance.

Second, from the perspective of data migration, the thesis proposes a data migration method based on file value. The value of a data file block is first calculated from factors such as data access frequency, and the migration target is determined from that value. The migration technique itself is also improved: a data buffer and a double-buffer queue are used to resolve the mismatch between data inflow and outflow rates, which improves migration efficiency and reduces memory and time consumption, ultimately optimizing data storage on the big data platform.

Finally, based on the above theory and methods, the thesis builds a prototype compressed-storage system and an e-commerce application demonstration. The system was developed through requirements analysis, general design, detailed design, and implementation, and it provides function modules such as compressed storage management and data migration. It verifies the feasibility of the proposed algorithms, and the results show how the HBase-based compression techniques perform in dynamic application scenarios.
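The access-frequency classification described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual algorithm: the abstract gives no formulas, so the `FileStats` type, the per-hour frequency definition, the `hot_threshold` value, and the `level_bounds` cut-offs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Access statistics for one data file (hypothetical structure)."""
    name: str
    access_count: int  # number of accesses observed during the window


def access_frequency(stats: FileStats, window_hours: float) -> float:
    """Accesses per hour over the observation window (assumed definition)."""
    return stats.access_count / window_hours


def classify(stats: FileStats, window_hours: float,
             hot_threshold: float = 10.0,
             level_bounds: tuple = (1.0, 10.0, 100.0)):
    """Return (temperature, access_level) for a file.

    Assumption: a file is "hot" when its frequency reaches hot_threshold,
    and its access level is the number of level_bounds it meets or exceeds.
    """
    freq = access_frequency(stats, window_hours)
    temperature = "hot" if freq >= hot_threshold else "cold"
    level = sum(freq >= bound for bound in level_bounds)
    return temperature, level
```

Under these assumptions, the resulting access level could then be used to choose a compression strategy per file, for example stronger compression for low-level cold files and lighter compression for frequently read hot files.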
Keywords/Search Tags:cold and hot data, data access level, HBase, data compression, data migration