Research And Design Of Distributed Caching Policy On HBase

Posted on: 2018-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Yu
GTID: 2348330512480185
Subject: Communication and Information System
Abstract/Summary:
The growth of the Internet has brought Big Data unprecedented attention. As the infrastructure for Big Data research and applications, Big Data storage systems are vital, and HBase is one of the typical NoSQL databases. However, HBase currently has flaws such as unbalanced partitioning and a single cache replacement policy, both of which restrict the read and write performance of a cluster. This paper studies these two aspects in order to optimize HBase read and write performance. The work of the dissertation is partly supported by the National Natural Science Foundation of China (No. 61172072, 61271308), the Beijing Natural Science Foundation (No. 4112045), and the Research Fund for the Doctoral Program of Higher Education of China (No. 20100009110002). The main contents of this paper are as follows.

(1) Write cache: without partitioning, the existing HBase can hardly exploit the advantages of a distributed system. Even with pre-splitting, there is no universal pre-split method that applies to all data tables, and no strategy that adaptively adjusts the system load. To solve these problems, this paper designs a two-stage partition method. At the pre-split stage, the RowKey is redesigned using the hashing effect of MD5. At the adaptive partition stage, a performance evaluation method for RegionServers is designed. It combines the Analytic Hierarchy Process (AHP) with TOPSIS, and it employs and enhances the consistent hashing algorithm, which is implemented with a new data structure designed for this performance evaluation.

(2) Read cache: the LRU cache replacement policy of the existing BlockCache is inadequate. Although the cache is divided into multiple layers, all layers adopt the same policy, which evicts data according to how recently it was last accessed. This paper improves the cache replacement strategy of each layer: hot data is taken into account on the Single layer, and the size of the Block is taken into account on the Multi layer. In addition, the threshold parameters that govern promotion from the Single layer to the Multi layer are redefined to reduce the probability of Full GC. Furthermore, a second-level cache is designed according to the idea of Community Discovery, to address slow lookups of closely related data such as continuous data.

(3) Experiments: continuous data, random data and centralized data are prepared to simulate different experimental scenarios, and the HBase system proposed in this paper is deployed on homogeneous and heterogeneous clusters to test read and write performance. The experimental results are compared with those of the original HBase and analyzed. The experiments show that the presented scheme improves on the read and write performance of the original HBase. With good applicability and stability, the new HBase is suitable for most types of data tables.
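The two partitioning ideas in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the key format (`<md5-prefix>_<row key>`), the names `salted_row_key` and `HashRing`, and the integer weights standing in for RegionServer performance scores are all assumptions made for illustration. The AHP/TOPSIS scoring and the new data structure mentioned in the abstract are not reproduced here.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def salted_row_key(row_key: str, prefix_len: int = 4) -> str:
    """Prefix a row key with the first hex digits of its MD5 digest.

    Hypothetical key format: sequential keys get scattered prefixes,
    so they spread across pre-split regions instead of one hot region.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{row_key}"


class HashRing:
    """Plain consistent-hash ring with weighted virtual nodes.

    `weights` maps a server name to a virtual-node count; here the
    weight is a stand-in (assumption) for a performance score.
    """

    def __init__(self, weights: Dict[str, int]) -> None:
        points: List[Tuple[int, str]] = []
        for server, weight in weights.items():
            for i in range(weight):
                points.append((self._hash(f"{server}#{i}"), server))
        points.sort()
        self._points = points
        self._hashes = [h for h, _ in points]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring point after the key."""
        idx = bisect.bisect_right(self._hashes, self._hash(key))
        return self._points[idx % len(self._points)][1]


if __name__ == "__main__":
    ring = HashRing({"rs1": 100, "rs2": 200, "rs3": 100})
    for raw in ("row-0001", "row-0002", "row-0003"):
        key = salted_row_key(raw)
        print(key, "->", ring.lookup(key))
```

A useful property of this design is that removing a server only moves the keys that server owned; keys assigned to the remaining servers keep their placement, which limits data movement when the load is rebalanced.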
Keywords/Search Tags:HBase, Partition, Reading and Writing Performance, Consistent Hashing Algorithm, Cache Replacement Policy