The Design And Implementation Of High Efficient Data Access Platform Based On HBase

Posted on: 2019-01-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang  Full Text: PDF
GTID: 2348330542998147  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Web 2.0 and the spread of the mobile Internet, the volume of data on the Internet has grown explosively. As data volumes grow, storing massive data has become the first problem to solve in many scenarios. Compared with traditional relational databases, distributed databases are better suited to massive data storage because they scale out well. HBase is a distributed, open-source, column-oriented database from the Apache Software Foundation, suited to massive data storage and high-performance reads and writes.

However, HBase has shortcomings. Its main weakness is the lack of secondary indexes: a query on non-rowkey columns must use a Filter with a full table scan, which performs poorly on large data sets. Although there has been some research on secondary indexes for HBase, the existing solutions are functionally incomplete, cannot handle special column values, or are incompatible with the newest version of HBase. In addition, a table created in HBase's default mode has only one HRegion, which can concentrate reads and writes on a single datanode and cause hotspotting under highly concurrent requests. These disadvantages make it necessary to study and implement both a secondary index function and a strategy for preventing hotspotting.

This thesis proposes methods for optimizing the performance of HBase, based on a study of HBase's storage structure and system architecture and of secondary indexes in relational databases. The main work of the thesis is as follows. Firstly, it proposes a grouping column index structure for HBase, based on the storage structure of HBase, the secondary index data structure of relational databases, and the leftmost prefix match principle of secondary indexes. The structure implements secondary indexing by grouping index column values into the rowkey of a dedicated index table, and it supports both composite indexes and special cases such as an empty index column value. Secondly, it presents an algorithm that calculates the best index hit for queries with multiple conditions over composite indexes, which accelerates index pruning and reduces network I/O cost. On top of queries, the system also provides insertion, conditional deletion, and conditional update operations. Thirdly, it proposes a pre-split strategy based on rowkey type and on the storage structure of HBase. The strategy splits a table evenly when it is created and distributes the HRegions across different datanodes of the HBase cluster to avoid hotspotting. Finally, the thesis compares the performance of the system with that of native HBase. The results show that the new system performs better on query, conditional delete, and conditional update operations thanks to the secondary index function, and that it avoids hotspotting through the pre-split strategy. Together, the two mechanisms improve the read and write performance of HBase.
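
The three mechanisms summarized above can be illustrated with a small sketch. It is not the thesis's implementation, which the abstract does not specify. The separator byte, the empty-value marker, the "longest matched leftmost prefix of equality conditions" scoring rule, the fixed-width hexadecimal rowkey assumption, and every function and index name below are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Optional, Sequence, Set

SEP = "\x00"    # hypothetical separator between rowkey parts
EMPTY = "\x01"  # hypothetical marker standing in for an empty column value


def index_rowkey(index_cols: Sequence[str], row: Dict[str, str], data_rowkey: str) -> str:
    """Build an index-table rowkey: indexed column values in index order,
    followed by the data-table rowkey so that each entry stays unique."""
    parts = [row.get(col) or EMPTY for col in index_cols]
    return SEP.join(parts + [data_rowkey])


def matched_prefix_len(index_cols: Sequence[str], equal_cols: Set[str]) -> int:
    """Count how many leading index columns have an equality condition
    (the leftmost prefix match principle)."""
    n = 0
    for col in index_cols:
        if col not in equal_cols:
            break
        n += 1
    return n


def best_index(indexes: Dict[str, Sequence[str]], equal_cols: Set[str]) -> Optional[str]:
    """Pick the index with the longest matched leftmost prefix.
    Returns None when no index applies and a full scan would be needed."""
    best_name, best_len = None, 0
    for name, cols in indexes.items():
        n = matched_prefix_len(cols, equal_cols)
        if n > best_len:
            best_name, best_len = name, n
    return best_name


def hex_split_points(num_regions: int, width: int = 4) -> List[str]:
    """Evenly spaced split keys, assuming rowkeys start with a
    fixed-width lowercase hexadecimal prefix."""
    space = 16 ** width
    return [format(space * i // num_regions, f"0{width}x") for i in range(1, num_regions)]


if __name__ == "__main__":
    indexes = {"idx_city_age": ["city", "age"], "idx_age": ["age"]}
    print(best_index(indexes, {"city", "age"}))
    print(repr(index_rowkey(["city", "age"], {"city": "Beijing", "age": ""}, "row42")))
    print(hex_split_points(4, width=2))
```

Because the index entry's rowkey begins with the indexed values, a query on those values becomes a prefix scan over the index table rather than a filtered scan over the data table. The split keys would be supplied when the table is created, so that writes are spread across regions from the start.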
Keywords/Search Tags:HBase, secondary index, pre-split, B+tree, leftmost prefix match