Research On Optimization Of Multi-dimensional Index Query Mechanism Based On HBase

Posted on: 2020-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Tan
GTID: 2428330575453100
Subject: Computer Science and Technology
Abstract/Summary:
The widespread use of mobile devices and the real-time availability of user location information are driving the development of new personalized, location-based applications and services (LBSs). Such applications must support multi-attribute queries, real-time queries, and big data analysis, and must scale to millions of users. The new generation of distributed databases, which extract value from large amounts of data while remaining highly available, fault tolerant, and scalable, provides much-needed infrastructure for LBSs. However, these databases cannot efficiently handle complex queries on multi-dimensional data because they do not provide methods to access multiple attributes. Therefore, to realize multi-dimensional queries and improve real-time query efficiency, this thesis studies a multi-dimensional index mechanism and optimization strategy based on HBase. The main work is as follows:

(1) The New-grid scheme, a unified index and data distribution framework based on HBase, is proposed; it uses key-value storage to support multi-dimensional queries. First, P-grid is improved by organizing a group of nodes in an overlay network to provide effective data distribution, fault tolerance, and multi-dimensional query processing. Second, indexes are built with a linearization technique based on the Hilbert space-filling curve, which preserves data locality and effectively manages multi-dimensional data in the key-value store. Finally, the algorithms for dynamically processing range queries and k-nearest neighbor queries are optimized, which eliminates the maintenance overhead of separate index tables. This approach is completely independent of the underlying storage layer and can be implemented on any cloud infrastructure.

(2) An automatic configuration parameter tuning scheme for HBase is proposed. HBase has many configuration parameters that affect system performance. These parameters influence each other in complex ways, making it extremely difficult to tune them manually for the best performance. The key problem in optimizing the underlying configuration parameters is to establish an accurate, low-cost performance model that takes the configuration parameters as input. The new scheme uses the random forest algorithm to build the performance model, and then combines a genetic algorithm with that model to search for the optimal configuration parameters of the HBase application system, so as to improve HBase performance.

(3) A Hadoop experimental platform was built to verify the effectiveness and efficiency of the proposed multi-dimensional query scheme and parameter tuning scheme. Experimental results show that the New-grid scheme effectively improves the efficiency of multi-dimensional data queries on HBase, and that the parameter tuning scheme improves HBase performance.
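The linearization step in (1) can be illustrated with a minimal sketch. It uses the standard textbook Hilbert-curve mapping for a 2-D grid, not the thesis's own implementation. The function names `hilbert_index` and `row_key`, the grid size, and the zero-padded decimal key format are assumptions made only for this illustration. The idea is that cells adjacent on the curve receive consecutive indices, so points that are close in space tend to land close together in a sorted key-value store.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve. Standard iterative algorithm."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def row_key(n, x, y):
    """Hypothetical row-key format: the Hilbert index, zero-padded so that
    lexicographic key order matches numeric curve order."""
    width = len(str(n * n - 1))
    return f"{hilbert_index(n, x, y):0{width}d}"


if __name__ == "__main__":
    n = 4
    for y in range(n):
        print(" ".join(row_key(n, x, y) for x in range(n)))
```

Under this assumed key format, a range query over a spatial rectangle becomes a small set of contiguous key ranges, which is what allows a sorted store to answer it with scans instead of a separate index table.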
Keywords/Search Tags: HBase, multi-dimensional index, space-filling curve, overlay network, parameter tuning