
Research And Application Of Query Optimization Based On HBase

Posted on: 2022-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Wang
GTID: 2518306524488794
Subject: Master of Engineering
Abstract/Summary:
With the advent of the big data era, the management and storage of massive amounts of data face ever higher demands. HBase, a distributed NoSQL database in the Hadoop ecosystem, has been adopted by many companies as a big data store because of its strong scalability, large storage capacity, and good read and write performance. HBase is currently widely used in Internet and Internet of Things applications such as energy data storage, vehicle information collection, e-commerce order backup, and industrial sensor data storage, where the data are mainly streaming data such as spatiotemporal and time series data. In the energy domain, monitoring data from wind turbines in wind farms is typical time series data and one of the main application scenarios for HBase. However, HBase provides no indexes on non-primary-key (non-row-key) columns, so queries on such columns can only be answered by inefficient full table scans, and aggregate queries over massive data require repeated computation and have poor real-time performance. This thesis studies query optimization techniques for HBase and their application. The main work is as follows:

(1) To support multi-condition queries on non-primary-key columns under different environments and performance requirements, a secondary index scheme adaptable to different environments is designed for HBase. It builds and updates indexes automatically, and a parser translates SQL statements into calls to the HBase query API.

(2) To address the consistency problem between indexes and the original data that is common in existing secondary index schemes, an index consistency solution based on delayed update is proposed. It achieves eventual consistency between index and data and avoids index update errors and index invalidation, improving query efficiency while guaranteeing correctness.

(3) To improve the efficiency of time series aggregation queries in HBase, an index structure based on a time split tree is proposed. Querying a tree index stored on disk is costly, and its query time grows with the amount of data. The split tree structure is therefore improved and the query algorithm optimized to avoid the disk I/O of traversing the index tree level by level, which improves query efficiency.

For each of these optimization schemes, an independent API is provided that builds and updates the index automatically. Based on actual project requirements, a wind turbine monitoring and management system using HBase as the underlying database was designed, and each module was implemented and applied in the system. Finally, wind turbine operating data was used as the data set for functional and efficiency tests of the optimization schemes.
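The secondary index in item (1) can be pictured with a common HBase pattern, sketched here under stated assumptions: the abstract does not specify the index layout, so the composite index row key (indexed value, a separator byte, then the data row key), the `SEP` constant, and the class names `SortedTable` and `IndexedTable` are all hypothetical. Python dictionaries and a sorted key list stand in for HBase tables, whose rows are likewise ordered by row key, so an equality query on an indexed column becomes a scan of one contiguous key range instead of a full table scan.

```python
from bisect import bisect_left, insort

SEP = "\x00"  # hypothetical separator; assumes indexed values never contain it


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self.keys = []  # sorted row keys
        self.rows = {}  # row key -> {column: value}

    def put(self, row_key, cells):
        if row_key not in self.rows:
            insort(self.keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key].update(cells)

    def delete(self, row_key):
        if row_key in self.rows:
            del self.rows[row_key]
            self.keys.pop(bisect_left(self.keys, row_key))

    def prefix_scan(self, prefix):
        """Visit only the contiguous run of row keys starting with `prefix`."""
        i = bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i]
            i += 1


class IndexedTable:
    """Data table plus one index table per indexed column.

    Hypothetical index row key layout: value + SEP + data row key.
    """

    def __init__(self, indexed_columns):
        self.data = SortedTable()
        self.indexes = {col: SortedTable() for col in indexed_columns}

    def put(self, row_key, cells):
        """Write the data row and keep every affected index entry in step."""
        old = self.data.rows.get(row_key, {})
        for col, index in self.indexes.items():
            if col in cells:
                if col in old:
                    index.delete(old[col] + SEP + row_key)  # drop the stale entry
                index.put(cells[col] + SEP + row_key, {})
        self.data.put(row_key, cells)

    def query_eq(self, column, value):
        """Equality query on a non-row-key column via a prefix scan of its index."""
        prefix = value + SEP
        return [key[len(prefix):] for key in self.indexes[column].prefix_scan(prefix)]

    def query_and(self, conditions):
        """Multi-condition query: intersect the row keys returned by each index."""
        hits = [set(self.query_eq(col, val)) for col, val in conditions.items()]
        return sorted(set.intersection(*hits)) if hits else []
```

Against a real cluster, `prefix_scan` would correspond to a Scan over the index table bounded by start and stop rows, followed by reads of the matching data rows. In this sketch, a parsed WHERE clause made of ANDed equality conditions would map onto `query_and`; the separator keeps a value such as `WT0` from matching rows indexed under `WT01`.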
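The abstract does not say how the delayed-update consistency scheme in item (2) works, so the following is only one plausible reading, not the author's design: base rows are written immediately, index maintenance is queued and replayed later (making the index eventually consistent), and the read path re-checks every index hit against the base row so that a stale index entry never returns a wrong row. `LazyIndexedStore`, `apply_pending`, and the in-memory queue are hypothetical; a production version would need a durable queue.

```python
from collections import deque


class LazyIndexedStore:
    """Hypothetical sketch: base data is written at once, index updates are deferred."""

    def __init__(self, column):
        self.column = column
        self.data = {}          # row key -> {column: value}
        self.index = {}         # indexed value -> set of row keys (may be stale)
        self.pending = deque()  # deferred (row_key, old_value, new_value) updates

    def put(self, row_key, cells):
        row = self.data.setdefault(row_key, {})
        old = row.get(self.column)
        row.update(cells)
        new = row.get(self.column)
        if old != new:
            self.pending.append((row_key, old, new))  # index maintenance is delayed

    def apply_pending(self):
        """Background step: replay queued updates in order so the index converges."""
        while self.pending:
            row_key, old, new = self.pending.popleft()
            if old is not None:
                self.index.get(old, set()).discard(row_key)
            if new is not None:
                self.index.setdefault(new, set()).add(row_key)

    def query_eq(self, value):
        """Read path: verify each candidate against the base row before returning it."""
        candidates = set(self.index.get(value, set()))
        # rows whose queued update sets `value` are not in the index yet
        candidates |= {rk for rk, _, new in self.pending if new == value}
        return sorted(
            rk for rk in candidates
            if self.data.get(rk, {}).get(self.column) == value
        )
```

Replaying updates in arrival order means several changes to the same row before a flush still leave the index matching the final row value, while the verification step keeps answers correct during the window in which the index lags behind the data.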
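The time split tree of item (3) is not detailed in the abstract, so the sketch below is an assumption in the spirit of its description, closer to a bottom-up segment tree over time than to the author's exact structure. Leaf segments are `BUCKET_SECONDS` wide, each node `(level, n)` stores a pre-aggregated count, sum, minimum, and maximum for 2**level consecutive segments, and a range query computes the keys of the covering nodes arithmetically instead of walking the tree from the root one level (and one disk read) at a time. The constants and all class and method names are hypothetical.

```python
from dataclasses import dataclass

BUCKET_SECONDS = 60  # hypothetical width of a leaf time segment
MAX_LEVEL = 10       # hypothetical height: a top-level node spans 2**10 segments


@dataclass
class Agg:
    """Pre-aggregated statistics stored in one tree node."""
    count: int = 0
    total: float = 0.0
    min_v: float = float("inf")
    max_v: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.min_v = min(self.min_v, value)
        self.max_v = max(self.max_v, value)

    def merge(self, other):
        self.count += other.count
        self.total += other.total
        self.min_v = min(self.min_v, other.min_v)
        self.max_v = max(self.max_v, other.max_v)


class TimeSplitIndex:
    """Node (level, n) covers leaf segments [n * 2**level, (n + 1) * 2**level)."""

    def __init__(self):
        self.nodes = {}  # (level, n) -> Agg; stands in for index rows in HBase

    def insert(self, ts, value):
        """Incremental maintenance: update the one covering node on every level."""
        leaf = ts // BUCKET_SECONDS
        for level in range(MAX_LEVEL + 1):
            self.nodes.setdefault((level, leaf >> level), Agg()).add(value)

    @staticmethod
    def cover(lo, hi):
        """Keys of the nodes that exactly tile leaf segments [lo, hi), computed
        bottom-up from the bounds alone, without reading any node."""
        keys, level = [], 0
        while lo < hi:
            if level == MAX_LEVEL:
                keys.extend((level, n) for n in range(lo, hi))
                break
            if lo & 1:  # lo is a right child: take it and step past it
                keys.append((level, lo))
                lo += 1
            if hi & 1:  # hi - 1 is a left child whose sibling lies outside the range
                hi -= 1
                keys.append((level, hi))
            lo, hi, level = lo >> 1, hi >> 1, level + 1
        return keys

    def aggregate(self, start_ts, end_ts):
        """Aggregate whole segments from start_ts's segment up to end_ts's (exclusive)."""
        result = Agg()
        # against HBase, this loop could become one batched multi-get of all keys
        for key in self.cover(start_ts // BUCKET_SECONDS, end_ts // BUCKET_SECONDS):
            node = self.nodes.get(key)
            if node is not None:
                result.merge(node)
        return result
```

With 100 one-minute segments loaded, `aggregate(0, 6000)` combines just three nodes (covering 64, 32, and 4 segments) instead of 100 raw rows. Below the top level the cover takes at most two nodes per level, so the number of reads grows with the logarithm of the queried range rather than with the data volume.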
Keywords: HBase, secondary index, data consistency, aggregation query, time split tree