Research On Storage And Retrieval Of Multidimensional Time Series Based On HBase

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Liu
GTID: 2392330602480526
Subject: Traffic Information Engineering & Control
Abstract/Summary:
With the rapid development of science, technology, and the economy, both the speed of data transmission and the amount of data stored are growing significantly. Traditional relational databases impose strict requirements on data integrity and security, which limits the availability and scalability of the system, slows down data operations, and makes data management difficult. These effects cannot be ignored in functional modules or systems that require real-time performance. At the same time, the dimensionality of the data to be managed has increased over the years. For example, with the rapid development of civil aviation in recent years, hundreds of sensors collect data at every moment of a flight. In the foreseeable future, the civil aviation industry will need to add further data acquisition dimensions to improve flight efficiency and flight safety. As a result, non-relational databases are becoming increasingly common in civil aviation systems. HBase is a non-relational, column-oriented distributed storage system. Compared with traditional relational databases, HBase has the advantages of convenient dimension expansion and highly concurrent read and write operations. This thesis takes the HBase storage system as its research object and, based on the characteristics of multidimensional time series data, improves the performance of HBase by finding a set of optimal configuration parameters. The research mainly covers the following aspects:
(1) Screening of HBase configuration parameters and generation of training samples. Effective configuration parameters are screened, and a random strategy is adopted to generate a series of configuration files. The YCSB benchmark is used to collect the performance data corresponding to each configuration file.
(2) Construction and optimization of the HBase performance prediction models. The random forest algorithm and XGBoost are each used to build prediction models of throughput and of average delay, giving four models in total, and the model parameters are first tuned with a Bayesian optimization algorithm. The throughput model and the average delay model of each algorithm are then weighted to obtain two combined models, and the errors of the two combined models are compared to select the better one.
(3) Optimization of HBase configuration parameters. An improved genetic algorithm is applied to the HBase performance model to obtain the set of configuration parameters corresponding to the optimal solution of the model.
(4) Performance comparison of configuration parameters before and after optimization. The effect of the configuration parameters on HBase performance before and after optimization is tested in a real environment, demonstrating the effectiveness of performance optimization based on HBase configuration parameters.
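The pipeline described in steps (1) to (3) can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract does not give these details: the three HBase property names and their ranges are only illustrative, `predicted_throughput` and `predicted_delay` are hypothetical stand-ins for the trained random forest or XGBoost models, the weight of 0.5 is arbitrary, and the search is a plain genetic algorithm with elitism rather than the improved variant used in the thesis.

```python
import random

# Illustrative search space (assumed): the abstract does not list the
# screened parameters or their ranges.
SEARCH_SPACE = {
    "hbase.regionserver.handler.count": (10, 200),
    "hfile.block.cache.size": (0.1, 0.6),
    "hbase.hregion.memstore.flush.size": (32 * 2**20, 512 * 2**20),
}

def random_config(rng):
    """Step (1): draw one configuration uniformly at random."""
    return {
        name: rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)
        for name, (lo, hi) in SEARCH_SPACE.items()
    }

def normalized(cfg):
    """Scale every parameter to [0, 1] so the stand-in models can share it."""
    return [(cfg[n] - lo) / (hi - lo) for n, (lo, hi) in SEARCH_SPACE.items()]

def predicted_throughput(cfg):
    """Hypothetical stand-in for a trained throughput model (higher is better)."""
    x = normalized(cfg)
    return 1.0 - sum((v - 0.7) ** 2 for v in x) / len(x)

def predicted_delay(cfg):
    """Hypothetical stand-in for a trained average-delay model (lower is better)."""
    x = normalized(cfg)
    return sum((v - 0.5) ** 2 for v in x) / len(x)

def combined_score(cfg, weight=0.5):
    """Step (2): weighted combination; delay is subtracted because lower is better."""
    return weight * predicted_throughput(cfg) - (1.0 - weight) * predicted_delay(cfg)

def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one parent at random."""
    return {n: (a[n] if rng.random() < 0.5 else b[n]) for n in SEARCH_SPACE}

def mutate(cfg, rng, rate=0.2):
    """Gaussian mutation, clipped to the parameter bounds."""
    child = dict(cfg)
    for name, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            value = child[name] + rng.gauss(0.0, 0.1 * (hi - lo))
            value = min(max(value, lo), hi)
            child[name] = int(round(value)) if isinstance(lo, int) else value
    return child

def tournament(population, rng, k=3):
    """Pick the best of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=combined_score)

def optimize(seed=0, pop_size=20, generations=30):
    """Step (3): search the combined model for a high-scoring configuration."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    initial_best = max(map(combined_score, population))
    for _ in range(generations):
        elite = max(population, key=combined_score)  # elitism: keep the best
        children = [
            mutate(crossover(tournament(population, rng),
                             tournament(population, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    best = max(population, key=combined_score)
    return best, combined_score(best), initial_best

if __name__ == "__main__":
    best_cfg, best_score, start_score = optimize(seed=1)
    print(best_cfg, best_score, start_score)
```

In a real setup the two stand-in functions would be replaced by models fitted to YCSB measurements, and the throughput and delay predictions would need to be brought onto a comparable scale before weighting. Step (4) would then apply the returned configuration to a live cluster and measure it against the defaults.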
Keywords/Search Tags:HBase, Configuration Parameter Optimization, Random Forests, XGBoost, Genetic Algorithm, Bayesian Optimization