
Research And Implementation Of Time Series Database Optimization Based On InfluxDB

Posted on: 2022-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: X X Zhu
GTID: 2518306764466674
Subject: Computer Software and Application of Computer
Abstract/Summary:
With the rapid development of the Internet of Things (IoT), Internet enterprises and data centers are increasingly concerned with how to collect, store, and analyze the log information of IoT devices more effectively, so as to better support upper-layer business applications. Time series databases emerged to address the storage and analysis of device log information in IoT scenarios. Because these scenarios are complex and variable, time series databases offer broad research prospects, and research on their optimization is of great significance.

InfluxDB is an open-source time series database developed by InfluxData that can serve as the main platform for data storage, query, and analysis in various log analysis systems. Building on the LSM-Tree storage architecture, InfluxDB implements the TSM (Time-Structured Merge Tree) storage engine, which supports time series data storage. The InfluxDB storage engine therefore inherits some performance problems of the LSM-Tree architecture, such as write amplification, write stalls, and read amplification, which negatively affect the overall performance of the database. In addition, with the vigorous growth of cloud-native applications in recent years, Internet enterprises have an increasing demand for migrating databases to the cloud, and clustered deployment of databases is a major challenge for R&D personnel. However, the native cluster scheme of InfluxDB is relatively limited in form and lacks scalability, so it cannot adequately meet users' needs for highly reliable and highly available data storage.

Firstly, this thesis studies the write performance of InfluxDB's TSM storage engine. It argues that InfluxDB does not make full use of the fact that the timestamps of time series data increase monotonically overall to optimize the storage engine, and that its write performance is therefore subject to the write stalls and write amplification of LSM-Tree. To address this, the thesis optimizes the design and implementation of InfluxDB with the goals of eliminating write stalls in the TSM storage engine and reducing write amplification and resource overhead in long-term continuous write scenarios.

Secondly, building on the modified TSM storage engine, the thesis redesigns and implements an InfluxDB cluster architecture that separates computation from storage ("computation-storage" separation). First, in the computing layer, the cluster of data nodes, which originally guaranteed only eventual consistency, is expanded into multiple sharded, strongly consistent replica sets. Second, several storage nodes in the storage layer relieve the storage pressure on the computing layer while also expanding the storage capacity of the system.

The experimental results show that the modification of the InfluxDB storage engine achieves its design goals: write performance improves to a certain extent, and query performance is not significantly weakened. In addition, the "computation-storage" separated InfluxDB cluster architecture meets the design requirements for consistency and fault tolerance and performs well.
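
The link the abstract draws between monotonically increasing timestamps and write amplification can be illustrated with a toy model. The sketch below is not the thesis's implementation or InfluxDB code; the names `flush`, `ranges_overlap`, and `points_written` are hypothetical, and the "rewrite everything on each flush" policy is a deliberately crude stand-in for compaction. It shows only that in-order batches produce files with disjoint time ranges, which need no merge rewrite, while late-arriving points create overlapping files.

```python
from typing import List, Tuple

Point = Tuple[int, float]          # (timestamp, value)
File = Tuple[int, int, List[Point]]  # (min_ts, max_ts, sorted points)


def flush(batch: List[Point]) -> File:
    """Sort one in-memory batch and return it as a toy on-disk 'file'."""
    pts = sorted(batch)
    return pts[0][0], pts[-1][0], pts


def ranges_overlap(files: List[File]) -> bool:
    """Return True if any two files cover intersecting time ranges."""
    ordered = sorted(files, key=lambda f: f[0])
    # After sorting by start time, any overlap shows up between neighbours.
    return any(prev[1] >= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def points_written(batches: List[List[Point]], merge_on_flush: bool) -> int:
    """Count points written to disk under a toy compaction policy (assumption)."""
    total = stored = 0
    for batch in batches:
        total += len(batch)        # the flush itself
        if merge_on_flush:
            total += stored        # toy compaction: rewrite all earlier data
        stored += len(batch)
    return total


# Timestamps increase across batches: file ranges are disjoint.
in_order = [[(1, 0.1), (2, 0.2)], [(3, 0.3), (4, 0.4)], [(5, 0.5), (6, 0.6)]]
# A late point (ts=3) arrives after ts=4 was flushed: file ranges overlap.
late = [[(1, 0.1), (4, 0.4)], [(3, 0.3), (5, 0.5)]]

print(ranges_overlap([flush(b) for b in in_order]))   # False
print(ranges_overlap([flush(b) for b in late]))       # True
print(points_written(in_order, merge_on_flush=False)) # 6
print(points_written(in_order, merge_on_flush=True))  # 12
```

In this toy model, six ingested points cost six writes when files are simply appended and twelve when every flush rewrites earlier data, a write amplification of 2. Real engines compact far more selectively, so the numbers are illustrative only.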
Keywords/Search Tags:time-series database, TSM storage engine, cluster architecture, "computation-storage" separation