
Research On File Structure Design And Optimization Of Time Series Database Management System For Internet Of Things

Posted on: 2022-09-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Qiao
Full Text: PDF
GTID: 1488306746956879
Subject: Software engineering
Abstract/Summary:
Time series data in the Internet of Things (IoT) is time-stamped data collected by the machines and equipment in IoT applications. It accounts for most of the volume and value of industrial big data, and managing it efficiently is an important foundation for supporting our country's industrial internet strategy. With the increasing popularity of IoT devices and applications, the volume of time series data is growing sharply, which brings new challenges to the storage and query performance of existing database management systems: existing file formats cannot simultaneously support the management of massive numbers of time series, storage with a high compression ratio, and efficient raw data and aggregation queries. To this end, the main work of this dissertation includes:

· To efficiently organize massive time series data in IoT, this dissertation studies the file format used in database management systems. A columnar file format for time series data, Time series File (TsFile), is proposed, which consists of a data area and an index area. In the data area, a device space is created for each device instance, and a columnar layout is used to compress and store each time series of the device efficiently. In the index area, TsFile maintains a sparse index on the time dimension and a tree-structured dense index on the series metadata dimension. TsFile supports fast reading and writing and highly compressed storage of massive time series data.

· To address the problem that the fixed data area layout in TsFile cannot handle diverse raw data queries along the time and series dimensions, this dissertation studies workload-aware file format optimization. A query cost estimation model for TsFile is established, and a workload-aware file format optimization strategy for time series data is proposed for both single and multiple replicas. Heuristic search is used to find an approximately optimal file format. Under complex query workloads, multiple files can adaptively take on heterogeneous formats, which greatly improves system query throughput.

· To address the problem that the TsFile index area cannot support efficient aggregation queries over long time series in arbitrary time windows, the index structure is studied. A persistent, forest-structured aggregation index is proposed, which exploits the fact that time series data arrive in chronological order. It can be constructed incrementally without rotation, rebalancing, or similar operations, and it can be built efficiently under limited memory. Furthermore, a fast index query algorithm is proposed that avoids reading unnecessary index nodes from disk and can process aggregation queries over tens of billions of data points within a few hundred milliseconds.

· Based on TsFile, the storage engine of the IoT time series database management system Apache IoTDB is designed and developed, so that the high write throughput of massive time series data can be handled efficiently. A query interface over multiple TsFiles is designed to support typical IoT queries. Apache IoTDB can manage tens of millions of time series and outperforms existing time series database management systems in write throughput and query latency. IoTDB has been successfully applied in many industries, such as rail transit, new energy, and smart manufacturing.
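The abstract does not give the layout of the forest-structured aggregation index, so the following is only a minimal in-memory sketch of the general idea, not the dissertation's actual structure. It assumes points arrive in strictly increasing time order and uses a forest of perfect binary trees that merge like carries in a binary counter, so no rotation or rebalancing is needed. The names `Node`, `AggregationForest`, `append`, and `query` are hypothetical, and persistence to disk is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Aggregate summary of all points in [t_min, t_max] (hypothetical layout)."""
    t_min: int
    t_max: int
    count: int
    total: float
    height: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class AggregationForest:
    """Append-only forest of perfect binary trees; illustrative sketch only."""

    def __init__(self) -> None:
        # Root heights are strictly decreasing from left to right.
        self.roots: List[Node] = []
        self.last_time: Optional[int] = None

    def append(self, timestamp: int, value: float) -> None:
        # Assumption: points arrive in strictly increasing time order.
        if self.last_time is not None and timestamp <= self.last_time:
            raise ValueError("points must arrive in increasing time order")
        self.last_time = timestamp
        node = Node(timestamp, timestamp, 1, value, 0)
        # Merge trees of equal height, like carrying in a binary counter.
        # Existing nodes are never modified, rotated, or rebalanced.
        while self.roots and self.roots[-1].height == node.height:
            left = self.roots.pop()
            node = Node(left.t_min, node.t_max, left.count + node.count,
                        left.total + node.total, node.height + 1, left, node)
        self.roots.append(node)

    def query(self, start: int, end: int) -> Tuple[int, float]:
        """Return (count, sum) of the points with start <= t <= end."""
        count, total = 0, 0.0
        stack = list(self.roots)
        while stack:
            node = stack.pop()
            if node.t_max < start or node.t_min > end:
                continue  # disjoint from the window: skip the whole subtree
            if start <= node.t_min and node.t_max <= end:
                # Fully covered: use the stored aggregate, do not descend.
                count += node.count
                total += node.total
                continue
            # Partial overlap only happens at internal nodes, because a leaf
            # covers a single timestamp and is either inside or outside.
            stack.append(node.left)
            stack.append(node.right)
        return count, total


forest = AggregationForest()
for t in range(1, 11):
    forest.append(t, float(t))
print(forest.query(3, 7))
```

In this sketch a window query only descends into nodes that straddle a window boundary, which is one way to read the abstract's point about avoiding unnecessary index nodes; the on-disk node format and the memory-limited build procedure of the actual index are not covered here.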
Keywords/Search Tags: IoT time series, file format, aggregation index, storage engine, TsFile