Lossy Compression Algorithm Of Time Series Data For Time Series Database

Posted on: 2020-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gu
Full Text: PDF
GTID: 2428330599952924
Subject: engineering
Abstract/Summary:
With the advent of the big data era, countless devices collect measurements for many different indicators, and the volume of recorded data is very large. Storing the collected data directly in a traditional relational database not only consumes a large amount of storage space but also reduces the efficiency of data transmission, query, analysis, and processing. Because traditional database systems store time series data inefficiently, a dedicated time series database designed around the characteristics of such data is needed. Current time series databases let users create, update, and delete data and display and analyze it graphically, but they do not store large amounts of historical data directly on disk. This greatly restricts their development and easily leads to excessive storage consumption, excessive disk reads and writes, and degraded system performance. Introducing efficient data compression into the time series database is therefore particularly important for building high-performance time series databases.

To address these problems, this thesis proposes a new and efficient lossy compression algorithm tailored to the characteristics of time series data. By removing redundant parts of the data and shortening its encoded length while preserving a specified accuracy, the algorithm saves storage space and speeds up data transmission. The specific work is as follows:

1) A timestamp compression algorithm based on the difference method is proposed. The thesis optimizes the traditional difference-based timestamp compression algorithm: for each timestamp difference, the algorithm computes a second-order difference and encodes it according to a fixed set of compression rules. The proposed algorithm also reduces timestamp storage overhead when sampling points are missing, a problem that is common in time series data collection.

2) A lossy compression algorithm for time series data values is proposed. Floating-point data is stored using the IEEE-defined encoding standard. As a result, two floating-point values that differ only slightly can have very different binary encodings. To address this, the thesis proposes a lossy compression algorithm based on the structure of floating-point numbers. First, vector quantization is applied to the data as a preprocessing step. The raw data is then converted, within an acceptable loss of precision, into binary-coded bytes of similar structure. Finally, an XOR operation is performed and the redundant portion of the result is encoded and compressed. In this way the algorithm compresses the data values of a time series efficiently.

Finally, the proposed algorithm is verified by simulation experiments. The results show that, with an appropriately chosen loss factor, the lossy compression algorithm achieves a good balance between precision loss and compression ratio: the compression ratio for data values reaches 5.274, more than 90% of the time a timestamp can be stored in 1 bit, and the average loss rate approaches zero.
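The second-order difference step for timestamps can be illustrated with a minimal Python sketch. The abstract does not state the thesis's actual encoding rules, so the bucket boundaries and prefix lengths in `encoded_bits` are hypothetical, chosen only to show how a zero second-order difference can cost a single bit. The first timestamp and the first difference would have to be stored separately and are not counted here.

```python
from typing import List


def delta_of_delta(timestamps: List[int]) -> List[int]:
    """Return the second-order differences of a timestamp sequence."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def encoded_bits(dod: int) -> int:
    """Bit cost of one second-order difference.

    Hypothetical variable-length rule (not taken from the thesis):
    a short prefix selects the bucket, followed by a signed value.
    """
    if dod == 0:
        return 1          # a single '0' bit
    if -64 <= dod <= 63:
        return 2 + 7      # '10' prefix + 7-bit signed value
    if -2048 <= dod <= 2047:
        return 3 + 12     # '110' prefix + 12-bit signed value
    return 3 + 32         # '111' prefix + 32-bit signed value


# Samples every 10 time units, with the sample at 1040 missing.
timestamps = [1000, 1010, 1020, 1030, 1050, 1060, 1070]
dods = delta_of_delta(timestamps)
total_bits = sum(encoded_bits(d) for d in dods)
```

With regular sampling, most second-order differences are zero, so only the points around a missing sample need more than one bit under a rule of this kind.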
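The idea of making neighbouring floating-point values share a similar bit structure before XOR can also be sketched in Python. This is not the thesis's method: the vector quantization step is replaced here by a much simpler stand-in, zeroing the low-order mantissa bits of each IEEE 754 double. The parameter `keep_bits` is a hypothetical substitute for the loss factor mentioned in the abstract.

```python
import struct
from typing import List


def float_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def truncate_mantissa(x: float, keep_bits: int) -> int:
    """Zero the low (52 - keep_bits) mantissa bits; return the bit pattern.

    Stand-in for the thesis's lossy preprocessing (an assumption).
    """
    drop = 52 - keep_bits
    mask = ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF
    return float_to_bits(x) & mask


def xor_stream(values: List[float], keep_bits: int = 16) -> List[int]:
    """XOR each truncated value with the previous truncated value."""
    prev = truncate_mantissa(values[0], keep_bits)
    out = []
    for v in values[1:]:
        cur = truncate_mantissa(v, keep_bits)
        out.append(prev ^ cur)
        prev = cur
    return out


values = [21.50, 21.51, 21.49, 21.50]
xors = xor_stream(values, keep_bits=16)
```

Because these sample values share a sign and exponent, the leading bits of each XOR result are zero, and the truncation guarantees a run of trailing zeros. Only the short middle section would then need to be encoded.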
Keywords/Search Tags:Time series database, time series data, data compression, lossy compression