Fast Indexing And Visualization Of Time Series Data

Posted on: 2022-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2518306572997789
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of Internet of Things technology, the volume of time series data has grown explosively. Stock market forecasting, weather forecasting, financial analysis, population forecasting, and similar applications all depend on massive time series data. Indexing and visualization have become prerequisites for analyzing this data, but time series data is generated in real time, and its volume can easily reach millions, tens of millions, or even hundreds of millions of records. Storing, indexing, and visualizing data at this scale is challenging.

To use storage space efficiently, the time series data is stored in a distributed, compressed structure, with different compression algorithms applied to different data types: integers, floating-point numbers, Boolean values, character strings, and timestamps. Each storage node organizes its data in a log-structured merge tree (LSM tree), which gives good single-node query performance. Queries that span multiple machines use a secondary index scheme that builds both an inverted index and a prefix tree over the data. When a query joins multiple databases, the data is first filtered through the secondary index, which reduces the load on the databases.

Query results cannot be visualized directly because they are large and high-dimensional. Data of the same dimension is therefore approximated by segmented (piecewise) aggregation; if the query result contains data of multiple dimensions, a dimensionality reduction algorithm is applied. For visualization, the stochastic neighbor embedding (SNE) dimensionality reduction algorithm is used. To avoid the crowding problem that arises with high-dimensional data, an optimized variant based on the t-distribution (t-SNE) is finally adopted.

Based on InfluxDB, a prototype system for storing, indexing, and visualizing time series data was built and tested on hard disk monitoring data from a data center. The test results show that the system returns results within seconds for queries over hundreds of millions of time series records.
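
The segmented aggregation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `piecewise_aggregate`, the near-equal segment widths, and the choice of the mean as the aggregate are all assumptions made for illustration.

```python
from typing import List, Sequence


def piecewise_aggregate(values: Sequence[float], segments: int) -> List[float]:
    """Approximate a series by the mean of each of `segments` near-equal chunks.

    Hypothetical helper: the abstract only says that same-dimension data is
    approximated by segmented aggregation; using the mean is an assumption.
    """
    if segments <= 0:
        raise ValueError("segments must be positive")
    n = len(values)
    if n == 0:
        return []
    # Cap the segment count so that no chunk is empty.
    segments = min(segments, n)
    result = []
    for i in range(segments):
        # Integer boundaries that split n points into near-equal chunks.
        start = i * n // segments
        end = (i + 1) * n // segments
        chunk = values[start:end]
        result.append(sum(chunk) / len(chunk))
    return result


if __name__ == "__main__":
    # Six raw readings reduced to three points suitable for plotting.
    print(piecewise_aggregate([1, 2, 3, 4, 5, 6], 3))
```

Reducing a query result to a fixed number of segments keeps the number of plotted points bounded regardless of how many raw records the query returns, at the cost of smoothing out short-lived spikes inside each segment.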
Keywords/Search Tags: time series data, index, dimensionality reduction visualization, stochastic neighbor embedding