Fast Indexing And Visualization Of Time Series Data

Posted on: 2022-05-18

Degree: Master

Type: Thesis

Country: China

Candidate: P Zhang

GTID: 2518306572997789

Subject: Computer technology

Abstract/Summary:

In recent years, with the rapid development of Internet of Things technology, the volume of time series data has grown explosively. Stock market forecasting, weather forecasting, financial analysis, population forecasting and similar applications all depend on massive time series data. Indexing and visualization have become prerequisites for analyzing this data, but time series data is generated in real time, and its volume can easily reach millions, tens of millions, or even hundreds of millions of records. Storing, indexing and visualizing data at this scale is full of challenges. To use storage space efficiently, time series data is stored in a distributed, compressed storage structure, with different compression algorithms for different data types such as integers, floating-point numbers, Boolean values, character strings, and timestamps. Each storage node organizes its data with a log-structured merge-tree, which gives good single-node query performance. Multi-node queries use a secondary index scheme that builds an inverted index and a prefix tree over the data. When a joint query across multiple databases is issued, the data is first filtered through the secondary index to reduce the load on the databases. Query results cannot be visualized directly because of their large volume and high dimensionality. Therefore, data of the same dimension is approximated by segmented aggregation, and if the query result contains data of multiple dimensions, a dimensionality reduction algorithm is applied before visualization. The stochastic neighbor embedding (SNE) dimensionality reduction algorithm is used for this purpose. To avoid the crowding problem caused by too many dimensions, an optimized variant based on the t-distribution, t-distributed stochastic neighbor embedding (t-SNE), is ultimately adopted. Based on InfluxDB, a prototype system for the storage, indexing and visualization of time series data was built and tested on hard disk monitoring data from a data center. The test results show that the system returns query results within seconds over hundreds of millions of time series records, indicating good query performance.
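The segmented aggregation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so this assumes the common form in which a series is cut into contiguous, near-equal segments and each segment is replaced by its mean. The function name `segmented_aggregate` and its parameters are hypothetical and are not taken from the thesis.

```python
from statistics import fmean
from typing import List, Sequence


def segmented_aggregate(values: Sequence[float], n_segments: int) -> List[float]:
    """Approximate a series by the mean of each contiguous segment.

    Assumption: segments are near-equal in length; the thesis may use a
    different segmentation or aggregate (e.g. min/max) in practice.
    """
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    n = len(values)
    if n == 0:
        return []
    # Never ask for more segments than there are points.
    n_segments = min(n_segments, n)
    reduced = []
    for i in range(n_segments):
        # Integer boundaries spread any remainder across the segments.
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        reduced.append(fmean(values[start:end]))
    return reduced


if __name__ == "__main__":
    series = [1, 3, 5, 7, 2, 2, 10, 0]
    print(segmented_aggregate(series, 4))
```

Reducing a query result to a fixed number of segment means before plotting keeps the number of points drawn independent of the number of points stored, which is the motivation the abstract gives for approximating large results.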

Keywords/Search Tags:

time series data, index, dimensionality reduction visualization, stochastic neighbor embedding

Related items

1	Several Algorithms Based On Nonlinear Dimensionality Reduction Of Facial Expression Recognition Research
2	The Study Of Correlation Analysis And Dimensionality Reduction Methods And Their Applications
3	Research On Dimensionality Reduction And Prediction Methods In Time Series Data Mining
4	Variational Auto-Encoder Combined With T-Distributed Stochastic Neighbor Embedding For Dimensionality Reduction And Cluster Analysis
5	Research And Implementation Of Multidimensional Time-Series Data Mining Methods
6	Research Of Dimensionality Reduction And Similarity Matching For Uncertain Time Series
7	Research On Data Dimensionality Reduction Algorithms Based On Matrix Decomposition Learning
8	Research On Dimensionality Reduction Of High-dimensional Data
9	Nonlinear Dimensionality Reduction Based On Stochastic Initialization
10	Research And Application On The Dimensionality Reduction Algorithm With Neural Networks