Research on Key Technologies of Anomaly Detection Based on Time Series Data Mining

Posted on: 2021-05-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Zhan
GTID: 1360330632457840
Subject: Software engineering
Abstract/Summary:
Time series are high-dimensional data ordered chronologically and accumulating continuously over time, and they are found in many fields, such as finance, healthcare, and network operations. Large volume, high dimensionality, and continuous accumulation are inherent characteristics of time series. How to mine hidden information from time series data efficiently and effectively is therefore a challenging research topic that attracts a growing number of researchers in China and abroad. Research on time series data mining mainly covers dimensionality reduction and representation, retrieval, classification, clustering, and anomaly detection. This dissertation focuses on dimensionality reduction and representation, similarity measurement, and anomaly detection. Its main contents and contributions are summarized as follows.

(1) We propose a novel time series representation method called Feature-based Online Segmentation (FOS). FOS uses trend turning points and their importance indexes to select and optimize segmenting points, applies the optimized points to perform forward and backward segmentation based on slope calculation, and finally represents the raw time series as a set of segments. The maximum error of any single point is used to constrain the fitting error of the representation, which ensures that FOS achieves the desired fitting accuracy while keeping the representation efficient. Experiments show that FOS represents raw time series effectively, retains the overall trend of the original data, and guarantees the fitting accuracy.

(2) We propose ROSS (Rapid Optimal Prefix and Suffix Invariant Size (OPSIS) Search Algorithm), an efficient method for learning the optimal prefix and suffix invariant size (PSIS) for Dynamic Time Warping (DTW), together with a corresponding anomaly detection method. Using pruning and deferring the calculation of actual distances, ROSS uses a strategy based on a nearest-neighbor classification look-up table of the time series to compute the classification error rate under different PSIS values, and finally obtains the OPSIS, the value with the lowest classification error rate. Compared with naive approaches, ROSS has significantly lower time complexity. Experiments show that ROSS learns the OPSIS much more efficiently and stays efficient as the length and number of time series grow.

(3) We propose a novel time series Anomaly Detection method based on Feature-based Symbolic Representation (ADFSR). ADFSR first transforms the raw time series into a feature-based symbolic representation built from seven characteristic values per subsequence, which is then used to calculate the similarity among subsequences. Experiments on algorithm parameters, simulated data, and real-world time series show that ADFSR achieves effective and stable anomaly detection, and that the feature-based symbolic representation significantly reduces its time complexity.

(4) We propose a novel Multi-domain Space Piecewise Aggregate Representation (MSPAR) and a corresponding anomaly detection method, MSPAR-AD. MSPAR captures not only significant changes of a time series in the amplitude domain but also the corresponding variations in the temporal domain. Concretely, MSPAR-AD first divides a time series evenly into non-overlapping subsequences. Second, each subsequence is projected into the multi-domain space and represented by its own amplitude and temporal features. Third, anomaly scores are calculated from these representations. Finally, the subsequences with relatively high anomaly scores are detected as anomalies. Extensive experiments on synthetic and real-world time series datasets demonstrate the superiority of MSPAR-AD for anomaly detection.

(5) We design and develop a network traffic anomaly detection system based on time series data mining techniques, which applies the research results in production practice and helps turn the scientific and technological achievements into real-world use.
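Contribution (1) constrains the representation by the maximum error of any single point. To illustrate that constraint only, here is a minimal sketch of generic greedy sliding-window piecewise linear segmentation. It is not FOS, whose turning-point selection, importance indexes, and slope-based forward and backward passes are not specified in this abstract. The names `segment`, `max_point_error`, and `max_error` are hypothetical.

```python
from typing import List, Sequence, Tuple


def max_point_error(x: Sequence[float], start: int, end: int) -> float:
    """Largest absolute gap between x[start..end] and the line joining its two endpoints."""
    if end - start < 2:
        return 0.0  # a segment with no interior points fits exactly
    slope = (x[end] - x[start]) / (end - start)
    return max(abs(x[i] - (x[start] + slope * (i - start))) for i in range(start + 1, end))


def segment(x: Sequence[float], max_error: float) -> List[Tuple[int, int]]:
    """Generic greedy segmentation (not FOS): grow each segment while every point stays within max_error."""
    n = len(x)
    if n == 0:
        return []
    if n == 1:
        return [(0, 0)]
    segments: List[Tuple[int, int]] = []
    start, end = 0, 1
    while end < n - 1:
        if max_point_error(x, start, end + 1) <= max_error:
            end += 1  # the extended segment still satisfies the single-point bound
        else:
            segments.append((start, end))  # close the segment; the next one starts at its endpoint
            start, end = end, end + 1
    segments.append((start, end))
    return segments


print(segment([0, 1, 2, 3, 2, 1, 0], max_error=0.1))  # -> [(0, 3), (3, 6)]
```

Adjacent segments share an endpoint, so the output is a connected polyline, and the greedy rule guarantees that no raw point deviates from its segment by more than `max_error`.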
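Contribution (2) learns the prefix and suffix invariant size for DTW by minimizing nearest-neighbor classification error. Assuming the usual prefix/suffix-invariant DTW formulation, in which the warping path may ignore up to `r` points at the start and end of either series, the sketch below shows the naive baseline that ROSS is reported to speed up: evaluate every candidate `r` with leave-one-out 1-NN classification. ROSS's pruning, deferred distance computation, and look-up table are not reproduced, and `psi_dtw`, `loo_error`, and `learn_r` are hypothetical names.

```python
import math
from typing import List, Sequence


def psi_dtw(a: Sequence[float], b: Sequence[float], r: int) -> float:
    """DTW that may ignore up to r leading and up to r trailing points of either series."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    # Prefix invariance: zero-cost entry cells let the path skip up to r leading points.
    for i in range(min(r, n) + 1):
        d[i][0] = 0.0
    for j in range(min(r, m) + 1):
        d[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    # Suffix invariance: the path may stop up to r points before the end of either series.
    end_row = min(d[n][j] for j in range(max(m - r, 1), m + 1))
    end_col = min(d[i][m] for i in range(max(n - r, 1), n + 1))
    return math.sqrt(min(end_row, end_col))


def loo_error(data: List[Sequence[float]], labels: List[int], r: int) -> float:
    """Leave-one-out 1-NN classification error rate using psi_dtw(., ., r) as the distance."""
    errors = 0
    for q, query in enumerate(data):
        best, best_label = math.inf, None
        for c, candidate in enumerate(data):
            if c == q:
                continue
            dist = psi_dtw(query, candidate, r)
            if dist < best:
                best, best_label = dist, labels[c]
        errors += best_label != labels[q]
    return errors / len(data)


def learn_r(data: List[Sequence[float]], labels: List[int], max_r: int) -> int:
    """Naive exhaustive search (not ROSS): smallest r in 0..max_r with the lowest LOO error."""
    rates = [loo_error(data, labels, r) for r in range(max_r + 1)]
    return min(range(max_r + 1), key=lambda r: rates[r])


# Two classes (bump vs. flat); two series carry a spurious leading 9.
data = [[9, 0, 4, 0], [0, 4, 0, 0], [9, 0, 0, 0], [0, 0, 0, 0]]
labels = [0, 0, 1, 1]
print(learn_r(data, labels, max_r=2))  # -> 1
```

This exhaustive search recomputes every pairwise DTW matrix for each candidate, roughly R·N²·L² cell updates for N series of length L and R candidate values of `r`, which is the cost a method like ROSS targets.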
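Contribution (3) compares subsequences through a compact feature representation rather than raw points. The abstract does not list the seven characteristic values or the symbolization step, so this sketch assumes a hypothetical four-value feature vector (mean, population standard deviation, minimum, maximum), omits symbolization, and scores each sliding window by the Euclidean distance to its nearest non-overlapping window in feature space (the standard discord criterion). `features` and `top_discord` are illustrative names.

```python
import math
import statistics
from typing import Sequence, Tuple


def features(window: Sequence[float]) -> Tuple[float, float, float, float]:
    # Hypothetical feature set (mean, std, min, max); NOT the dissertation's seven values.
    return (statistics.fmean(window), statistics.pstdev(window), min(window), max(window))


def top_discord(x: Sequence[float], w: int) -> Tuple[int, float]:
    """Start index and score of the window farthest from its nearest non-overlapping neighbor."""
    if len(x) < 2 * w:
        raise ValueError("need at least two non-overlapping windows")
    feats = [features(x[i:i + w]) for i in range(len(x) - w + 1)]
    best_start, best_score = -1, -math.inf
    for i, fi in enumerate(feats):
        # Ignore overlapping windows, which would be trivial near-matches of window i.
        dists = [math.dist(fi, fj) for j, fj in enumerate(feats) if abs(i - j) >= w]
        if not dists:
            continue  # no non-overlapping window to compare against
        nearest = min(dists)
        if nearest > best_score:
            best_start, best_score = i, nearest
    return best_start, best_score


series = [0.0, 1.0] * 10
series[10] = 5.0  # injected spike
print(top_discord(series, w=4)[0])  # -> 7
```

Features are computed once per window, so each pairwise comparison costs O(k) for k features instead of O(w) for raw windows of length w; the quadratic loop over windows is kept for clarity.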
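Contribution (4) describes four steps: divide the series evenly into non-overlapping subsequences, represent each by amplitude and temporal features, compute anomaly scores, and flag the high scorers. The sketch below follows those steps under stated assumptions: the features (segment mean and range for amplitude, relative positions of the maximum and minimum for time), the column-wise z-scoring, and the k-nearest-neighbor score are hypothetical stand-ins, since the abstract defines neither MSPAR's projection nor its scoring function.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def represent(seg: Sequence[float]) -> Tuple[float, float, float, float]:
    """Hypothetical features: amplitude (mean, range) and temporal (relative argmax, argmin)."""
    w = len(seg)
    i_max = max(range(w), key=lambda i: seg[i])
    i_min = min(range(w), key=lambda i: seg[i])
    return (statistics.fmean(seg), max(seg) - min(seg), i_max / w, i_min / w)


def standardize(rows: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Z-score each feature column so amplitude and temporal features are comparable."""
    stats = [(statistics.fmean(c), statistics.pstdev(c) or 1.0) for c in zip(*rows)]
    return [tuple((v - mu) / sd for v, (mu, sd) in zip(row, stats)) for row in rows]


def segment_scores(x: Sequence[float], w: int, k: int = 2) -> List[float]:
    """Anomaly score per non-overlapping segment: mean distance to its k nearest segments."""
    n_seg = len(x) // w  # trailing points that do not fill a whole segment are dropped
    if n_seg <= k:
        raise ValueError("need more than k segments")
    reps = standardize([represent(x[s * w:(s + 1) * w]) for s in range(n_seg)])
    scores = []
    for i, ri in enumerate(reps):
        dists = sorted(math.dist(ri, rj) for j, rj in enumerate(reps) if j != i)
        scores.append(statistics.fmean(dists[:k]))
    return scores


def flag(scores: List[float], top: int = 1) -> List[int]:
    """Indices of the `top` highest-scoring segments."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top]


series = [0, 3, 0, 0, 0] * 6
series[20:25] = [0, 0, 0, 3, 0]  # segment 4: same amplitude, peak shifted in time
print(flag(segment_scores(series, w=5)))  # -> [4]
```

In the example, the shifted peak leaves every amplitude feature unchanged, so only the temporal features separate segment 4; this is the kind of variation the abstract's emphasis on the temporal domain is meant to catch.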
Keywords/Search Tags: Time Series, Dimensionality Reduction, Streaming Time Series, Similarity Measurement, Dynamic Time Warping, Anomaly Detection