Research On Symbolization And Discord Discovery Of Time Series

Posted on: 2018-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: W X Fu
GTID: 2348330533961376
Subject: Computer Science and Technology
Abstract/Summary:
Traditional data mining algorithms target data without temporal relationships. In recent years, a growing volume of sequential data, such as time series, has been recorded in daily life, and new targeted approaches have emerged to analyze these data effectively and discover the knowledge they contain. A time series is a set of data in chronological order. It is found in almost every field, including business, industry, medicine, science, and the environment. Data mining on time series therefore has important guiding significance for identifying how things change, making scientific decisions, and detecting various abnormal behaviors.

This paper addresses two topics, the symbolic representation of time series and anomalous subsequence detection, and discusses time series representation, similarity measurement, and anomaly detection for time series subsequences. It first reviews two widely used time series reduction methods: the SAX symbolization method, based on a sliding window, and the PLR linear segmentation method, based on split points. It compares the advantages, disadvantages, and usage scenarios of the two methods. For time series anomaly detection, it focuses on the pattern anomaly detection algorithm for anomalous patterns and the HOTSAX detection algorithm for anomalous sequences, including their ideas, principles, and processes, with particular attention to the efficiency of HOTSAX and the defects of the symbolization algorithm it relies on.

To address the information loss and data distribution problems of the SAX algorithm in time series symbolization, a new symbolization method based on time series trend information is proposed. The method first fits straight lines to the time series, then discretizes the line slopes and maps them to the symbol space to complete the symbolic representation. To remove the traditional algorithm's assumption that the data set is normally distributed, two new symbol distance measures are proposed. Because they operate on the symbolized time series after dimension reduction, they allow more accurate trend and similarity analysis.

To improve the efficiency of HOTSAX in anomalous subsequence analysis, this paper proposes a new anomalous subsequence discovery algorithm that incorporates a pruning strategy. The algorithm builds on the trend symbolization of time series. It first uses a clustering algorithm to group similar time series symbols, then analyzes the symbolic distances between symbol classes to reduce the number of invalid comparisons during the search for anomalous subsequences. This preserves the accuracy of anomaly detection while accelerating the detection process. Finally, several experiments show that the symbolization algorithm and the improved anomaly detection algorithm are effective, and the influence of different parameters on the algorithms is analyzed.
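
The abstract describes the trend-based symbolization only in outline: fit lines, discretize slopes, map to symbols. The following is a minimal Python sketch of that general idea and not the thesis's actual algorithm. The fixed segment length, the slope breakpoints, the three-letter alphabet, and the names `fit_slope` and `trend_symbolize` are all assumptions made for illustration; the thesis may segment the series and choose breakpoints differently.

```python
from bisect import bisect_left


def fit_slope(segment):
    """Least-squares slope of a segment against x = 0, 1, ..., n-1."""
    n = len(segment)
    mean_x = (n - 1) / 2
    mean_y = sum(segment) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(segment))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def trend_symbolize(series, segment_len=4, breakpoints=(-0.5, 0.5), alphabet="abc"):
    """Turn a series into a string with one symbol per fitted segment.

    Assumptions (hypothetical, not from the thesis): non-overlapping
    segments of fixed length, and hand-picked slope breakpoints sorted
    in ascending order. A trailing partial segment is ignored.
    """
    if segment_len < 2:
        raise ValueError("segment_len must be at least 2 to fit a line")
    if len(alphabet) != len(breakpoints) + 1:
        raise ValueError("alphabet needs one more symbol than breakpoints")

    symbols = []
    for start in range(0, len(series) - segment_len + 1, segment_len):
        slope = fit_slope(series[start:start + segment_len])
        # bisect_left finds which slope interval the fitted slope falls in.
        symbols.append(alphabet[bisect_left(breakpoints, slope)])
    return "".join(symbols)


# A rising, then flat, then falling series.
example = [0, 1, 2, 3, 3, 3, 3, 3, 3, 2, 1, 0]
print(trend_symbolize(example))
```

With these assumed defaults, `a` stands for a falling segment, `b` for a roughly flat one, and `c` for a rising one. Because the symbols encode slope rather than mean level, this kind of representation does not depend on the values following a normal distribution, which is the limitation of SAX that the abstract points to.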
Keywords/Search Tags: Time series, Symbolization, Anomaly detection