Research on Similarity Search and Outlier Detection Algorithms for Time Series

Posted on: 2009-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: H B Du
Full Text: PDF
GTID: 2178360272499426
Subject: Systems Engineering
Abstract/Summary:
A time series is a sequence of observations that are ordered in time, change over time, and are interrelated. Such data arise widely in fields such as industry, economics, finance, scientific observation, and social science. How to discover the regularities and knowledge contained in available time series is an interesting problem. Classical time series analysis first proposes a hypothesis and then tests its validity, which makes it poorly suited to discovery tasks. Time series data mining, by contrast, can extract hidden and potentially useful knowledge from large amounts of data that users might otherwise overlook, and it is attracting growing attention.
This dissertation studies similarity search and outlier detection for subsequences of time series. This involves several problems: the representation of time series, similarity measures, similarity search, and the definition and detection of outliers in time series. The main work and contributions of this dissertation are:
(1) Subsequence Similarity Research for Time Series Based on Shape
Because time series data are large in volume and complex in character, mining directly on raw time series is time-consuming and inefficient, and it can sometimes reduce the accuracy and reliability of the mining results. A pattern representation of a time series is an abstraction and summary of the series, and also a high-level description of its features. A shape-based discrete symbolic representation is first presented, together with a corresponding shape-distance formula for measuring the similarity between subsequences of time series. The method is intuitive and compact, and it is insensitive to shifting, amplitude scaling, compression, and stretching of the data. It reflects the degree of dynamic change in the trend, suppresses the influence of noise, and supports multi-scale characterization. Experimental results show the effectiveness of the presented algorithm.
(2) Outlier Subsequence Detection for Time Series Based on LLM
At present, no definition of an outlier in time series is accepted by most researchers. Three types of time series outliers are summarized, and the pattern outlier is discussed. To improve the effectiveness of outlier subsequence detection in time series, a detection algorithm based on Local Linear Mapping (LLM) is first presented, in which each subsequence of a time series is mapped through linear reconstruction from its neighbors. Based on the properties of LLM, two outlier indices, Reconstruction Error and Contribution Factor, are introduced and applied to outlier subsequence detection on time series data sets. Experimental results show that the presented algorithm detects outlier subsequences effectively.
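The idea of scoring a subsequence by how well its neighbors reconstruct it linearly can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the abstract gives no formulas, so the sketch assumes the constrained least-squares weights familiar from locally linear embedding (weights summing to 1, with a small regularization term). The names `sliding_windows`, `solve`, and `reconstruction_errors`, and the parameters `k` and `reg`, are hypothetical.

```python
import math


def sliding_windows(series, width, step):
    """Cut a series into subsequences of length `width`, advancing by `step`."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def reconstruction_errors(windows, k=3, reg=1e-3):
    """Squared error of rebuilding each window from its k nearest neighbors.

    Assumption: weights minimize the reconstruction error subject to
    summing to 1, as in locally linear embedding.
    """
    errors = []
    for i, x in enumerate(windows):
        # k nearest neighbors by Euclidean distance, excluding the window itself
        neighbors = sorted((j for j in range(len(windows)) if j != i),
                           key=lambda j: math.dist(x, windows[j]))[:k]
        diffs = [[xv - nv for xv, nv in zip(x, windows[j])] for j in neighbors]
        # local Gram matrix of the differences between x and each neighbor
        gram = [[sum(p * q for p, q in zip(d1, d2)) for d2 in diffs] for d1 in diffs]
        trace = sum(gram[t][t] for t in range(k))
        eps = reg * trace if trace > 0 else reg  # regularize so the system is solvable
        for t in range(k):
            gram[t][t] += eps
        w = solve(gram, [1.0] * k)
        total = sum(w)
        w = [v / total for v in w]  # enforce that the weights sum to 1
        recon = [sum(w[t] * windows[neighbors[t]][c] for t in range(k))
                 for c in range(len(x))]
        errors.append(sum((a - b) ** 2 for a, b in zip(x, recon)))
    return errors


if __name__ == "__main__":
    series = [0, 1, 0, -1] * 10
    series[20:24] = [0, 5, 5, 0]  # inject one unusual subsequence
    errs = reconstruction_errors(sliding_windows(series, 4, 4))
    print("most outlying window:", errs.index(max(errs)))
```

A subsequence with a large reconstruction error is one that its neighbors cannot explain, which makes it a candidate pattern outlier. The Contribution Factor index is not sketched here because the abstract does not define it.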
Keywords/Search Tags: Time Series, Pattern Representation, Similarity Search, Outlier Subsequences, Outlier Detection