Research On Time Series Segmentation And Discord Discovery

Posted on: 2013-04-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G L Li
Full Text: PDF
GTID: 1220330392957276
Subject: Computer software and theory
Abstract/Summary:
A time series is a sequence of data points ordered in time. Time series segmentation and discord discovery are important in many domains, such as financial data segmentation, discord discovery in space telemetry and medical data, network monitoring, and tracking and anomaly monitoring over moving-object trajectory streams. To address deficiencies in existing work on time series segmentation and discord discovery, this dissertation studies the following topics: time series segmentation based on symbolic representation, a time series similarity measure based on symbolic representation, a segmentation technique for time series streams, bit-representation-based discord discovery on static time series, and fractal-based anomaly detection over time series streams.

Because most segmentation methods based on symbolic representation reflect only the mean value of each segment and lose the trend information, the Trend-based Symbolic approximation (TSX) segmentation method is proposed. After dimensionality reduction, the mean value of each segment is obtained and the important trend features are extracted; a multi-resolution angle breakpoint interval search table is then designed, and the trend features are discretized into symbols. The resulting symbolic, dimensionality-reduced representation, TSX, captures both the mean value and the trend information. Experiments indicate that, compared with Symbolic Aggregate approximation (SAX), TSX has a lower false positive rate in similarity search and can therefore support similarity search effectively.

Because the measure MINDIST_PAA_iSAX, used with symbolic-representation-based time series segmentation, is not symmetric, a new measure based on SAX, Sym_PAA_SAX, is put forward. Sym_PAA_SAX gives the two time series being compared equal status in the distance calculation, so it is symmetric and also satisfies the lower bounding theorem.
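As a rough illustration of the SAX baseline mentioned above, the following minimal sketch z-normalizes a series, reduces it with Piecewise Aggregate Approximation (PAA), and maps each segment mean to a symbol using equiprobable breakpoints of the standard normal distribution. The function names, the four-letter alphabet, and the requirement that the series length be divisible by the number of segments are assumptions made for this example; it is not the dissertation's TSX or Sym_PAA_SAX implementation.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]  # constant series: nothing to scale
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % n_segments != 0:
        # Simplifying assumption of this sketch, not a requirement of PAA itself.
        raise ValueError("series length must be divisible by n_segments")
    size = n // n_segments
    return [mean(series[i:i + size]) for i in range(0, n, size)]


def sax(series, n_segments, alphabet="abcd"):
    """Map each PAA mean of the z-normalized series to one symbol."""
    a = len(alphabet)
    # Breakpoints cut the standard normal distribution into `a` equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    means = paa(znorm(series), n_segments)
    # A mean equal to a breakpoint is assigned to the upper region.
    return "".join(alphabet[bisect_right(breakpoints, m)] for m in means)


if __name__ == "__main__":
    print(sax([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], n_segments=4))
```

Because only segment means are kept, a rising segment and a falling segment with the same mean receive the same symbol, which is the loss of trend information that TSX is meant to address.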
Experiments show that Sym_PAA_SAX has better tightness of lower bounding and a lower false positive rate.

To suit the critical characteristics of time series streams, namely online arrival, fast change, massive volume, and the impossibility of storing all of the data, the Exponential Smoothing Prediction based Segmentation algorithm for time series streams (ESPS) is proposed. The classical exponential smoothing method is used to calculate the smoothed value at a future time as the prediction value; a prediction error judgement theorem is put forward, which can assure the normal distribution; the relationship between the prediction error and the compression ratio is then derived, which guides how to judge whether a data point is a segmentation key point; and the ESPS algorithm is designed on the basic sliding window model. Because most segmentation methods use only the total residual error as the evaluation standard, the experimental evaluation standards used here include Normalized Number of Segments, Normalized Residual Error Sum, Normalized Overall Performance, and time cost. Experimental results demonstrate that, compared with the Sliding Window algorithm and the Sliding Window and Bottom-up algorithm, the ESPS algorithm is more effective and more efficient.

To address the high algorithmic complexity and large amount of computation in discord discovery, a bit-representation-clustering-based discord discovery algorithm for static time series is put forward.
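The ESPS idea summarized above, exponential smoothing used as a one-step predictor with a data point treated as a segmentation key point when its prediction error is too large, might be sketched as follows. This is only a minimal illustration under stated assumptions: the fixed threshold `max_error`, the smoothing factor `alpha`, the function name, and the choice to restart the smoother at each key point are all hypothetical, whereas the dissertation derives its criterion from a prediction error theorem and the compression ratio, neither of which is given in this abstract.

```python
def esps_key_points(stream, alpha=0.5, max_error=1.0):
    """Yield indices of segmentation key points from a stream in one pass.

    Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1},
    and s_{t-1} serves as the prediction for x_t.
    """
    smoothed = None
    for t, x in enumerate(stream):
        if smoothed is None:
            # The first point always opens the first segment.
            smoothed = x
            yield t
            continue
        error = abs(x - smoothed)  # prediction error for the new point
        if error > max_error:
            # Hypothetical rule: a large error marks a key point, and the
            # smoother is restarted from the current value.
            smoothed = x
            yield t
        else:
            smoothed = alpha * x + (1 - alpha) * smoothed


if __name__ == "__main__":
    data = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 0.0]
    print(list(esps_key_points(data)))
```

The generator keeps only the current smoothed value, so memory use does not grow with the stream, which matches the constraint that the whole stream cannot be stored.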
In the discord discovery algorithm, a bit series in the manner of Piecewise Aggregate Approximation is first used to segment the time series, which both captures the main trend characteristics of the raw time series and avoids the effect of noise. Then, based on the bit representation and the idea of accelerating by clustering, a variation of the k-medoid clustering algorithm is proposed that merges similar change patterns into one class. Based on this clustering algorithm, a discord discovery algorithm is put forward that uses two pruning strategies: heuristic pruning and pruning on the cluster center distance. Experimental results show that the proposed algorithm can discover discords effectively, improves efficiency, and has good scalability.

Time series discord discovery can be used for anomaly detection. To improve the effect of anomaly detection on time series streams, a fractal-based anomaly detection algorithm is proposed. Because a change in correlation dimension can serve as an indicator of a trend change in a data set, the algorithm adopts a sliding window model containing basic windows and uses the correlation dimension to capture the pattern features of the data currently seen in the sliding window. Experimental results show that, compared with the Trend and Surprise Abstractions based method and the immunology system based method, the fractal-based algorithm can detect anomalies effectively.
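To make the role of the correlation dimension more concrete, here is a minimal sketch that estimates it for each sliding window and flags windows where the estimate changes sharply. It assumes a Grassberger-Procaccia style estimate (the slope of the log correlation sum between two radii `r1` and `r2`) computed on two-dimensional delay vectors. The function names, the embedding, the two radii, and the `max_change` threshold are assumptions for illustration only; the abstract does not specify how the dissertation computes or thresholds the dimension.

```python
import math
from itertools import combinations


def delay_embed(window, dim=2, delay=1):
    """Turn a scalar window into delay vectors (x_t, x_{t+delay}, ...)."""
    span = (dim - 1) * delay
    return [tuple(window[i + k * delay] for k in range(dim))
            for i in range(len(window) - span)]


def correlation_sum(points, r):
    """Fraction of point pairs whose Euclidean distance is below r."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) < r)
    return close / (n * (n - 1) / 2)


def correlation_dimension(points, r1, r2):
    """Two-radius slope estimate of log C(r) against log r."""
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    if c1 == 0 or c2 == 0:
        return math.nan  # no pairs within a radius: estimate undefined
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))


def dimension_changes(series, window, step, r1, r2, max_change):
    """Estimate the dimension per sliding window and flag large jumps."""
    dims = [correlation_dimension(delay_embed(series[s:s + window]), r1, r2)
            for s in range(0, len(series) - window + 1, step)]
    # Comparisons involving NaN are False, so undefined estimates are not flagged.
    flags = [abs(b - a) > max_change for a, b in zip(dims, dims[1:])]
    return dims, flags


if __name__ == "__main__":
    ramp = [t / 200 for t in range(200)]
    print(correlation_dimension(delay_embed(ramp), 0.05, 0.2))
```

For a linear ramp the delay vectors lie on a straight line, so the estimate comes out close to 1. The pairwise distance computation is quadratic in the window size, so a streaming implementation would need something more economical than this sketch.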
Keywords/Search Tags: time series, segmentation, clustering, discord discovery, anomaly detection