High Performance Data Stream Pattern Discovery Algorithms and Their Applications

Posted on: 2009-11-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhou
GTID: 1118360272978714
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of sensor and network technology, applications such as network traffic management, environmental monitoring, industrial control, and financial analysis generate large volumes of stream data. These applications share several distinguishing features: the need for real-time or near-real-time continuous analysis, huge data volumes, and high arrival rates. The traditional data mining model of "store first, then analyze" is ill-suited to high-rate, transient data streams, so mining data streams poses many new challenges. Data streams contain many patterns, and discovering and identifying these patterns efficiently is the core problem in many applications. In recent years, pattern discovery in data streams has become one of the most challenging research topics. To improve the performance of pattern discovery algorithms for data streams, this dissertation introduces robust and incremental mechanisms and applies the resulting algorithms to industrial process analysis. The main contributions are as follows:

1) A real-time trend extraction algorithm for dynamic data streams is proposed. It combines an incremental recursive least squares algorithm for regression parameter estimation with the generalized likelihood ratio test for change-point detection. To segment the stream automatically and extract its trend, the algorithm estimates the parameters of a linear regression incrementally and detects segment boundary points with the generalized likelihood ratio test. Compared with the best existing algorithms in the same field, it achieves markedly faster computation and higher trend analysis accuracy.

2) A robust, data-driven online change detection algorithm for data streams is presented. First, the stream is sampled with two adjacent windows of a given length. The sampled data are then projected into a normalized high-dimensional feature space, and a minimum hypersphere model is constructed for each window's sample set, with outliers removed. Finally, change is detected by computing the cosine of the angle between the centers of the two hyperspheres. The algorithm is robust and requires no prior knowledge.

3) A data stream outlier detection algorithm based on recent-biased dynamic least squares support vector regression is proposed. Because the model is a recent-biased dynamic least squares support vector regression, learning reduces to solving a system of linear equations, and the dynamics of the stream are tracked accurately through incremental and decremental learning. This overcomes a shortcoming of standard support vector regression, which must be recomputed whenever a sample is added or removed. The algorithm is both fast and accurate, and it detects outliers in data streams efficiently.

4) A recent-biased clustering algorithm for data streams based on a tilted-time window is proposed. First, the sliding window is divided into equal-length, non-overlapping data blocks (basic windows). Next, features are extracted from each block with the Haar wavelet transform, and the detail of recent data is preserved by varying the number of wavelet coefficients kept per block: the more recent the block, the more coefficients are retained. Finally, a recent-biased distance between data streams is defined, and the clustering algorithm is built on it. The algorithm achieves markedly faster computation and higher efficiency.

5) The proposed pattern discovery algorithms are applied to a real industrial process. Based on the characteristics of complex process data from iron and steel making, two pattern discovery tasks are implemented: outlier detection and pattern change detection. The results show that the proposed algorithms are promising for analyzing data generated by complex industrial processes.

In summary, this dissertation studies several high-performance pattern discovery algorithms and their applications, which improve and supplement existing algorithms. Theoretical analysis and simulation results show that, compared with existing algorithms in the same field, the proposed algorithms perform better in accuracy, computational speed, and robustness.
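
The first contribution in the abstract pairs incremental regression with change-point detection. The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea, not the dissertation's algorithm. All names (`RecursiveLineFit`, `segment_stream`, `delta`, `threshold`, `warmup`) are hypothetical, and a plain threshold on the one-step prediction error stands in for the generalized likelihood ratio test, which is not implemented here.

```python
class RecursiveLineFit:
    """Recursive least squares for y ~ slope * t + intercept (illustrative only)."""

    def __init__(self, delta=1e6, forgetting=1.0):
        self.theta = [0.0, 0.0]                  # [slope, intercept]
        self.P = [[delta, 0.0], [0.0, delta]]    # large initial covariance (assumed value)
        self.lam = forgetting                    # 1.0 means no forgetting

    def predict(self, t):
        return self.theta[0] * t + self.theta[1]

    def update(self, t, y):
        """Fold one sample into the fit in O(1) time; return the a priori error."""
        x = (float(t), 1.0)
        P = self.P
        Px = [P[0][0] * x[0] + P[0][1] * x[1],
              P[1][0] * x[0] + P[1][1] * x[1]]
        denom = self.lam + x[0] * Px[0] + x[1] * Px[1]
        gain = [Px[0] / denom, Px[1] / denom]
        err = y - self.predict(x[0])             # error before this sample is used
        self.theta = [self.theta[0] + gain[0] * err,
                      self.theta[1] + gain[1] * err]
        # P is symmetric, so gain * Px^T equals gain * x^T * P.
        self.P = [[(P[i][j] - gain[i] * Px[j]) / self.lam for j in range(2)]
                  for i in range(2)]
        return err


def segment_stream(samples, threshold=1.0, warmup=3):
    """Return the indices at which a new linear segment starts.

    A boundary is declared when the prediction error of the current line
    exceeds `threshold` (a simplified stand-in for a GLR test).
    """
    boundaries = [0]
    fit = RecursiveLineFit()
    seen = 0
    for i, y in enumerate(samples):
        t = i - boundaries[-1]                   # time measured from segment start
        if seen >= warmup and abs(y - fit.predict(t)) > threshold:
            boundaries.append(i)                 # start a new segment at this sample
            fit = RecursiveLineFit()
            seen = 0
            t = 0
        fit.update(t, y)
        seen += 1
    return boundaries


if __name__ == "__main__":
    # Synthetic stream: a rising line followed by a falling line.
    stream = [2 * i + 1 for i in range(10)] + [50 - 3 * j for j in range(10)]
    print(segment_stream(stream))                # segment start indices
```

Each update costs constant time and stores only the two coefficients and a 2x2 matrix, which is what makes this style of estimator attractive for streams. In practice the threshold would need to reflect the noise level of the data, which is the role a statistical test such as the generalized likelihood ratio plays in the dissertation's method.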
Keywords/Search Tags: data stream, pattern discovery, trend extraction, change detection, outlier detection, clustering