Research On Key Issues Of Stochastic Non-stationary Time Series Data Mining Based On Fractal Theory

Posted on: 2010-09-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Y Sun
GTID: 1118360302480044
Subject: Control theory and control engineering
Abstract/Summary:
With the spread of computer systems and the development of MSF memory technology, the amount of data is growing explosively. Most of this data takes the form of time series, such as financial data, stock price changes, and point-of-sale (POS) records in retail. Data mining is a new research field that studies efficient methods for extracting hidden and potentially useful knowledge from very large data sets. Developing data mining methods for the huge and still growing volume of time series data is therefore of both academic and practical significance, and the task has attracted increasing research interest. This thesis studies several time series data mining tasks, including similarity search, distance measures, classification, and discord detection. The author applies grid and fractal techniques to these tasks, which allows dimensionality reduction while preserving the fractal features of the data.

Major contributions of this thesis include:

1) Research on time series representation
The author proposes GMBR, a novel time series representation based on the minimum bounding rectangle (MBR), which for the first time applies the binary idea to the MBR. A high-precision approach based on fractal theory and R/S analysis is also proposed. The representation is distinctive in that it allows dimensionality reduction while preserving fractal features. The proposed method is evaluated experimentally on both synthetic and real data sequences.

2) Research on distance measures for time series
Distance measure formulas are proposed for the GMBR and FSPA representations, respectively. These formulas are proved to lower-bound the squared Euclidean distance between the original subsequences, which indirectly shows that the representations proposed in Chapter 3 are usable. Finally, the corresponding similarity search algorithms are described.
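Contribution 1 builds on R/S (rescaled range) analysis. The abstract does not give the author's formulas, so the following is only a minimal sketch of the textbook R/S estimate of the Hurst exponent H, not the thesis's high-precision approach: the series is cut into non-overlapping windows of doubling size, the mean rescaled range R/S is computed for each window size, and H is estimated as the least-squares slope of log(R/S) against log(window size). The function names `rescaled_range` and `hurst_rs`, the minimum window of 8, and the synthetic test signals are illustrative assumptions.

```python
import math
import random

def rescaled_range(segment):
    """R/S of one window: range of cumulative mean-deviations divided by std dev."""
    n = len(segment)
    mean = sum(segment) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for v in segment:
        cum += v - mean
        lo = min(lo, cum)
        hi = max(hi, cum)
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)
    return (hi - lo) / std if std > 0 else None

def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) versus log(window size)."""
    n = len(series)
    log_w, log_rs = [], []
    w = min_window
    while w <= n // 2:
        # Non-overlapping windows of size w; skip constant windows (std == 0).
        vals = [rescaled_range(series[i:i + w]) for i in range(0, n - w + 1, w)]
        vals = [v for v in vals if v is not None]
        if vals:
            log_w.append(math.log(w))
            log_rs.append(math.log(sum(vals) / len(vals)))
        w *= 2
    # Ordinary least-squares slope.
    mx = sum(log_w) / len(log_w)
    my = sum(log_rs) / len(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_w, log_rs))
    den = sum((a - mx) ** 2 for a in log_w)
    return num / den

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(4096)]  # uncorrelated increments
walk, total = [], 0.0
for v in noise:  # cumulative sum: a random walk
    total += v
    walk.append(total)
print(round(hurst_rs(noise), 2), round(hurst_rs(walk), 2))
```

White noise should give an estimate near 0.5 (plain R/S is known to be biased slightly upward on short windows), while the random walk gives a value close to 1.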
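Contribution 2 relies on the lower-bounding property. If a cheap distance on the reduced representation never exceeds the squared Euclidean distance on the raw subsequences, it can prune candidates during similarity search without false dismissals. The exact GMBR and FSPA formulas are not given in the abstract, so this sketch uses an assumed, simplified stand-in. Each candidate is summarized by fixed-length segments with their minimum and maximum values, which gives a one-dimensional bounding box per segment. The bound sums the squared distance from each query point to its segment's [min, max] interval. This is a valid lower bound because every raw value lies inside its interval, so |q_i - c_i| can never be smaller than the distance from q_i to that interval. `segment_envelopes`, `lower_bound_sq`, and `nearest_neighbor` are hypothetical names.

```python
import math
import random

def segment_envelopes(series, seg_len):
    """Summarize a series as (start, end, min, max) boxes over fixed-length segments."""
    boxes = []
    for start in range(0, len(series), seg_len):
        seg = series[start:start + seg_len]
        boxes.append((start, start + len(seg), min(seg), max(seg)))
    return boxes

def euclidean_sq(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lower_bound_sq(query, boxes):
    """Squared distance from each query point to its segment's [min, max] interval.

    Every raw value lies inside its segment's interval, so this never exceeds
    euclidean_sq(query, candidate)."""
    total = 0.0
    for start, end, lo, hi in boxes:
        for q in query[start:end]:
            if q < lo:
                total += (lo - q) ** 2
            elif q > hi:
                total += (q - hi) ** 2
    return total

def nearest_neighbor(query, candidates, seg_len):
    """Exact 1-NN search that skips candidates whose lower bound cannot win."""
    index = [(c, segment_envelopes(c, seg_len)) for c in candidates]  # built once
    best_idx, best_d, pruned = None, math.inf, 0
    for idx, (cand, boxes) in enumerate(index):
        if lower_bound_sq(query, boxes) >= best_d:
            pruned += 1  # true distance >= bound >= best so far
            continue
        d = euclidean_sq(query, cand)
        if d < best_d:
            best_idx, best_d = idx, d
    return best_idx, best_d, pruned

random.seed(1)
candidates = [[random.gauss(0, 1) for _ in range(64)] for _ in range(200)]
query = [random.gauss(0, 1) for _ in range(64)]
idx, dist, pruned = nearest_neighbor(query, candidates, seg_len=8)
print(idx, round(dist, 3), pruned)
```

Because pruning only discards candidates whose bound already meets the best distance found, the result is identical to a brute-force scan. Tighter representations, which the thesis aims for, simply prune more.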
These methods are evaluated on synthetic and real data sequences, and the experiments show that they achieve higher precision while requiring little memory.

3) Research on time series classification
A classification method based on the FSPA representation is proposed. Distance-based and model-based methods are compared on several synthetic and real data sets to explain their relative advantages and disadvantages along three dimensions: data set size, time series length, and noise. The proposed method uses fractal techniques to preserve fractal features, and its distance measure lower-bounds the distance between the original time series. The method completes the task in two steps. The thesis presents several key observations on the relative merits of the distance-based and model-based approaches, paving the way for further research on new time series classification methods. Experiments demonstrate the superiority of the proposed method.

4) Research on time series discord detection
A definition of time series discords based on the GMBR representation is proposed; to the best of the author's knowledge, it is the first to combine a distance measure with density. A "detect eigenvalue" is used to weigh how discordant each time series is. Based on this definition, the author presents a new discord detection algorithm, GMBR-DD (Grid Minimum Bounding Rectangle-Discords Detect), which finds discordant time series efficiently. Finally, the definition and the algorithm are validated on three groups of data; the results show that the algorithm catches the discordant time series and that the definition is reasonable.
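Contribution 3 classifies time series in a reduced representation. FSPA is not defined in the abstract, so this sketch swaps in piecewise aggregate approximation (PAA) as a stand-in. PAA is a different, well-known dimensionality-reduction technique: each series is replaced by its segment means, and a one-nearest-neighbor (1-NN) classifier compares the reduced vectors. For series whose length is a multiple of the segment length w, w times the squared distance between PAA vectors lower-bounds the squared Euclidean distance between the raw series, mirroring the lower-bounding property claimed in contribution 2. The names `paa`, `paa_dist_sq`, `classify_1nn`, and `make_series` are hypothetical. The rising and falling toy series are synthetic data made up for illustration.

```python
import random

def paa(series, seg_len):
    """Piecewise aggregate approximation: mean of each fixed-length segment."""
    return [sum(series[i:i + seg_len]) / len(series[i:i + seg_len])
            for i in range(0, len(series), seg_len)]

def paa_dist_sq(a, b, seg_len):
    """Scaled squared distance between PAA vectors (lower-bounds raw squared ED)."""
    return seg_len * sum((x - y) ** 2 for x, y in zip(a, b))

def classify_1nn(query, train, seg_len):
    """Return the label of the training series nearest to `query` in PAA space."""
    q = paa(query, seg_len)
    best_label, best_d = None, float("inf")
    for reduced, label in train:
        d = paa_dist_sq(q, reduced, seg_len)
        if d < best_d:
            best_label, best_d = label, d
    return best_label

def make_series(label, n=32):
    """Toy synthetic data: a rising or falling trend plus Gaussian noise."""
    slope = 1.0 if label == "up" else -1.0
    return [slope * i / n + random.gauss(0, 0.1) for i in range(n)]

random.seed(2)
SEG = 4
# Training set is stored already reduced, as an index would be.
train = [(paa(make_series(lab), SEG), lab) for lab in ["up", "down"] * 10]
test = [(make_series(lab), lab) for lab in ["up", "down"] * 5]
correct = sum(classify_1nn(s, train, SEG) == lab for s, lab in test)
print(f"accuracy: {correct}/{len(test)}")
```

Storing only the reduced vectors cuts memory by a factor of `SEG`. A model-based classifier, the other family compared in the thesis, would instead fit a model per class.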
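Contribution 4 defines discords using GMBR together with density, and the abstract does not spell out that definition. As a hedged baseline only, this sketch implements the common distance-only definition of a time series discord: the length-m subsequence whose nearest non-overlapping neighbor is farthest away. It has no density term and no "detect eigenvalue", and it is not GMBR-DD. The function `find_discord`, the window length, and the toy sine wave with a flattened stretch are illustrative assumptions.

```python
import math

def euclidean_sq(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def find_discord(series, m):
    """Brute-force search for the length-m subsequence farthest from its nearest non-self match."""
    n = len(series) - m + 1
    best_loc, best_dist = None, -1.0
    for i in range(n):
        window = series[i:i + m]
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:
                continue  # overlapping windows are trivial self-matches
            d = euclidean_sq(window, series[j:j + m])
            if d < nearest:
                nearest = d
                if nearest <= best_dist:
                    break  # early abandon: i cannot beat the current discord
        if best_dist < nearest < math.inf:
            best_loc, best_dist = i, nearest
    # Return the start index and the (non-squared) nearest-neighbor distance.
    return best_loc, (math.sqrt(best_dist) if best_loc is not None else None)

series = [math.sin(2 * math.pi * t / 32) for t in range(256)]
for t in range(150, 166):
    series[t] = 0.0  # injected anomaly: a flattened stretch
loc, dist = find_discord(series, m=32)
print(loc, round(dist, 3))
```

The quadratic scan is what index-based representations such as the thesis's GMBR aim to avoid. The early-abandon test already skips most of the inner loop for ordinary, well-matched windows.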
In summary, this work provides an effective platform and a powerful tool for time series data mining.
Keywords/Search Tags: time series, data mining, similarity search, discord detection, classification, fractal theory, symbolic representation