
Research On Key Issues Of Time Series Data Mining Based On Rescaled Range Analysis Theory

Posted on: 2011-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
GTID: 2178360305950063
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the growing use of computer information systems, large volumes of data of many kinds have accumulated through daily transaction processing and scientific research. Most of these data are time series: daily stock price changes in financial markets, daily sales of a commodity in retail POS systems, or the daily temperature of a region in weather forecasting. Analysing and processing these vast amounts of time series data to uncover the information hidden in them, reveal the internal rules governing their development and change, and discover interactions between different things is important for understanding phenomena and making sound decisions. Time series data mining is a data analysis technology proposed to address this problem. With it, we can obtain the useful time-related information contained in the data and carry out knowledge discovery and rule extraction. This thesis studies time series representation, similarity search, and distance measurement in non-stationary time series data mining. The main contents and contributions are as follows:

1) Time series representation
We apply fractal techniques to APCA (Adaptive Piecewise Constant Approximation) and propose RSPA (Rescaled range Symbolic Piecewise Approximation), a high-precision symbolic representation of time series based on fractal theory and rescaled range analysis. RSPA combines rescaled range theory and symbolization with existing time series representation methods; it retains the non-linearity and fractal character of the series while also reducing its dimensionality. Experimental results show that the method is efficient in similarity search, classification, and many other data mining tasks.

2) Time series distance measurement
We propose a distance formula based on the RSPA representation and give a theoretical proof that the distance it yields between two time series is smaller than the Euclidean distance between the original series, which demonstrates the usability of the representation. We also describe the similarity mining algorithm; experimental data show that the method achieves high accuracy in time series similarity mining and requires less storage space.

3) Framework of a time series analysis system
The thesis proposes an open, integrated framework for a time series analysis system. The framework is composed of functional units and provides flexible interfaces. Its main merit is modularity. It can support various advanced composite services, such as pattern mining, similarity search in time series, and classification. The use of the mathematical pattern database and the similarity search unit strengthens the compatibility and extensibility of the system.
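The abstract does not give the RSPA formulas, so the following is only a minimal sketch of the two standard ingredients it names, not the thesis's method. `rescaled_range` computes the classical R/S statistic of a segment (the range of cumulative deviations from the mean, divided by the standard deviation). `paa` and `paa_distance` use plain equal-length piecewise-constant averaging as a stand-in for RSPA, to illustrate how a distance computed on a reduced representation can stay below the Euclidean distance of the original series. All function names are hypothetical, and returning 0.0 for a constant segment is an assumed convention.

```python
import math


def rescaled_range(segment):
    """Classical R/S statistic of one segment (illustrative, not RSPA)."""
    n = len(segment)
    mean = sum(segment) / n
    # Cumulative sum of deviations from the segment mean.
    cumulative, running = [], 0.0
    for value in segment:
        running += value - mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)            # range R
    s = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)  # std. dev. S
    # Assumption: a constant segment (S == 0) is reported as 0.0.
    return r / s if s > 0 else 0.0


def paa(series, n_segments):
    """Mean of each equal-length segment; assumes len(series) is a
    multiple of n_segments. A stand-in for the adaptive segments of APCA."""
    seg_len = len(series) // n_segments
    return [
        sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
        for i in range(n_segments)
    ]


def paa_distance(a_means, b_means, seg_len):
    """Distance on segment means; never exceeds the Euclidean distance
    between the original series when both use the same segmentation."""
    return math.sqrt(seg_len * sum((a - b) ** 2 for a, b in zip(a_means, b_means)))


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For two series of length 8 reduced to 4 segments, `paa_distance(paa(x, 4), paa(y, 4), 2)` can be compared with `euclidean(x, y)`. In a similarity search, a candidate whose reduced distance already exceeds the query threshold can then be discarded without reading the full series.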
Keywords/Search Tags: data mining, time series, rescaled range analysis, similarity search, symbolic representation