Research On Mining And Its Application In Time Series Database

Posted on:2008-01-25

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Du

Full Text:PDF

GTID:1118360212999126

Subject:Computer application technology

Abstract/Summary:

With the spread of computing and information technology and the rapid growth of high-capacity storage, large amounts of data accumulate in daily work and in scientific research, and much potentially useful knowledge is hidden in them. How to manage time series data efficiently and extract useful information from it is an important problem in data mining.

Time series data record how attribute values vary along a temporal or spatial sequence. Mining patterns from such data reveals time-related information hidden in the database and so supports knowledge extraction. Time series are a complex data type: they often have high dimensionality, noise, and various distortions. Time Series Data Mining (TSDM) is one of the most important research fields in data mining. Its topics include time series representation, similarity search, clustering, classification, and outlier detection.

Motivated by the practical task of analyzing well logging and mud logging time series in oil fields, this dissertation reviews the current state of research, related work, and recent technologies and developments. To address the problems of current approaches, we study time series data mining in four aspects: linear fitting, online segmentation, similarity search, and temporal frequent pattern discovery. Related algorithms and solutions are presented. The main contributions of this dissertation are:

1. A novel linear fitting algorithm based on key points is presented. As the time series is scanned, the approach examines each run of three consecutive data points in turn. Based on the angle formed by these three points and on the extreme values of monotone subsequences, the method records the key points that reflect how the sequence changes. Using these key points, the original time series can be fitted linearly while small noise is attenuated. The algorithm finds peak subsequences and jump points more accurately. Theoretical analysis and experimental results show that the new method is efficient.

2. A new online segmentation algorithm for time series, based on Hierarchical Clustering, is presented. Exploiting the ordered nature of sequence data, a novel Segment Feature List is developed to store segment information. The algorithm segments a time series effectively in one scan of the database, with time complexity O(n). Historical information can also be queried quickly through the Segment Feature List. Experimental results show that the algorithm is efficient and scalable.

3. A new distance measure, the keypoints dynamic time warping distance, is defined. It computes the warping distance using the key points of the time series. Experiments show that the new measure is much more accurate than Euclidean distance. Compared with the classical dynamic time warping distance, the keypoints dynamic time warping distance produces a speed-up of one to three orders of magnitude with no appreciable decrease in accuracy.

4. Because it does not consider the role of the time vector, the traditional FP-growth mining algorithm cannot be used directly to mine temporal frequent patterns. An improved algorithm derived from FP-growth is developed for this purpose. It uses a novel double B+-tree to store the time attributes of frequent patterns. With this double tree structure, frequent itemsets can be discovered efficiently in two scans of the transaction database. Experimental results demonstrate that the algorithm is efficient and scalable.

5. Based on the characteristics of exploration data series and their practical use in oil fields, test examples of the above algorithms are shown, using data from well logging and mud logging:
(1) With the online segmentation method, a well logging curve can be segmented approximately, and the segment information can be queried quickly.
(2) The advantages of the linear fitting method are demonstrated. Users can extract peak subsequences from well logging or mud logging data accurately, record the key turning points that reflect the sequence's features, and ignore data points with only tiny changes. The shape of the curve is preserved while storage is greatly reduced.

This dissertation has seven chapters. Chapter 1 gives an overview of TSDM, including the practical background and recent technologies and developments in the field. Chapters 2 to 5 discuss four parts of TSDM: linear fitting, online segmentation, similarity search, and temporal frequent pattern discovery. Building on this research, Chapter 6 presents applications using well logging and mud logging data. Finally, Chapter 7 presents our conclusions and directions for further research.
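The abstract does not give the algorithms themselves, so the following Python sketch only illustrates the general idea behind contributions 1 and 3: pick key points from each run of three consecutive samples, then compute a dynamic time warping distance over the key points instead of the full series. The function names, the 15-degree angle threshold, the unit time spacing, and the choice to compare key-point values only (ignoring their time positions) are all assumptions made for illustration, not the dissertation's method.

```python
import math


def key_points(series, angle_threshold_deg=15.0):
    """Return indices of assumed key points: endpoints, local extrema, sharp turns."""
    n = len(series)
    if n <= 2:
        return list(range(n))
    keep = [0]
    for i in range(1, n - 1):
        left = series[i] - series[i - 1]    # slope into point i (unit time step)
        right = series[i + 1] - series[i]   # slope out of point i
        # Local extremum: the slope changes sign at point i.
        is_extremum = left * right < 0
        # Turning angle between the two segments meeting at point i.
        turn = abs(math.degrees(math.atan(right) - math.atan(left)))
        if is_extremum or turn >= angle_threshold_deg:
            keep.append(i)
    keep.append(n - 1)
    return keep


def dtw_distance(a, b):
    """Classical dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three allowed warping moves.
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[len(b)]


def keypoint_dtw(a, b, angle_threshold_deg=15.0):
    """DTW computed on key-point values only (hypothetical simplification)."""
    ka = [a[i] for i in key_points(a, angle_threshold_deg)]
    kb = [b[i] for i in key_points(b, angle_threshold_deg)]
    return dtw_distance(ka, kb)


peak = [0, 1, 2, 3, 2, 1, 0]
shifted_peak = [0, 0, 1, 2, 3, 2, 1, 0, 0]
print(key_points(peak))
print(keypoint_dtw(peak, shifted_peak))
```

Because the dynamic programming table has one cell per pair of points, shrinking each series to its key points reduces the work roughly in proportion to the square of the compression ratio, which is consistent with the kind of speed-up the abstract reports.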

Keywords/Search Tags:

time series, linear fitting, key points, online segmentation, Segmentation Feature List, similarity search, temporal frequent pattern

Related items

1	Research On The Similarity-Based Representation And Pattern Search Of Time Series
2	Research And Application Of Hydrological Time Series Similarity Pattern
3	Research On Data Mining Technology Of Pattern-based Similarity Search In Time Series Database
4	The Research Of Similarity Search Based On Dynamic Time Warping In Time Series
5	Research And Application Of Time Series Similarity Pattern Mining
6	Research On Sequential Pattern Mining And Time Series Similarity Search
7	Similarity Analysis Of The Problem Of Time Series
8	Time Series Data Mining Based On Similarity Analysis
9	Ant Colony Optimization For Time Series Segmentation And Its Application
10	A feature-based linear data model supported by temporal dynamic segmentation