Research On Mining And Its Application In Time Series Database

Posted on: 2008-01-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Du
GTID: 1118360212999126
Subject: Computer application technology
Abstract/Summary:
With the spread of computer and information technology and the rapid development of high-capacity storage, large amounts of data accumulate in daily work and in scientific research, and much potentially useful knowledge is hidden in that data. How to manage and use time series data efficiently and extract useful information from it is an important problem in data mining.

Time series data reflects how attribute values change along a temporal or spatial sequence. By mining patterns from time series data, we can obtain useful time-related information hidden in the database and thereby extract knowledge. Time series are a complex type of data: they often have high dimensionality, noise, and various distortions. Time Series Data Mining (TSDM) is one of the most important research fields of data mining. Its topics include time series representation, similarity search, clustering, classification, outlier detection, and so on.

Based on the practical application of analyzing well logging and mud logging time series in oil fields, this dissertation discusses the current state of research, related work, and recent technologies and developments. To address the problems of current approaches, we study time series data mining in four aspects: linear fitting, online segmentation, similarity search, and temporal frequent pattern discovery, and present related algorithms and solutions. The main work and contributions of this dissertation are:

1. A novel linear fitting algorithm based on key points is presented. As the time series is scanned, the approach examines three consecutive data points at a time. According to the angle formed by these three points and the extreme values in monotone sequences, the method records key points that reflect how the sequence changes. Using these key points, the original time series can be fitted linearly while small noise is attenuated. The algorithm finds peak subsequences and jump points more accurately. Theoretical analysis and experimental results show that the new method is efficient.

2. A new online segmentation algorithm for time series, based on hierarchical clustering, is presented. Exploiting the ordered nature of sequence data, a novel Segment Feature List is developed to store segment information. The algorithm segments a time series effectively with one scan of the database, with time complexity O(n). Historical information can also be queried quickly using the Segment Feature List. Experimental results show that the algorithm is efficient and scalable.

3. A new distance measure called keypoints dynamic time warping distance is defined. It computes the warping distance using the key points of the time series. Experiments show that the new measure is much more accurate than Euclidean distance. Compared with the classical dynamic time warping distance, it achieves a speed-up of one to three orders of magnitude with no appreciable decrease in accuracy.

4. Because it does not consider the role of the time vector, the traditional FP-growth mining algorithm cannot be used directly to mine temporal frequent patterns. An improved algorithm derived from FP-growth is developed for mining temporal frequent patterns. The algorithm uses a novel Double B+-tree to store the time attributes of frequent patterns. Using this double tree structure, frequent itemsets can be discovered efficiently with two scans of the transaction database. Experimental results demonstrate that this algorithm is efficient and scalable.

5. Based on the characteristics of exploration data series and the practical application in oil fields, test examples of the above algorithms are shown, using data from well logging and mud logging:
① With the online segmentation method, a well logging curve can be segmented approximately and the segment information can be queried quickly.
② The advantages of the linear fitting method are demonstrated. Users can accurately extract peak subsequences from well logging or mud logging data, record the key turning points that reflect the sequence's features, and ignore data points with only tiny changes. The shape of the curve is preserved while storage is greatly reduced.

This dissertation has seven chapters. Chapter 1 gives an overview of TSDM, including the practical background and recent technologies and developments in the field. Chapters 2 to 5 discuss four parts of TSDM: linear fitting, online segmentation, similarity search, and temporal frequent pattern discovery. Building on that research, Chapter 6 presents applications using well logging and mud logging data. Finally, Chapter 7 presents our conclusions and perspectives for further research.
Keywords/Search Tags: time series, linear fitting, key points, online segmentation, Segment Feature List, similarity search, temporal frequent pattern
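
The key-point idea behind contributions 1 and 3 of the abstract can be illustrated with a minimal Python sketch. The abstract does not give the dissertation's exact selection rule, so everything specific here is an assumption made for illustration: samples are taken to be evenly spaced in time, a point is kept as a key point when the turning angle between its two adjacent segments reaches a threshold (`min_turn_deg`, a hypothetical parameter), and the distance is the classical dynamic time warping recurrence with absolute-difference cost applied to the key-point values only. The function names are likewise hypothetical.

```python
import math


def turning_angle(p_prev, p_mid, p_next):
    """Angle in degrees between the segments p_prev->p_mid and p_mid->p_next."""
    a1 = math.atan2(p_mid[1] - p_prev[1], p_mid[0] - p_prev[0])
    a2 = math.atan2(p_next[1] - p_mid[1], p_next[0] - p_mid[0])
    return abs(math.degrees(a2 - a1))


def key_points(series, min_turn_deg=30.0):
    """Return indices of key points: both endpoints plus sharp turning points.

    Assumption: unit spacing on the time axis and a fixed angle threshold.
    The dissertation's actual criterion (which also uses extreme values in
    monotone runs) is not specified in the abstract.
    """
    n = len(series)
    if n <= 2:
        return list(range(n))
    keys = [0]
    for i in range(1, n - 1):
        turn = turning_angle(
            (i - 1, series[i - 1]), (i, series[i]), (i + 1, series[i + 1])
        )
        if turn >= min_turn_deg:
            keys.append(i)
    keys.append(n - 1)
    return keys


def dtw_distance(a, b):
    """Classical DTW distance with absolute-difference cost, O(len(a) * len(b))."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)]


def keypoint_dtw(s1, s2, min_turn_deg=30.0):
    """DTW computed on key-point values only (illustrative, not the author's definition)."""
    k1 = [s1[i] for i in key_points(s1, min_turn_deg)]
    k2 = [s2[i] for i in key_points(s2, min_turn_deg)]
    return dtw_distance(k1, k2)


if __name__ == "__main__":
    peak = [0, 1, 2, 3, 2, 1, 0]
    delayed_peak = [0, 0, 1, 2, 3, 2, 1, 0]
    print(key_points(peak))                  # [0, 3, 6]
    print(keypoint_dtw(peak, delayed_peak))  # 0.0
```

Because the DTW table is filled over key points rather than all samples, its cost drops from the product of the series lengths to the product of the key-point counts, which is one plausible source of the speed-up the abstract reports. The example also shows that two series of different lengths can be compared, which point-by-point Euclidean distance does not allow.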