The Data Mining Technology And Application

Posted on: 2003-10-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H W Feng
GTID: 1118360092466133
Subject: Computer software and theory
Abstract/Summary:
A time series database is designed to describe, store, and manipulate sequence data or time series data. Although it supports queries and operations on sequence data, it cannot find the sequences or subsequences whose pattern is the same as, or approximately the same as, that of a query sequence. Query capability therefore has to be extended to uncover the knowledge hidden in the database.

Data mining is a technology for finding unknown, hidden, and interesting knowledge in massive data. Time series data mining includes trend analysis, periodic pattern mining, sequential pattern mining, and similarity search. In this dissertation we study similarity search over time series and present a method that retrieves sequences with approximately similar patterns, rather than only those with identical values. The research covers the transformation of time series, the measurement of similarity, indexing of the time series database, and the processing of similarity queries.

The work and contributions are as follows:

1. We present a time series segmentation method that represents the changing pattern of a time series. The remarkable points of the series are selected as the end points of the segments, and the number of remarkable points is controlled by a parameter, the remarkable duration. The method is robust and consistent.

2. To measure the similarity of time series patterns, we define the sequential mapping of two sequences, which describes how their patterns align along the time axis. A fuzzy distance is defined to measure the similarity of two elements, and the similarity distance of two sequences is defined as the mean of the fuzzy distances of the elements on the mapping path. This measure is independent of sequence length, and it tolerates differences in frequency, including expansion or compression of the frequency.

3. We present an iteratively refined method for searching similar time series. To improve efficiency, the distance between two sequences is computed from their sampling points. The distance over the sampling points is used to filter the sequences in the database, so the similarity search space is reduced and query efficiency improves.

4. We present a pattern-based method for searching similar subsequences. Each sequence is transformed into a relative sequence, in which every segment is normalized by the previous segment. The relative sequences are categorized and indexed with a suffix tree, and the result of the suffix tree search is the set of potentially similar subsequences. Experiments show that this method finds subsequences with similar patterns even when the values or scales of their segments differ considerably.

5. We present a similarity-based time series clustering method that uses agglomerative hierarchical clustering. To ensure the correct merge order, the distribution distances of the clusters are taken into account when computing the distance between two clusters. To improve efficiency, the merging process begins with an initial cluster partition instead of with all the individual time series.

6. We present a query optimization method. For a clustered time series database, the query sequence is classified into one of the clusters, and query efficiency improves because the similarity search space is limited to that cluster.

7. We implement a client/server query system and test the presented methods.
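The relative-sequence idea in contribution 4 can be illustrated with a small sketch. The abstract does not say how a segment is represented or exactly how it is normalized, so everything below is an assumption made for illustration: a segment is taken to be a (duration, amplitude change) pair, and normalization is taken to be a ratio to the preceding segment. The name `to_relative` is hypothetical and does not come from the dissertation.

```python
from typing import List, Tuple

# Hypothetical segment representation: (duration, amplitude change).
Segment = Tuple[float, float]


def to_relative(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Express each segment as ratios to the segment before it.

    Assumed normalization: duration ratio and amplitude-change ratio.
    The first segment has no predecessor, so the result is one shorter.
    """
    relative = []
    for (prev_dt, prev_dv), (dt, dv) in zip(segments, segments[1:]):
        if prev_dt == 0 or prev_dv == 0:
            raise ValueError("cannot normalize by a zero-length or flat segment")
        relative.append((dt / prev_dt, dv / prev_dv))
    return relative


# Two series with the same shape but different time and value scales.
base = [(2.0, 4.0), (1.0, -2.0), (4.0, 8.0)]
scaled = [(dt * 3.0, dv * 10.0) for dt, dv in base]

print(to_relative(base))
print(to_relative(base) == to_relative(scaled))
```

Under these assumptions, the two series map to the same relative sequence although their raw values differ, which is the property that lets a pattern index such as a suffix tree match subsequences of different scale. A real index would also need to discretize the ratios into categories, a step the abstract mentions but does not specify.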
Keywords/Search Tags: Time Series Database, Data Mining, Fuzzy Similarity, Clustering, Query Processing