Font Size: a A A

Hierarchical Clustering Algorithm For Mining Frequent Patterns And Time-series Flow

Posted on:2014-10-12Degree:MasterType:Thesis
Country:ChinaCandidate:X H ZhouFull Text:PDF
GTID:2268330425987502Subject:Pattern Recognition and Intelligent Systems
Abstract/Summary:PDF Full Text Request
Time series data mining is one of the top ten of the most challenging research directions in the field of data mining. Time series stream is a continuous, high-speed, unlimited and time-varying ordered sequence which is arranged in accordance with the time. The continuity requires that mining algorithms should scan a few times; high-speed requires that mining algorithms should have high real-time; infinity requires that mining algorithms can catch data and load it into the main memory in batches; time-variant of the data content requires that mining algorithms should have the ability to solve the problem of concept drifting in order to reflect evolution properties of the data. Due to the complexity of time series stream, mining research of time series stream is still in an exploratory stage. In this paper, we will carry out research in two directions:hierarchical clustering system and sequence frequent patterns mining.Hierarchical clustering algorithms have been widely applied to clustering analysis of data streams which have such advantages as no need to set the number of clusters, suitable for a variety of cluster structure and high efficiency. In this paper, a hierarchical clustering algorithm in the framework of data streams-ODAC(the Online Divisive Agglomerative Clustering) is mainly researched. The ODAC algorithm is a incremental clustering algorithm and uses a top-down strategy to build a tree-like hierarchy of clusters. But there are two problems:noisy data can influence the clustering results and the cost of reconstruct the cluster structure when processing concept drift is excessively high. An improved algorithm based on rough set theory-ODRC (the Online Divisive Rough Clustering) is proposed. The ODRC algorithm is divided into two phases:the algorithm builds a hierarchy of clusters according to the time series stream samples in the first stage; in the second stage it uses limited tolerance relation model to extend rough set definitions of clusters in order to obtain equivalence partition of clusters based on rough set theory. The algorithm is able to obtain more reasonable cluster structure than the original one, and enhances the processing power of the concept drift over time-series stream. Experimental results show the feasibility and effectiveness of the new method.The mining of sequential patterns is one of the hottest spots in the field of data mining. The purpose of sequential patterns mining is to find the frequent sequences in transaction databases and then use these patterns to help decision-markers. A sequence patterns mining algorithm based window sliding techniques-MFI-TransSW is introduced in this paper. MFI-TransSW algorithm uses a sequence of bits to implement the sliding window operation, which preferably meets the tickler that system achieves rapid processing under the limitations of each data element should be scanned at most once and memory is a limited resource. In the view of the situation that MFI-TransSW algorithm has low efficiency in the pattern generation phase, a new multi-thread algorithm named MFI-MultiSW is proposed, which is based on that the window is divided into a fixed number of batches of transactions. The linear linklist is used in MFI-MultiSW algorithm to store the information about the current candidate itemsets and the transactions in the current window, and it uses a multi-threaded approach to generate frequent patterns based on the linear linklist. The experimental results show that the improved algorithm has higher implementation efficiency compared to the original algorithm and performance improvements are more significant especially in multi-core environment.
Keywords/Search Tags:time series stream, hierarchical clustering, incremental system, frequent pattern, multi-thread
PDF Full Text Request
Related items