Research On Classification And Clustering Algorithms For Data Stream Mining

Posted on: 2015-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: D D Zhang
Full Text: PDF
GTID: 2268330425970536
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development and growing popularity of computer software and hardware, the volume of data processed in many fields is growing explosively. The widespread use and sheer quantity of data have brought us into a world driven by data. In general, these data accumulate in the form of data streams. Unlike traditional static data, data streams are dynamic, large in scale, fast changing, and arrive continuously and rapidly, so data flowing into the system must be processed in real time. Traditional data mining algorithms therefore cannot be applied directly to data streams.

These characteristics impose the following basic requirements on data stream mining. First, the algorithms must keep up with rapidly arriving data, so their computational complexity should be low. Second, because limited memory cannot store an unbounded volume of data, their space complexity should also be low, so that an approximate solution can be obtained from the partial data stored in the limited space. Third, since data streams change over time, the parameters of the algorithms should be dynamically adjustable to such changes. How to extract useful information from data streams has thus become a hot and challenging topic in data mining.

This thesis first summarizes the theoretical basis of data stream mining and the relevant mainstream technologies, and then introduces existing data stream mining algorithms for classification, clustering and frequent pattern mining. On this basis, a classification algorithm for binary data streams with skewed distribution and a clustering algorithm based on density and grid are proposed and implemented.

The main work of the thesis is summarized below:

1. Current classification algorithms over data streams are analyzed. To deal with skewed distribution and concept drift, a classification algorithm called SeRt with better adaptability is proposed by integrating current classification algorithms with the Weighted Classifier Ensembles algorithm. The experimental results show that the algorithm can effectively handle binary skewed distribution and the concept drift that exists in data streams.

2. Traditional clustering algorithms and data stream clustering algorithms are discussed. A density- and grid-based data stream clustering algorithm, PKS-Stream-I, is proposed. It optimizes PKS-Stream in the selection of the density detection period and in the detection and removal of sporadic grids. Empirical results show that the proposed method performs better and yields better clustering results at lower time and space complexity.
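
To make the density-and-grid idea concrete, the following is a minimal Python sketch of a generic density-grid stream summary. It is not PKS-Stream or PKS-Stream-I, whose details are not given in this abstract. The class name `DecayedGrid`, the exponential decay factor, the thresholds and the fixed pruning period `check_every` are all assumptions made for illustration. Each arriving point updates only the density of its grid cell, cells whose decayed density falls below a threshold are periodically removed as sporadic, and clusters are read off as groups of adjacent dense cells.

```python
import math


class DecayedGrid:
    """Toy density-grid summary of a data stream (illustrative only)."""

    def __init__(self, cell_size=1.0, decay=0.99,
                 sparse_threshold=0.1, check_every=100):
        self.cell_size = cell_size                # side length of a grid cell (assumed)
        self.decay = decay                        # per-step density decay factor (assumed)
        self.sparse_threshold = sparse_threshold  # below this a cell counts as sporadic
        self.check_every = check_every            # pruning period, in points (assumed)
        self.cells = {}                           # cell -> (density, time of last update)
        self.t = 0                                # number of points seen so far

    def _cell(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def density(self, cell):
        # Stored density, decayed lazily to the current time step.
        stored, last = self.cells[cell]
        return stored * self.decay ** (self.t - last)

    def insert(self, point):
        # Constant work per point: only one cell is touched.
        self.t += 1
        cell = self._cell(point)
        old = self.density(cell) if cell in self.cells else 0.0
        self.cells[cell] = (old + 1.0, self.t)
        if self.t % self.check_every == 0:
            self.prune()

    def prune(self):
        # Drop sporadic cells so memory stays bounded by the active region.
        sporadic = [c for c in self.cells
                    if self.density(c) < self.sparse_threshold]
        for c in sporadic:
            del self.cells[c]

    def clusters(self, dense_threshold=3.0):
        # Group axis-adjacent dense cells with a depth-first search.
        dense = {c for c in self.cells if self.density(c) >= dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, group = [start], set()
            while stack:
                c = stack.pop()
                group.add(c)
                for i in range(len(c)):
                    for step in (-1, 1):
                        n = c[:i] + (c[i] + step,) + c[i + 1:]
                        if n in dense and n not in seen:
                            seen.add(n)
                            stack.append(n)
            result.append(group)
        return result
```

In this sketch the raw points are never stored, which is one simple way to meet the low space complexity requirement described above. A real algorithm would choose the detection period and the sporadic-grid test more carefully than the fixed `check_every` and single threshold used here.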
Keywords/Search Tags: data stream, classification, clustering, skewed distribution, density-grid