The Research On Clustering Algorithm Of Data Stream

Posted on: 2010-01-10; Degree: Master; Type: Thesis
Country: China; Candidate: Y Z Cao
GTID: 2178360275477780; Subject: Computer software and theory
Abstract/Summary:
With the rapid development and broad application of information technology, streaming data have become ubiquitous. Typical examples include supermarket transactions, Internet search requests, telephone call records, and satellite and astronomical data. Data streams are continuous, high-volume, and open-ended, so traditional mining algorithms cannot process them in real time, and useful information is lost.

Clustering is one of the most important branches of data mining, but clustering data streams with traditional models is challenging. Most traditional clustering algorithms must scan the database multiple times and store the entire dataset, which makes them unsuitable for streaming environments. It is therefore significant and valuable to explore new clustering models and methods for prediction and clustering in real-world applications. The main contributions are as follows.

(1) To improve clustering performance on noisy and unbalanced data streams, an effective double-clustering algorithm based on the partition method, TCLUSA, is proposed. TCLUSA divides the stream into blocks. Within each block it removes outliers and applies DBSCAN to obtain sub-clustering results, which are represented by their means. The final clustering is then obtained by applying k-means to the results of the preceding blocks.

(2) Many real-world data streams have mixed attributes, that is, both numerical and discrete (categorical) attributes. Existing methods, however, are usually oriented to only one kind of attribute and simply discard the others, which reduces precision. This thesis presents a grid-based data stream clustering algorithm for mixed attributes. It measures the similarity of mixed data with a combination of geometric adjacency and information gain, and can effectively handle mixed attributes.
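The two-level scheme of TCLUSA in (1) can be illustrated with the minimal Python sketch below. It is not the thesis's implementation. The following are all assumptions made for illustration:

- the outlier rule (drop points farther than mean + z·std from the block centroid);
- the plain O(n²) DBSCAN;
- weighting each sub-cluster mean by its size;
- the farthest-point k-means initialization;
- the names `remove_outliers`, `dbscan`, `weighted_kmeans`, and `tclusa_sketch`.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def remove_outliers(block: List[Point], z: float = 2.0) -> List[Point]:
    """Hypothetical rule: drop points farther than mean + z*std from the block centroid."""
    centre = mean(block)
    d = [math.dist(p, centre) for p in block]
    mu = sum(d) / len(d)
    sd = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
    return [p for p, x in zip(block, d) if x <= mu + z * sd]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[List[Point]]:
    """Plain O(n^2) DBSCAN; returns the clusters and discards noise points."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited, -1 = noise

    def region(i: int) -> List[int]:
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    next_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        labels[i] = next_id
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = next_id          # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = next_id
            neighbours = region(j)
            if len(neighbours) >= min_pts:   # j is a core point: keep expanding
                seeds.extend(neighbours)
        next_id += 1
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        if lab is not None and lab >= 0:
            clusters.setdefault(lab, []).append(p)
    return list(clusters.values())


def weighted_kmeans(items: List[Tuple[Point, int]], k: int, iters: int = 20) -> List[Point]:
    """k-means over (sub-cluster mean, size) pairs with farthest-point initialization."""
    if not items:
        return []
    pts = [c for c, _ in items]
    centres = [max(items, key=lambda it: it[1])[0]]   # start from the heaviest summary
    while len(centres) < min(k, len(pts)):
        centres.append(max(pts, key=lambda p: min(math.dist(p, c) for c in centres)))
    for _ in range(iters):
        groups: List[List[Tuple[Point, int]]] = [[] for _ in centres]
        for c, w in items:
            nearest = min(range(len(centres)), key=lambda i: math.dist(c, centres[i]))
            groups[nearest].append((c, w))
        new: List[Point] = []
        for group, old in zip(groups, centres):
            if not group:                   # leave an empty centre where it was
                new.append(old)
                continue
            total = sum(w for _, w in group)
            new.append(tuple(sum(p[d] * w for p, w in group) / total for d in range(len(old))))
        if new == centres:
            break
        centres = new
    return centres


def tclusa_sketch(stream: List[Point], block_size: int, eps: float,
                  min_pts: int, k: int) -> List[Point]:
    """Level 1: outlier removal + DBSCAN per block. Level 2: k-means over block summaries."""
    summaries: List[Tuple[Point, int]] = []   # (sub-cluster mean, size)
    for start in range(0, len(stream), block_size):
        block = remove_outliers(stream[start:start + block_size])
        for cluster in dbscan(block, eps, min_pts):
            summaries.append((mean(cluster), len(cluster)))
    return weighted_kmeans(summaries, k)
```

For an unbounded stream, the list slicing would be replaced by reading fixed-size blocks from an iterator. Only the `(mean, size)` summaries need to stay in memory. That bounded summary is what makes a two-level design of this kind workable on streams.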
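For the mixed-attribute algorithm in (2), the sketch below shows one simple way a grid can summarize mixed numeric and categorical records in a stream. It works in four steps:

- numeric values are bucketed into intervals of a fixed width;
- each categorical value acts as its own cell coordinate;
- per-cell counts are updated as records arrive;
- dense cells that are geometrically adjacent are merged into clusters.

This is an illustration under stated assumptions, not the thesis's algorithm. The class name `MixedGrid`, the `cell_width` and `density` parameters, and the adjacency rule are hypothetical. Under that rule, two cells are adjacent when they have identical categorical values and their numeric coordinates differ by one step on exactly one axis. The information-gain part of the thesis's similarity measure is not modeled.

```python
import math
from collections import defaultdict, deque
from typing import DefaultDict, Hashable, List, Sequence, Set, Tuple

# A cell is (numeric grid coordinates, categorical values).
Cell = Tuple[Tuple[int, ...], Tuple[Hashable, ...]]


class MixedGrid:
    """Hypothetical grid summary for mixed numeric/categorical stream records."""

    def __init__(self, cell_width: float, density: int) -> None:
        self.cell_width = cell_width      # width of each numeric interval (assumed uniform)
        self.density = density            # minimum count for a cell to be "dense"
        self.counts: DefaultDict[Cell, int] = defaultdict(int)

    def cell_of(self, numeric: Sequence[float], categorical: Sequence[Hashable]) -> Cell:
        coords = tuple(math.floor(v / self.cell_width) for v in numeric)
        return coords, tuple(categorical)

    def insert(self, numeric: Sequence[float], categorical: Sequence[Hashable]) -> None:
        """Absorb one stream record: only its cell's counter changes."""
        self.counts[self.cell_of(numeric, categorical)] += 1

    @staticmethod
    def adjacent(a: Cell, b: Cell) -> bool:
        """Same categorical values and one step apart along exactly one numeric axis.
        The thesis also uses information gain; that part is not modeled here."""
        (coords_a, cats_a), (coords_b, cats_b) = a, b
        if cats_a != cats_b:
            return False
        return sum(abs(x - y) for x, y in zip(coords_a, coords_b)) == 1

    def clusters(self) -> List[Set[Cell]]:
        """Connected components of dense cells under the adjacency rule (BFS)."""
        dense = [c for c, n in self.counts.items() if n >= self.density]
        seen: Set[Cell] = set()
        result: List[Set[Cell]] = []
        for start in dense:
            if start in seen:
                continue
            component = {start}
            seen.add(start)
            queue = deque([start])
            while queue:
                current = queue.popleft()
                for other in dense:
                    if other not in seen and self.adjacent(current, other):
                        seen.add(other)
                        component.add(other)
                        queue.append(other)
            result.append(component)
        return result
```

Each record touches a single counter, so memory grows with the number of occupied cells rather than the number of records. A practical stream version would also decay or prune old counts; that is omitted here.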
Keywords/Search Tags: Data Streams, Clustering, Heterogeneous Attribute, Grid