The Research On Clustering Algorithm Of Data Stream

Posted on:2010-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:Y Z Cao

Full Text:PDF

GTID:2178360275477780

Subject:Computer software and theory

Abstract/Summary:

With the rapid development and broad application of information technologies, streaming data have become ubiquitous; examples include supermarket transactions, Internet search requests, telephone call records, and satellite and astronomical data. Because streaming data are continuous, high-volume and open-ended, traditional mining algorithms cannot mine them in real time, which leads to the loss of useful information. Clustering is one of the most important branches of data mining, but it is a challenge to perform clustering on data streams with traditional models. Most traditional clustering algorithms need to scan the database multiple times and to store the entire data set, which makes them unsuitable for streaming environments. It is therefore significant and valuable to explore new clustering models and methods for prediction and clustering in real-world applications. The main contributions are as follows.

(1) To improve clustering performance on noisy and unbalanced data streams, an effective double-clustering algorithm (TCLUSA) based on the partition method is proposed. TCLUSA employs DBSCAN to obtain sub-clustering results, using the means of each block after outliers have been deleted. The final results are then obtained with the k-means method, based on the former blocks.

(2) Many real-world data streams have mixed attributes, that is, both numerical and discrete attributes. Existing methods, however, are often oriented to only one kind of attribute and simply discard the others, which decreases precision. This thesis presents a grid-based data stream clustering algorithm for mixed attributes. It uses a form of geometric adjacency and information gain, founded on mixed-data similarity, and can deal effectively with mixed attributes.
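The two-stage idea in contribution (1) can be illustrated with a short sketch. This is not the thesis's TCLUSA implementation, only a minimal reading of the abstract under stated assumptions: the stream is cut into fixed-size blocks, each block is clustered with a plain DBSCAN, noise points are dropped, and each sub-cluster is reduced to its mean. A k-means pass over those means then produces the final centers. The function names, the parameters (`block_size`, `eps`, `min_pts`, `k`) and the weighting of each mean by its sub-cluster size are all assumptions made for illustration.

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:   # core point: keep expanding
                queue.extend(reach)
    return labels

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def block_summaries(block, eps, min_pts):
    """Stage 1: cluster one block, drop noise, keep (mean, size) per sub-cluster."""
    groups = {}
    for p, label in zip(block, dbscan(block, eps, min_pts)):
        if label != -1:
            groups.setdefault(label, []).append(p)
    return [(mean(g), len(g)) for g in groups.values()]

def weighted_kmeans(summaries, k, iters=20, seed=0):
    """Stage 2: k-means over sub-cluster means (size weighting is an assumption).

    Requires at least k summaries.
    """
    rng = random.Random(seed)
    centers = [m for m, _ in rng.sample(summaries, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for m, w in summaries:
            nearest = min(range(k), key=lambda c: math.dist(m, centers[c]))
            buckets[nearest].append((m, w))
        for c, bucket in enumerate(buckets):
            if bucket:                  # an empty bucket keeps its old center
                total = sum(w for _, w in bucket)
                centers[c] = tuple(
                    sum(m[d] * w for m, w in bucket) / total
                    for d in range(len(centers[c]))
                )
    return centers

def two_stage_cluster(stream, block_size, eps, min_pts, k):
    """Consume the stream block by block, then cluster the stored summaries."""
    summaries, block = [], []
    for point in stream:
        block.append(point)
        if len(block) == block_size:
            summaries.extend(block_summaries(block, eps, min_pts))
            block = []
    if block:                           # last, partially filled block
        summaries.extend(block_summaries(block, eps, min_pts))
    return weighted_kmeans(summaries, k)
```

Only the per-block summaries are kept in memory, so each raw point is read once and then discarded. This matches the single-pass constraint the abstract describes for streaming environments.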

Keywords/Search Tags:

Data Streams, Clustering, Heterogeneous Attribute, Grid

Related items

1	Research Of Optimized Clustering Algorithms Over Data Streams
2	Mining Clusters In Data Streams
3	Research Of Probability Density Grid-based Clustering For Uncertain Data Streams
4	Research On Uncertain Data Streams Clustering Algorithm Based On Tuple Cluster Feature
5	The Application And Research Of Incremental Clustering On Temporal Data Streams
6	Research And Implementation On Clustering Algorithms In Uncertain Data Streams Environment
7	Research On Clustering Algorithm Over High Dimensional Data Stream Based On Irregular Grid Data
8	Approach To Dynamic Pattern Discovery And Trace In Data Streams
9	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
10	Research And Application Of Heterogeneous Data Integration And Clustering Mining