
Research On Data And Data Stream Clustering Algorithms For Mixed Attributes

Posted on: 2010-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: T H Wu
GTID: 2178360272479069
Subject: Systems Engineering
Abstract/Summary:
Data mining is one of the most active areas of database research. In recent years a new application, data stream mining, has emerged. It deals with data generated as a continuous stream, such as sensor readings, website click streams, and the output of real-time monitoring systems. A data stream is chronologically ordered, changes rapidly, and is massive and potentially unbounded. Because of these characteristics, new data mining methods are needed that are single-pass, online, multi-layered, and multi-dimensional.

Many stream clustering methods have been proposed, but many problems remain to be studied and resolved. This thesis studies the problem of data stream clustering and makes the following contributions:

1. An improved density-based clustering algorithm for ordinary data. DBSCAN is a common density-based clustering algorithm, but it cannot be applied to data with mixed numerical and categorical attributes. To overcome this shortcoming, a new mixed-attribute clustering algorithm named M-DBSCAN is presented, which uses a dimension-oriented distance to measure the difference between two data objects. Simulations show that the new algorithm solves the mixed-attribute clustering problem efficiently.

2. A density-based data stream clustering algorithm for mixed-attribute data sets. The CluStream algorithm cannot adequately handle mixed numerical and categorical data. To overcome this shortcoming, a new data stream clustering algorithm named MCStream is presented, which applies the idea of the dimension-oriented distance. Simulations show that the new algorithm clusters mixed-attribute data streams efficiently and quickly.
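The abstract does not define the dimension-oriented distance, so the following is only a minimal sketch of one plausible per-dimension measure for mixed attributes, together with the neighborhood query that a DBSCAN-style algorithm would build on it. The names `mixed_distance`, `region_query`, and `numeric_ranges`, the range normalization, and the 0/1 mismatch rule for categorical values are all assumptions made for illustration, not the thesis's actual definitions.

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, str]


def mixed_distance(a: Sequence[Value], b: Sequence[Value],
                   numeric_ranges: Dict[int, float]) -> float:
    """Hypothetical per-dimension distance for mixed-attribute objects.

    Assumption: dimensions listed in numeric_ranges are numeric and
    contribute |x - y| / range; every other dimension is categorical
    and contributes 0 on a match, 1 on a mismatch. The result is the
    mean contribution over all dimensions, so it lies in [0, 1] when
    the ranges cover the data.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            span = numeric_ranges[i]
            total += abs(x - y) / span if span > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def region_query(points: List[Sequence[Value]], idx: int, eps: float,
                 numeric_ranges: Dict[int, float]) -> List[int]:
    """Indices of all points within eps of points[idx] (itself included).

    This is the eps-neighborhood lookup used by density-based
    clustering, with the Euclidean distance swapped for mixed_distance.
    """
    return [j for j, q in enumerate(points)
            if mixed_distance(points[idx], q, numeric_ranges) <= eps]


# Toy data: dimension 0 is numeric (observed range 8.0), dimension 1 is categorical.
points = [(1.0, "a"), (2.0, "a"), (9.0, "b")]
ranges = {0: 8.0}
neighbors = region_query(points, 0, eps=0.1, numeric_ranges=ranges)
```

With this toy data the first two objects differ only slightly in the numeric dimension and share the same category, so they fall in each other's neighborhood, while the third differs in both dimensions and does not.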
Keywords/Search Tags: data mining, data streams, clustering, density