Research On Clustering Algorithm Of Data Stream

Posted on:2012-11-11

Degree:Master

Type:Thesis

Country:China

Candidate:G T Pan

Full Text:PDF

GTID:2218330368493636

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the information society, data streams have emerged in many new application areas, such as network monitoring, intrusion detection, telecommunications, stock trading, e-commerce, web page visits, and scientific research. These applications continuously produce large volumes of time-ordered data in the form of streams. The arrival of such data in multiple, rapid, time-varying, possibly unpredictable and unbounded streams raises fundamentally new research problems. The nature of stream data makes it essential to use algorithms that make only one pass over the data, run in real time, and have low space and time complexity. At the same time, data streams are often high dimensional, and high-dimensional data often contains mixed attributes, so an algorithm for data streams must be able to handle mixed-attribute data.

Many clustering algorithms for data streams have been proposed, but many problems remain to be studied and resolved. This thesis proposes a similarity measure for high-dimensional data and applies it to the clustering of high-dimensional mixed-attribute data streams. The main contributions are as follows.

1. A similarity measurement method for high-dimensional data. When distance measures designed for low-dimensional space are applied in high-dimensional space, the distances between data objects become nearly indistinguishable, so comparing them is no longer meaningful. High-dimensional data also exhibits sparsity and the empty space phenomenon. Research on distance functions and similarity measures for high-dimensional data has therefore become an important research direction. After analyzing and summarizing why traditional measures are unsuitable for high-dimensional space, the thesis proposes a new similarity measurement method, based on feature selection, for measuring the similarity between objects in high-dimensional space. Numerical simulation results demonstrate that the function is reasonable and effective for high-dimensional data clustering.

2. A clustering algorithm for high-dimensional mixed-attribute data streams. HPStream, a clustering algorithm for high-dimensional data streams, cannot handle mixed-attribute data streams. The thesis designs a similarity measure for high-dimensional mixed-attribute data and uses it to improve the HPStream algorithm. The resulting algorithm, named M-HPStream, can handle high-dimensional mixed-attribute data streams. Simulations show that the new algorithm solves the mixed-attribute data stream clustering problem efficiently and quickly.
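The abstract does not give the formula of the proposed similarity measure, so the following is only a rough illustration of what a similarity over mixed (numerical and categorical) attributes can look like. It uses a Gower-style average as a stand-in, not the measure defined in the thesis and not the one used by M-HPStream. The function name `mixed_similarity`, the `numeric_ranges` parameter, and the sample records are all hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def mixed_similarity(
    a: Sequence[Value],
    b: Sequence[Value],
    numeric_ranges: Sequence[Optional[float]],
) -> float:
    """Gower-style similarity in [0, 1] for mixed-attribute records.

    ASSUMPTION: this is a generic stand-in, not the thesis's measure.
    numeric_ranges[i] is the value range of attribute i if it is
    numerical, or None if attribute i is categorical.
    """
    if not a or len(a) != len(b) or len(a) != len(numeric_ranges):
        raise ValueError("records and ranges must be non-empty and equal length")

    total = 0.0
    for x, y, value_range in zip(a, b, numeric_ranges):
        if value_range is None or value_range <= 0:
            # Categorical (or constant) attribute: simple match / mismatch.
            total += 1.0 if x == y else 0.0
        else:
            # Numerical attribute: range-normalized closeness, clipped to [0, 1].
            total += 1.0 - min(abs(x - y) / value_range, 1.0)
    return total / len(a)


# Hypothetical records: (packets per second, protocol, payload size).
ranges = (4.0, None, 100.0)
record_a = (0.0, "tcp", 50.0)
record_b = (2.0, "udp", 100.0)
print(mixed_similarity(record_a, record_b, ranges))
```

Each attribute contributes a score between 0 and 1, so numerical and categorical attributes are placed on the same scale before averaging. In a one-pass stream setting the attribute ranges would have to be known in advance or maintained incrementally, which this sketch does not attempt.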

Keywords/Search Tags:

data mining, high dimensional data, similarity measurement, data streams clustering

Related items

1	Research On Clustering Algorithm Based On Subspace In High-dimensional Data Streams
2	The Analysis And Application Of Clustering Algorithm For Multi-Dimensional Data Streams
3	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
4	The Research And Realization Of Clustering Algorithm In Data Streams Mining
5	The Application And Research Of Incremental Clustering On Temporal Data Streams
6	Research On Mining Algorithms Over Data Streams
7	Research And Implementation On Clustering Algorithms In Uncertain Data Streams Environment
8	Research On Technique And Application Of Mining Data Streams
9	The Research On A Few Key Issues In High Dimensional Data Mining
10	Research On Directional Clustering And Its Applications