Research On Clustering Algorithm Of Data Stream

Posted on: 2012-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: G T Pan
Full Text: PDF
GTID: 2218330368493636
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information society, data streams have emerged in many new application areas, such as network monitoring, intrusion detection, telecommunications, stock trading, e-commerce, web page visits, and scientific research. These applications produce large volumes of data that arrive as streams ordered in time. Because the data arrive continuously in multiple, rapid, time-varying, possibly unpredictable and unbounded streams, they raise fundamentally new research problems. The nature of stream data makes it essential to use algorithms that make only one pass over the data, run in real time, and have low space and time complexity. In addition, data streams are usually high dimensional, and high-dimensional data often contain mixed attributes, so an algorithm for data streams must be able to handle mixed-attribute data.

Many clustering algorithms for data streams have been proposed, but many problems remain to be studied and resolved. This thesis proposes a similarity measure for high-dimensional data and applies it to the clustering of high-dimensional mixed-attribute data streams. The main contributions are as follows.

1. A similarity measure for high-dimensional data. When distance measures designed for low-dimensional space are applied in high-dimensional space, comparing the distances between data objects is no longer meaningful. High-dimensional data are sparse and exhibit the empty space phenomenon, so research on distance functions and similarity measures for high-dimensional data has become an important research direction. After analyzing and summarizing why traditional measures are unsuitable in high-dimensional space, the thesis proposes a new similarity measure, based on feature selection, for measuring the similarity between objects in high-dimensional space. Numerical simulation results demonstrate that the function is reasonable and effective for clustering high-dimensional data.

2. A clustering algorithm for high-dimensional mixed-attribute data streams. HPStream, a clustering algorithm for high-dimensional data streams, cannot handle mixed-attribute data streams. The thesis designs a similarity measure for high-dimensional mixed-attribute data and uses it to improve the HPStream algorithm. The resulting clustering algorithm, named M-HPStream, can handle high-dimensional mixed-attribute data streams. Simulations show that the new algorithm solves the mixed-attribute data stream clustering problem efficiently and quickly.
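The abstract does not state the formula of the proposed similarity measure. As a rough illustration of what a similarity over mixed attributes restricted to a selected subset of dimensions might look like, the sketch below uses a Gower-style per-attribute similarity. This is a standard technique substituted here for illustration, not the measure defined in the thesis; the function name `mixed_similarity` and its parameters `is_numeric`, `ranges`, and `dims` are hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def mixed_similarity(
    x: Sequence[Value],
    y: Sequence[Value],
    is_numeric: Sequence[bool],
    ranges: Sequence[float],
    dims: Optional[Sequence[int]] = None,
) -> float:
    """Gower-style similarity in [0, 1] between two mixed-attribute records.

    Assumption (not from the thesis): numeric attributes are compared by
    range-normalized absolute difference, categorical attributes by exact
    match, and only the dimensions listed in `dims` contribute.
    """
    selected = list(range(len(x))) if dims is None else list(dims)
    if not selected:
        raise ValueError("at least one dimension must be selected")

    total = 0.0
    for d in selected:
        if is_numeric[d]:
            r = ranges[d]
            if r == 0:
                # A constant attribute cannot separate records.
                s = 1.0
            else:
                # Clamp so values outside the known range still give s >= 0.
                s = 1.0 - min(abs(x[d] - y[d]) / r, 1.0)
        else:
            # Categorical attribute: simple matching.
            s = 1.0 if x[d] == y[d] else 0.0
        total += s
    return total / len(selected)


# Hypothetical records: (duration, protocol, bytes)
a = (1.0, "tcp", 200.0)
b = (3.0, "udp", 200.0)
numeric_flags = (True, False, True)
attr_ranges = (10.0, 0.0, 1000.0)

sim_all = mixed_similarity(a, b, numeric_flags, attr_ranges)
sim_selected = mixed_similarity(a, b, numeric_flags, attr_ranges, dims=(0, 2))
```

Restricting `dims` to a subset of attributes mimics, in a simplified way, the idea of measuring similarity only on selected features; how the thesis actually selects those features is not described in the abstract.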
Keywords/Search Tags: data mining, high dimensional data, similarity measurement, data stream clustering