
Alternative Clustering On Stream Data

Posted on: 2012-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhang
Full Text: PDF
GTID: 2218330368988134
Subject: Computer application technology
Abstract/Summary:
In recent years, data streams have attracted considerable research interest. As an essential task in mining data streams, stream clustering has become a hot topic in this area. Existing stream clustering algorithms usually produce only a single clustering within a given time period. However, data streams can usually be interpreted from multiple perspectives, and alternative clusterings are preferred in many real-world applications.
In this paper, we pose the new problem of alternative stream clustering, which aims to find two high-quality and dissimilar macro-clusterings in a given data stream. We propose a new algorithm named AltStream, consisting of two components. The online component of AltStream simultaneously maintains two alternative groups of micro-clusters, which record statistical information about the evolving stream. For the online procedure, we develop a new measure, SOBD, to approximately evaluate the dissimilarity between two clusterings whose underlying data points are partly distinct. When the user requests two alternative macro-clusterings, the offline component is invoked. After the two sets of micro-clusters are returned with respect to the specified time horizon and number of clusters, an unsupervised alternative clustering algorithm, dec-kmeans, is employed in the offline component to find two alternative macro-clusterings over one set of micro-clusters. The one with better quality is output as the first resulting macro-clustering, whereas the centroids of the other macro-clustering are extracted as semi-supervised information. Under the guidance of these centroids, the second resulting macro-clustering is created by a weighted k-means algorithm.
Experimental results on real-world streams show that our new algorithm performs better than several comparison methods in terms of both quality and dissimilarity.
Therefore, AltStream could be applied to text streams, credit card transaction flows, web logs, web page click streams, and similar data. In each of these real-world applications, it is important for users to explore data streams from various perspectives.
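As a rough illustration of the micro-cluster and weighted k-means ideas mentioned in the abstract, the following Python sketch may help. It is not the AltStream implementation: the `MicroCluster` fields (a point count and a per-dimension linear sum), the function names, and the choice to weight each micro-cluster centroid by its point count are all assumptions made for illustration. The abstract does not say how the centroids of the other macro-clustering guide the second clustering, nor how SOBD or dec-kmeans work, so none of that is modeled here; the sketch only shows a plain weighted k-means started from given initial centroids.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MicroCluster:
    """Hypothetical stream summary: point count plus per-dimension sum."""
    n: int
    linear_sum: List[float]

    def add(self, point: Sequence[float]) -> None:
        # Absorb one stream point by updating the running statistics.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids) -> List[int]:
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def weighted_kmeans(points: Sequence[Sequence[float]],
                    weights: Sequence[float],
                    init_centroids: Sequence[Sequence[float]],
                    iters: int = 50) -> Tuple[List[List[float]], List[int]]:
    """Weighted k-means: each point pulls its centroid in proportion to its weight."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == j]
            total = sum(w for _, w in members)
            if total == 0:
                new_centroids.append(old)  # keep an empty cluster where it was
            else:
                new_centroids.append(
                    [sum(w * p[d] for p, w in members) / total
                     for d in range(len(old))])
        if new_centroids == centroids:  # no change, so the loop has converged
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    # Treat each micro-cluster centroid as one weighted point.
    mcs = [MicroCluster(3, [0.0, 0.0]), MicroCluster(1, [1.0, 0.0]),
           MicroCluster(1, [10.0, 0.0]), MicroCluster(3, [33.0, 0.0])]
    pts = [mc.centroid() for mc in mcs]
    wts = [mc.n for mc in mcs]
    print(weighted_kmeans(pts, wts, [[0.0, 0.0], [11.0, 0.0]]))
```

Clustering the micro-cluster centroids rather than the raw points keeps the offline step cheap, since only the summaries need to be held in memory; the weights ensure that a micro-cluster summarizing many points influences the result more than one summarizing few.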
Keywords/Search Tags: Data Stream, Alternative Clustering, AltStream