
Data Stream Clustering Algorithm Analysis Using A Flock Of Agents

Posted on: 2014-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Lin
GTID: 2308330461973916
Subject: Computer application technology
Abstract/Summary:
As the volume of real-world data grows rapidly, massive data in stream form has become common in many fields. Such data is time-ordered, its distribution changes quickly, and it is potentially unbounded. Many researchers have proposed data stream clustering algorithms by extending classical clustering algorithms; most of these use a centralized strategy and need prior knowledge (such as the number of classes) to complete clustering. However, users cannot predict the data distribution well enough to supply prior knowledge that matches reality, so these algorithms cannot produce good clustering results.

Because swarm intelligence offers distribution, robustness, and scalability, this thesis aims to solve the data stream clustering problem using a flock of agents. First, a static data clustering algorithm, Attraction-added and Influence-improved FClust (AIFClust), is proposed by improving the flock-of-agents clustering and data visualization algorithm FClust. In this algorithm, agents that behave like birds interact with their neighboring agents. As the agents, each representing a data item, move on the visual panel, they ultimately gather into clusters. The whole clustering process can be visualized, which improves the user experience. Comparison experiments between the improved algorithms and FClust show that the influence-improved algorithm (IFClust) improves clustering quality and the ability to find clusters; the attraction-added algorithm (AFClust) speeds up convergence and improves stability; and AIFClust inherits the advantages of both and outperforms FClust.
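The abstract does not give the agents' movement rules, so the following is only a minimal sketch of the general idea that agents carrying similar data gather while dissimilar ones separate. Everything in it is an assumption made for illustration: the function name `step`, the one-dimensional data values, the parameters `radius`, `sim_threshold` and `speed`, and the plain attract/repel rule itself, which stands in for (and is much simpler than) the influence and attraction mechanisms of FClust and AIFClust.

```python
import math

def step(positions, data, radius=5.0, sim_threshold=1.0, speed=0.1):
    """One synchronous move of all agents on a 2-D panel (illustrative only).

    positions: list of (x, y) tuples, one per agent.
    data: list of numbers, the data item each agent represents.
    An agent is pulled toward neighbours with similar data and pushed
    away from neighbours with dissimilar data. This rule is an assumed
    simplification, not the update rule of the thesis.
    """
    new_positions = []
    for i, (xi, yi) in enumerate(positions):
        dx = dy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dist = math.hypot(xj - xi, yj - yi)
            if dist == 0.0 or dist > radius:
                continue  # only agents within the perception radius interact
            similar = abs(data[i] - data[j]) <= sim_threshold
            sign = 1.0 if similar else -1.0
            dx += sign * (xj - xi) / dist
            dy += sign * (yj - yi) / dist
        norm = math.hypot(dx, dy)
        if norm > 0.0:
            # move a fixed distance along the combined direction
            xi += speed * dx / norm
            yi += speed * dy / norm
        new_positions.append((xi, yi))
    return new_positions
```

Calling `step` repeatedly on the same agent list moves similar agents closer on the panel; under these assumptions, the groups that form correspond to the clusters that the visualization would show.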
Comparison experiments between AIFClust, ant clustering algorithms, and K-means show that AIFClust finds clusters better than the ant clustering algorithms and that its clustering accuracy is similar to that of K-means.

Given AIFClust's good performance on static data, this thesis extends it to the uncertain data stream clustering problem and proposes the data stream clustering algorithm FClustStream. The algorithm adopts the online-offline clustering model. In the online stage, it maintains a core agent buffer and a potential agent buffer; the agents stored in the buffers absorb data points through probabilistic attraction and use uncertain clustering characteristics to summarize the data, forming micro-clusters. At the same time, the algorithm updates the two buffers in real time, deletes outdated agents, and ensures that high-weight agents are stored in the core agent buffer. In the offline stage, it clusters the agents in the core agent buffer using AIFClust. Comparison experiments between FClustStream and EMicro show that FClustStream, which adopts distributed swarm intelligence, processes data faster than EMicro, which uses a centralized strategy; it also obtains more compact clusters and a more reasonable clustering result when the number of classes is unknown, and it is scalable.
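The online stage described above can be sketched roughly as follows. This is a hedged illustration and not the FClustStream implementation: the class `MicroAgent`, the function `process`, the exponential weight decay, and the parameters `radius`, `decay`, `core_weight` and `max_age` are all assumptions. In particular, the probabilistic attraction and uncertain clustering characteristics of the thesis are replaced here by a plain distance test on one-dimensional points.

```python
from dataclasses import dataclass

@dataclass
class MicroAgent:
    center: float   # weighted mean of the absorbed points (1-D for brevity)
    weight: float   # decayed count of absorbed points
    last_seen: int  # time step of the most recent absorption

    def current_weight(self, t, decay):
        # Assumed exponential fading: weight shrinks while no point arrives.
        return self.weight * decay ** (t - self.last_seen)

    def absorb(self, x, t, decay):
        self.weight = self.current_weight(t, decay) + 1.0
        self.center += (x - self.center) / self.weight
        self.last_seen = t

def process(stream, radius=1.0, decay=0.9, core_weight=3.0, max_age=20):
    """Maintain a core buffer and a potential buffer over a stream."""
    core, potential = [], []
    for t, x in enumerate(stream):
        agents = core + potential
        nearest = min(agents, key=lambda a: abs(a.center - x), default=None)
        if nearest is not None and abs(nearest.center - x) <= radius:
            nearest.absorb(x, t, decay)  # an existing agent absorbs the point
        else:
            potential.append(MicroAgent(center=x, weight=1.0, last_seen=t))
        # Update both buffers: drop outdated agents, keep heavy ones in core.
        alive = [a for a in core + potential if t - a.last_seen <= max_age]
        core = [a for a in alive if a.current_weight(t, decay) >= core_weight]
        potential = [a for a in alive if a.current_weight(t, decay) < core_weight]
    return core, potential
```

Under these assumptions, the agents returned in `core` play the role of the micro-clusters that the offline stage would then group with a static clustering algorithm such as AIFClust.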
Keywords/Search Tags: Data stream clustering, Swarm intelligence, FClust, Data visualization, Uncertain data stream