Clustering Algorithm Analysis Of Data Stream Mining Based On Particle Swarm Optimization

Posted on: 2011-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2178330332962714
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computers and their applications, people's ability to collect data has increased greatly, and data streams have become an important data source. As a result, data stream mining algorithms have become an important cutting-edge research topic. Clustering data streams is an important branch of data stream mining; its main purpose is to discover new knowledge models and hidden rules. A data stream is a dynamically growing data set that consists of continuously arriving data. Moving from processing and analyzing limited data to unlimited data poses new challenges and calls for new clustering algorithms. CluStream is the classic data stream clustering algorithm. It consists of an online part and an offline part, and the main work of this paper is to optimize the offline part of this model. The major work of this paper is as follows:
(1) The advantages and disadvantages of the particle swarm algorithm and the genetic algorithm are analyzed, and the strengths of both are combined to optimize centroid-based K-means clustering, so that K-means produces better clustering results in which data within a cluster are more similar and data in different clusters are more dissimilar. The experimental data show that the hybrid clustering algorithm based on IGA and PSO, which uses a switching technique, performs better than the simple K-means algorithm.
(2) As an optimization algorithm, PSO sometimes gets stuck in a local optimum because of premature convergence. To address this, predator-prey particle swarm optimization (PPPSO) is introduced. In the PPPSO algorithm, the predators chase the center of the prey, and the prey try to escape the predators; this is an effective way to avoid being trapped in a local optimum and to find the global optimum. This paper presents a clustering algorithm that uses PPPSO to optimize the fuzzy mean method.
(3) In high-dimensional data stream spaces, redundant features degrade the quality of clustering. To address this, a data stream clustering algorithm based on feature selection is proposed. The algorithm automatically detects redundant, unimportant features and removes them. The experimental results indicate that the DSCFC algorithm can detect hidden redundant features in a data stream and remove them, and that it is more effective and gives better results than the CluStream algorithm.
(4) In data stream mining, it is difficult to mine arbitrary interesting patterns fully and quickly by directly applying existing algorithms based on frequent itemset mining to complex patterns. To solve this problem, conditional pattern mining based on frequent itemset patterns is proposed. Starting from the frequent itemsets, it mines patterns that cannot be found immediately from the itemsets themselves; this is conditional pattern mining. Combining conditional pattern mining with clustering makes it possible to mine interesting rules in a database more quickly and efficiently and to discover new knowledge and new laws.
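As an illustration of item (1), the following is a minimal sketch of how a plain particle swarm can search for K-means centroids. It is not the hybrid IGA and PSO algorithm of the paper: the genetic operators and the switching technique are omitted, and the function names (`sse`, `pso_kmeans`) and parameter values (`w`, `c1`, `c2`, swarm size) are assumptions chosen for illustration.

```python
import random

def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total

def pso_kmeans(points, k, n_particles=20, iters=100,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centroids with a basic global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle encodes one candidate solution: a list of k centroids,
    # initialised from randomly chosen data points.
    pos = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [sse(p, points) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(pos[i], points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest, gbest_fit

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centroids, fitness = pso_kmeans(data, k=2)
    print(centroids, fitness)
```

The fitness used here is the within-cluster sum of squared errors, which is the quantity K-means itself minimizes; a lower value corresponds to data that are more similar within each cluster.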
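The predator-prey idea in item (2) can be sketched in the same way. The abstract only states that the predators chase the center of the prey and that the prey try to escape; everything else below is an assumption made for illustration: a single predator, a repulsion term of the form `a * exp(-b * distance)`, the function name `pp_pso`, and all parameter values. The sketch minimizes a generic objective `f` and does not include the fuzzy mean clustering step.

```python
import math
import random

def pp_pso(f, dim, n_prey=20, iters=200, w=0.7, c1=1.5, c2=1.5,
           c_pred=0.5, a=1.0, b=1.0, lo=-5.0, hi=5.0, seed=0):
    """Minimise f with a prey swarm that is disturbed by one predator."""
    rng = random.Random(seed)
    prey = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_prey)]
    vel = [[0.0] * dim for _ in range(n_prey)]
    predator = [rng.uniform(lo, hi) for _ in range(dim)]
    pbest = [p[:] for p in prey]
    pbest_fit = [f(p) for p in prey]
    g = min(range(n_prey), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        # The predator chases the center (mean position) of the prey swarm.
        center = [sum(p[d] for p in prey) / n_prey for d in range(dim)]
        predator = [x + c_pred * rng.random() * (c - x)
                    for x, c in zip(predator, center)]
        for i in range(n_prey):
            dist = math.dist(prey[i], predator)
            # Assumed repulsion strength: strong when close, fading with distance.
            flee = a * math.exp(-b * dist)
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                if dist > 0:
                    away = (prey[i][d] - predator[d]) / dist
                else:
                    away = rng.uniform(-1.0, 1.0)  # arbitrary direction if coincident
                # Standard PSO terms plus the escape term pushing away from the predator.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - prey[i][d])
                             + c2 * r2 * (gbest[d] - prey[i][d])
                             + flee * away)
                # Keep the prey inside the search box.
                prey[i][d] = min(hi, max(lo, prey[i][d] + vel[i][d]))
            fit = f(prey[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = prey[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = prey[i][:], fit
    return gbest, gbest_fit

def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)

if __name__ == "__main__":
    best, best_fit = pp_pso(sphere, dim=2)
    print(best, best_fit)
```

The escape term keeps perturbing prey that cluster around the swarm center, which is the mechanism the abstract credits with helping the swarm leave a local optimum.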
Keywords/Search Tags: data stream mining, cluster analysis, particle swarm optimization, predator-prey, conditional pattern