Research And Application Of Complex Data Stream Clustering Algorithm

Posted on: 2014-09-06  Degree: Master  Type: Thesis
Country: China  Candidate: J P Zhang  Full Text: PDF
GTID: 2268330401976813  Subject: Communication and Information System
Abstract/Summary:
With network convergence accelerating, streaming data services, typified by multimedia applications, account for a growing share of network traffic and are becoming the dominant form of data. Unlike traditional data, stream data arrive in rapid succession, grow continuously, and evolve dynamically, which poses many problems for cluster analysis. Data stream clustering has therefore been studied extensively by scholars at home and abroad.
First, building on a study of traditional clustering algorithms, this paper analyzes the factors that most influence the performance of static data clustering and, starting from these factors, studies the clustering of static complex data in depth. The analysis finds that the affinity propagation algorithm outperforms the other algorithms and is well suited to extension to data stream clustering, so affinity propagation clustering is taken as the focus of this paper. Second, through an in-depth study of clustering for complex stream data that arrive in real time, affinity propagation clustering is applied to evolving data streams, achieving efficient, accurate, and fast clustering. Finally, considering the characteristics of large-scale network data streams, we study data stream clustering in a distributed environment and design an application mechanism for it.
In addition, to meet the project requirements, a method for rapidly discovering abnormal information samples is implemented on the basis of the stream clustering model.
The main theoretical work of this paper on complex stream data clustering covers the following three aspects:
1) An affinity propagation clustering algorithm for processing huge amounts of complex data is proposed. To address the shortcoming that the affinity propagation algorithm can only find spherical clusters, we design a density-adaptive "manifold distance kernel" measure based on the assumption of local and global consistency of clustering. On this basis, an affinity propagation clustering algorithm based on hybrid measure (APCHM) is proposed, which overcomes the original algorithm's inability to handle clusters with non-convex structure. A parallel APCHM algorithm (P-APCHM) is then proposed, which greatly improves computational speed while maintaining clustering performance.
2) A fast, efficient online clustering algorithm for data streams that adapts to cluster changes is proposed. To address the shortcomings of current algorithms when applied to complex streams, we extend the APCHM algorithm to data streams and propose a data stream clustering algorithm based on density and affinity propagation techniques (StrDenAP). The algorithm adopts an online/offline two-stage processing framework and introduces a micro-cluster decay density to reflect the evolution of the data stream accurately. It also maintains and deletes micro-clusters dynamically online, which makes the algorithm's model more consistent with the intrinsic characteristics of the original data stream. It can detect changes in the data stream in real time and give clustering results at any time.
Experiments on real and artificial data sets show that the algorithm has good applicability, efficiency, and scalability, and achieves better clustering results.
3) An application mechanism for data stream clustering in a distributed environment is designed. The StrDenAP algorithm is further applied to the distributed environment, and a distributed data stream application mechanism, D-StrDenAP (Distream Stream Clustering based on Density and Affinity Propagation), is proposed. Each local site performs local clustering with the StrDenAP algorithm under a sliding window and uploads the summary data structure of the updated data stream to the central site; the central site integrates all local models by density-based fusion and feeds the result back to the local sites. Experiments show that the algorithm not only improves the quality of distributed data stream clustering but also reduces the communication cost significantly.
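The online stage described in aspect 2) can be illustrated with a minimal sketch of micro-clusters whose density decays over time and which are deleted once they fade. The abstract does not give the decay formula, so the exponential fading factor 2^(-lam * dt), common in density-based stream clustering, is an assumption here. The names MicroCluster, update, radius, lam, and min_density are hypothetical, and the offline stage (affinity propagation over the surviving micro-clusters) is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary: a center plus a time-stamped density."""
    center: List[float]
    density: float = 1.0
    last_update: float = 0.0

    def decayed_density(self, now: float, lam: float) -> float:
        # Assumed fading function: density halves every 1/lam time units.
        return self.density * 2.0 ** (-lam * (now - self.last_update))

    def absorb(self, point: Sequence[float], now: float, lam: float) -> None:
        # Fade the old density first, then add the new point with weight 1
        # and move the center to the density-weighted mean.
        old = self.decayed_density(now, lam)
        new = old + 1.0
        self.center = [(old * c + x) / new for c, x in zip(self.center, point)]
        self.density = new
        self.last_update = now


def update(clusters: List[MicroCluster], point: Sequence[float], now: float,
           radius: float = 1.0, lam: float = 0.5,
           min_density: float = 0.1) -> List[MicroCluster]:
    """Process one arriving point: prune faded micro-clusters, then absorb or create."""
    # Online deletion: drop micro-clusters whose decayed density is too low.
    clusters[:] = [mc for mc in clusters
                   if mc.decayed_density(now, lam) >= min_density]
    # Online maintenance: absorb into the nearest micro-cluster within radius,
    # otherwise start a new micro-cluster at the point.
    nearest = min(clusters, key=lambda mc: math.dist(mc.center, point),
                  default=None)
    if nearest is not None and math.dist(nearest.center, point) <= radius:
        nearest.absorb(point, now, lam)
    else:
        clusters.append(MicroCluster(list(point), 1.0, now))
    return clusters


# Toy stream of (timestamp, point) pairs: two early groups, then a long gap.
stream = [(0.0, (0.0, 0.0)), (1.0, (0.2, 0.0)),
          (2.0, (5.0, 5.0)), (3.0, (5.2, 5.0)),
          (10.0, (5.1, 5.0))]
clusters: List[MicroCluster] = []
for t, p in stream:
    update(clusters, p, t)
```

In this toy stream the group near the origin receives no points after time 1, so by time 10 its decayed density falls below min_density and it is deleted, while the more recently updated group survives and absorbs the last point. The thresholds are illustrative values only, not parameters from the thesis.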
Keywords/Search Tags:data streams, clustering analysis, affinity propagation, density-based clustering, distributed processing