
Clustering Of Data Streams Based On Artificial Bee Colony Algorithm

Posted on: 2015-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: J F Cheng
GTID: 2298330422983847
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet of Things and cloud computing technology, some application areas now produce data that changes dynamically in real time. Such data is vast and unbounded, and it flows into and out of computer systems at unpredictable rates; researchers call it a data stream. Traditional data mining methods are clearly inappropriate for analyzing and processing this new type of data, so new data mining methods are urgently needed. Clustering is one of the most important techniques in the data mining field, and applying it to data streams has considerable practical significance. Several data stream clustering algorithms have been proposed so far, and experiments show that they work efficiently. This thesis focuses on how to design a data stream clustering algorithm that is efficient, high in quality, and highly extensible. It summarizes the relevant theory of data mining and clustering, then introduces, analyzes, and compares the existing data stream clustering algorithms. On this basis, and according to the characteristics of data streams, it proposes an efficient and extensible data stream clustering algorithm called ABCCluStream. The algorithm adopts the classic online-offline two-stage framework of the CluStream algorithm and, based on the theory of the artificial bee colony algorithm, defines the related parameters and micro-structure used in the clustering process.
The clustering process of ABCCluStream is as follows. In the online stage, the K-means algorithm is used to initialize the micro-clusters. When a new data point arrives, it is assigned to the micro-cluster with which it has the highest similarity. To reflect the evolving character of the data stream, the micro-clusters are updated regularly. At the same time, to facilitate later analysis, the features of the micro-clusters are saved at regular intervals as snapshots in a pyramid structure, and small clusters are updated so that data of every form can be found. In the offline stage, all micro-clusters within a user-specified time range are retrieved from the hard drive, a certain number of these micro-clusters are selected and treated as virtual points, and the swarm algorithm clusters them according to the defined objective function. Experiments show that ABCCluStream delivers high clustering quality and good extensibility, and that it meets the requirements of high purity, single-pass scanning, and real-time results. The algorithm is suitable for analyzing and clustering large-scale dynamic data streams.
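
The two stages described above can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not give the similarity measure, the objective function, or the parameter values, so Euclidean distance, a fixed absorption `radius`, a weighted sum-of-squared-distances `cost`, and the names `MicroCluster`, `online_update`, and `abc_cluster` are all hypothetical. The K-means initialization and the pyramid snapshot storage are omitted.

```python
import math
import random


class MicroCluster:
    """Additive summary of absorbed points: count, linear sum, squared sum, time sum."""

    def __init__(self, point, t):
        self.n = 1
        self.ls = list(point)
        self.ss = [x * x for x in point]
        self.ts = t

    def center(self):
        return [s / self.n for s in self.ls]

    def absorb(self, point, t):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x
        self.ts += t


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def online_update(clusters, point, t, radius):
    """Online stage (assumed rule): absorb into the nearest micro-cluster if it is
    within `radius`, otherwise open a new micro-cluster."""
    best = min(clusters, key=lambda c: dist(c.center(), point), default=None)
    if best is not None and dist(best.center(), point) <= radius:
        best.absorb(point, t)
    else:
        clusters.append(MicroCluster(point, t))


def cost(centroids, points, weights):
    """Assumed objective: weighted squared distance of each virtual point to its nearest centroid."""
    return sum(w * min(dist(p, c) for c in centroids) ** 2 for p, w in zip(points, weights))


def abc_cluster(points, weights, k, n_sources=10, limit=5, iters=50, seed=0):
    """Offline stage: basic artificial bee colony search over sets of k centroids."""
    rng = random.Random(seed)
    dims = len(points[0])

    def random_source():
        return [list(rng.choice(points)) for _ in range(k)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s, points, weights) for s in sources]
    trials = [0] * n_sources

    def try_improve(i):
        # Standard ABC move: perturb one coordinate relative to a random partner source.
        partner = rng.choice(sources[:i] + sources[i + 1:])
        cand = [c[:] for c in sources[i]]
        j, d = rng.randrange(k), rng.randrange(dims)
        cand[j][d] += rng.uniform(-1, 1) * (cand[j][d] - partner[j][d])
        c = cost(cand, points, weights)
        if c < costs[i]:
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    i_best = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = [c[:] for c in sources[i_best]], costs[i_best]
    for _ in range(iters):
        for i in range(n_sources):  # employed bees visit every source
            try_improve(i)
        fitness = [1.0 / (1.0 + c) for c in costs]
        for i in rng.choices(range(n_sources), weights=fitness, k=n_sources):
            try_improve(i)  # onlooker bees favour fitter sources
        for i in range(n_sources):  # scouts replace exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = cost(sources[i], points, weights)
                trials[i] = 0
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best, best_cost = [c[:] for c in sources[i_best]], costs[i_best]
    return best, best_cost
```

In this sketch, each stored micro-cluster would be passed to `abc_cluster` as one virtual point, using `center()` as its position and `n` as its weight. Because the summaries are additive, two snapshots can be subtracted to isolate a time range, which is the property the CluStream framework relies on.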
Keywords/Search Tags: data stream, clustering, artificial bee colony, fitness yields