
Research On Incremental Clustering For Big Dataset

Posted on: 2016-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
Full Text: PDF
GTID: 2308330476454996
Subject: Computer Science and Technology
Abstract/Summary:
Data mining, search engines, and big data technology have attracted increasing attention as the volume of information grows rapidly. In this big data setting, many researchers focus on clustering. Existing clustering methods are mainly static and rely on a single clustering algorithm. A static method must scan the whole dataset before clustering, so it does not work well when data arrive dynamically as a stream. Another serious limitation is efficiency, in terms of both time and memory cost. In addition, no single clustering method achieves good results for all kinds of data distributions and application scenarios. Incremental clustering and ensemble clustering can address these problems.

(1) We propose a new incremental clustering method named ICGT (Incremental Construction of GMM Tree), where GMM stands for Gaussian mixture model. The distribution of each data cluster is approximated by a GMM. We introduce a tree structure to estimate the GMMs of the data clusters incrementally. Each leaf node corresponds to a dense Gaussian distribution. Inserting or deleting a data point changes the corresponding leaf node; the tree is then updated, and may grow, according to the distances between GMM distributions. The clustering results are obtained from the generated GMM tree by breadth-first search.

(2) We also propose a new ensemble clustering method based on GMMs and evidence theory. We use GMMs to determine the confidence of the candidate results from each clustering method. We then apply evidence theory to combine the evidence supplied by the different clustering methods, and the final result is obtained from the combined evidence.

We test the proposed methods on synthetic, 2-D visual, and real datasets, and compare them with traditional static, incremental, and ensemble clustering methods. The results show that our methods have good potential for big data and real-world applications.
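The abstract says that inserting or deleting a data point changes the corresponding leaf node, but it does not give the update rule. As a minimal sketch, assuming each leaf holds a single Gaussian summarized by a count, a mean, and a scatter matrix, the standard running-moments (Welford-style) update below adjusts those statistics in constant time per point without rescanning the data. The class name `GaussianLeaf` and its methods are hypothetical and are not taken from the thesis; tree growth and the distance between GMMs are not modeled here.

```python
class GaussianLeaf:
    """Running statistics for one Gaussian component (hypothetical helper)."""

    def __init__(self, dim):
        self.n = 0                                         # number of points in the leaf
        self.mean = [0.0] * dim                            # running mean
        self.scatter = [[0.0] * dim for _ in range(dim)]   # sum of outer products of deviations

    def _add_outer(self, u, v, sign):
        # scatter += sign * (u v^T)
        for i, ui in enumerate(u):
            for j, vj in enumerate(v):
                self.scatter[i][j] += sign * ui * vj

    def insert(self, x):
        """Add one point and update mean and scatter incrementally."""
        self.n += 1
        before = [xi - mi for xi, mi in zip(x, self.mean)]   # deviation from the old mean
        self.mean = [mi + bi / self.n for mi, bi in zip(self.mean, before)]
        after = [xi - mi for xi, mi in zip(x, self.mean)]    # deviation from the new mean
        self._add_outer(before, after, 1.0)

    def remove(self, x):
        """Remove a previously inserted point by reversing the insert update."""
        if self.n == 0:
            raise ValueError("cannot remove from an empty leaf")
        if self.n == 1:
            # Removing the last point resets the leaf.
            self.__init__(len(self.mean))
            return
        after = [xi - mi for xi, mi in zip(x, self.mean)]    # deviation from the mean that includes x
        self.n -= 1
        self.mean = [mi - ai / self.n for mi, ai in zip(self.mean, after)]
        before = [xi - mi for xi, mi in zip(x, self.mean)]   # deviation from the mean without x
        self._add_outer(before, after, -1.0)

    def covariance(self):
        """Unbiased sample covariance; needs at least two points."""
        if self.n < 2:
            raise ValueError("covariance needs at least two points")
        return [[s / (self.n - 1) for s in row] for row in self.scatter]


leaf = GaussianLeaf(dim=2)
for point in [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]:
    leaf.insert(point)
leaf.remove((5.0, 9.0))   # the leaf now summarizes only the first two points
```

Because deletion is the exact inverse of insertion, the leaf's statistics after removing a point match what a fresh scan of the remaining points would give, up to floating-point error.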
Keywords/Search Tags: Incremental Clustering, Ensemble Clustering, Gaussian Mixture Modeling, Evidence Theory, Big Data