Research On Ensemble Clustering

Posted on: 2019-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Tang
Full Text: PDF
GTID: 2518306473954059
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer technology in recent decades, research institutes and business organizations have accumulated large amounts of data, and technologies such as big data and data mining have developed rapidly. As an important technique in machine learning and data mining, clustering has received a lot of attention. Among clustering algorithms, ensemble clustering has attracted particular interest because it offers better clustering accuracy and robustness than single clustering algorithms. However, current ensemble clustering algorithms still have some problems. On the one hand, their accuracy and robustness need further improvement. On the other hand, they have high time complexity, and as the size of the data set increases the running time becomes unacceptable, especially on large-scale data sets. To improve both clustering accuracy and time efficiency, we carried out the following two pieces of work.

(1) We propose an ensemble clustering algorithm based on cluster selection and cluster splitting. The algorithm uses a cluster evaluation criterion based on the minimum spanning tree to assess the quality of each cluster. Based on cluster quality, clusters are selected and the cluster splitting operation is applied to them recursively. An improved co-association matrix is then obtained, and the final result is produced using the spectral clustering algorithm. The algorithm achieves higher average accuracy and robustness on our data sets.

(2) To cluster effectively on big data sets, we propose an ensemble clustering algorithm based on iterative sampling. The algorithm uses sampling to reduce the time complexity and fits a Gaussian mixture model to the sampled data. Through iterative sampling, we combine the Gaussian mixture models into an ensemble to obtain more accurate clustering results.

To test the effectiveness of the proposed algorithms, we conducted detailed experiments. The data sets include artificial data sets, two-dimensional data sets and real data sets. We tested both clustering accuracy and time efficiency, and used several traditional clustering algorithms and ensemble clustering algorithms for comparison. The experiments show the effectiveness of the two proposed algorithms in clustering accuracy and time efficiency.
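
For readers unfamiliar with the co-association matrix mentioned in (1), the following is a minimal sketch of the standard, unmodified form: entry (i, j) is the fraction of base clusterings that put points i and j in the same cluster. It does not reproduce the thesis's improved matrix, the minimum-spanning-tree criterion, or the cluster splitting step. The function names, the toy partitions, and the threshold-based consensus step are assumptions made for illustration; the consensus step is a simple stand-in for the spectral clustering used in the thesis.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of base partitions in which each pair of points shares a cluster."""
    if not partitions:
        raise ValueError("need at least one base partition")
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same points")
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    m = len(partitions)
    return [[value / m for value in row] for row in matrix]


def consensus_by_threshold(matrix: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Stand-in consensus step (not spectral clustering): connected components
    of the graph whose edges are pairs with co-association above the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of five points (label values are arbitrary).
base_partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
    [0, 0, 0, 1, 1],
]
C = co_association(base_partitions)
final_labels = consensus_by_threshold(C, threshold=0.5)
```

In this toy input, points 0 and 1 agree in every base clustering, so their entry is 1.0, while points 2 and 3 agree in two of three. A full pipeline along the lines of the abstract would pass the matrix to a spectral clustering routine as a precomputed affinity instead of thresholding it.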
Keywords/Search Tags: ensemble clustering, co-association matrix, cluster selection, cluster splitting, big data, sampling