Research On Ensemble Clustering Algorithm Based On Bilateral Clustering

Posted on: 2021-04-02  Degree: Master  Type: Thesis
Country: China  Candidate: H Peng  Full Text: PDF
GTID: 2518306107998379  Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of network technology, data storage, and data collection capabilities, a large amount of data is generated every day, and extracting valuable information from it is extremely difficult. In the era of big data, clustering algorithms have become an indispensable means of obtaining useful information. Every clustering algorithm has its advantages and disadvantages, and no single algorithm can handle all types of data structures and cluster shapes. For a given data set, without prior knowledge it is difficult to choose a clustering algorithm that achieves the clustering goal. Ensemble clustering offers a better approach to data mining and can overcome some of the deficiencies of individual clustering algorithms. It borrows the idea of ensemble learning: first, a clustering algorithm is run to obtain multiple base clusterings, and then the ensemble clustering result is obtained from them through consensus integration.

This thesis studies the principles and methods of ensemble clustering in depth. Many scholars in China and abroad have proposed ensemble clustering methods, most of them based on graph partitioning. In almost all graph-partitioning-based methods, however, the output of the ensemble step is not the final clustering, and a further clustering algorithm must be applied to obtain it. The solution therefore passes from discrete to continuous and back to discrete over the whole process, which can make the final clustering deviate greatly from the true result. Moreover, most methods ignore the quality of the base clusterings: if the base clusterings are of very poor quality, the accuracy of the final clustering is reduced to a certain extent. This thesis investigates these problems in depth. The main contents are as follows:

(1) The overall framework of ensemble clustering is introduced in detail, and the process is summarized as two stages: base clustering generation and consensus integration. Typical existing algorithms for each stage are summarized and analyzed, and popular cluster evaluation measures are introduced.

(2) A clustering algorithm based on bilateral integration is proposed. The k-means algorithm is run multiple times on a given data set to produce multiple base clustering results. The base clusters and the samples are then built into a bipartite graph, and both are clustered at the same time, so that the final clustering result is obtained directly. Experiments on real data sets compare the performance of the proposed algorithm with that of other ensemble clustering algorithms.

(3) A bilateral clustering ensemble algorithm based on spectral clustering is proposed. The algorithm generates multiple base clustering results by repeatedly running spectral clustering on a given data set, selects among the base clustering results using normalized mutual information (NMI), and finally clusters the selected base clusters and the samples simultaneously to obtain the final clustering result. Experiments on real UCI data sets compare the algorithm with traditional ensemble clustering algorithms, and the results show that the proposed algorithm is effective.

Most existing graph-based ensemble clustering algorithms consider only the latent information between samples, or between base clusters, during the consensus integration stage, and ignore the latent information between samples and base clusters. They cannot obtain the final clustering result directly, and they ignore the influence of base clustering quality on the ensemble result. To address these issues, this thesis proposes two ensemble clustering algorithms. The proposed methods further improve cluster analysis techniques, and their effectiveness is verified through a series of experiments.
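The bipartite-graph idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose details are not given here. As an assumption, it swaps in a classical spectral co-clustering of the sample/base-cluster bipartite graph, and it finishes with a k-means step on a continuous embedding, which is exactly the discrete-continuous-discrete path the thesis aims to avoid. All function names (kmeans, base_clusterings, bipartite_matrix, co_cluster) and all parameter values are hypothetical, and only NumPy is assumed.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (illustrative); returns one label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def base_clusterings(X, k, n_runs, rng):
    """Stage 1: run k-means several times to get several base clusterings."""
    return [kmeans(X, k, rng) for _ in range(n_runs)]


def bipartite_matrix(labelings, k):
    """Sample-by-base-cluster incidence matrix of the bipartite graph.

    B[i, r * k + c] = 1 when run r put sample i into its cluster c.
    """
    n = len(labelings[0])
    B = np.zeros((n, k * len(labelings)))
    for r, labels in enumerate(labelings):
        B[np.arange(n), r * k + labels] = 1.0
    return B


def co_cluster(B, k, rng):
    """Stage 2 (assumed variant): cluster samples and base clusters together.

    Uses the singular vectors of the degree-normalized incidence matrix,
    then k-means on the stacked row/column embedding.
    """
    d_rows = B.sum(axis=1)
    d_cols = np.maximum(B.sum(axis=0), 1e-12)  # guard against empty clusters
    Bn = B / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]
    U, _, Vt = np.linalg.svd(Bn, full_matrices=False)
    # Skip the first (trivial) singular pair and keep the next k - 1.
    Z_rows = U[:, 1:k] / np.sqrt(d_rows)[:, None]
    Z_cols = Vt[1:k].T / np.sqrt(d_cols)[:, None]
    labels = kmeans(np.vstack([Z_rows, Z_cols]), k, rng)
    n = B.shape[0]
    return labels[:n], labels[n:]  # sample labels, base-cluster labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: three Gaussian blobs in the plane (made up for illustration).
    X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
    B = bipartite_matrix(base_clusterings(X, k=3, n_runs=5, rng=rng), k=3)
    sample_labels, cluster_labels = co_cluster(B, k=3, rng=rng)
    print(sample_labels)
```

Each row of B sums to the number of runs, because every sample belongs to exactly one base cluster per run. A quality-based selection step such as the NMI filtering described in item (3) would, under the same assumptions, sit between base_clusterings and bipartite_matrix.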
Keywords/Search Tags: base clustering, clustering, spectral clustering, ensemble clustering