The Research And Application Of Multiple-exemplar Clustering

Posted on: 2022-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: N N Zhang
Full Text: PDF
GTID: 2518306527483034
Subject: Computer Science and Technology
Abstract/Summary:
Clustering partitions unlabeled data into groups in order to discover the natural structure of the data. It plays an important role in data analysis, where it is used to uncover latent structure, group data without supervision, and compress data, and it is an important branch of artificial intelligence research.

In 2019, Nie Feiping et al. proposed "K-Multiple-Means: A Multiple-Means Clustering Method with Specified K Clusters" (KMM) at ACM SIGKDD. KMM sets multiple sub-prototypes for each class and formalizes the assignment of data with multiple sub-prototypes to the specified K classes as an optimization problem, which is solved by alternately updating the partition of the m sub-cluster means and the k clusters. KMM overcomes the weakness of the K-means algorithm on non-convex patterns and, compared with algorithms in the same category, is more effective and achieves better clustering performance.

As an extension of K-means clustering, however, KMM is sensitive to the selection of the initial sub-prototypes, and its random selection of prototypes leads to unstable clustering results. In addition, although KMM performs well on single-view clustering tasks, it cannot be used for multi-view clustering, even as multi-view data become increasingly common. These shortcomings make KMM inconvenient in practical applications, so improving the algorithm both enhances its practicality and broadens its application scenarios. To address these shortcomings, the main work of this thesis covers the following aspects:

(1) To address KMM's sensitivity to the initial prototypes and the unstable clustering results caused by random prototype selection, a stable K-multiple-means clustering algorithm is proposed. The new algorithm constructs a graph based on the nearest-neighbor relation of the samples, divides the data into several groups according to the connected components of the graph, takes the mean of each group as an initial prototype, and then obtains the clusters by solving the optimization problem with an alternating optimization strategy. Experiments on artificial data sets and UCI data sets show that the new algorithm achieves better and more stable clustering results.

(2) An extension of KMM that can be used for multi-view clustering is proposed. The new algorithm redesigns the objective function by introducing weight parameters for the views and obtains the best weight allocation across views in the process of solving the objective function. On several multi-view data sets, the new algorithm performs better than many popular multi-view clustering algorithms.

(3) The FN-KMM algorithm proposed in this thesis is applied to the analysis of lung X-ray films, and the MKMM algorithm is applied to the classification of regional water resources. First, data for each task were collected. Second, the collected data sets were preprocessed according to the algorithm and the characteristics of the examples, the corresponding algorithm was applied to the processed data, and three clustering measures were used to evaluate the clustering results. Finally, a comparison of the clustering performance of different algorithms on these data sets shows that the two proposed algorithms have practical value.
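The initialization described in contribution (1) can be illustrated with a minimal sketch. The abstract does not say how the nearest-neighbor graph is built, so the code below assumes the simplest case: each sample is linked to its single nearest neighbor by Euclidean distance, and the edges are treated as undirected. The function name `nn_component_prototypes` is hypothetical and is not taken from the thesis. The sketch covers only the choice of initial prototypes, not the alternating optimization that follows.

```python
import math


def nn_component_prototypes(points):
    """Group points by connected components of a 1-nearest-neighbor graph.

    Assumption: every sample is linked to its single nearest neighbor
    (Euclidean distance, undirected edge). Requires at least two points.
    Returns (labels, prototypes), where each prototype is a group mean.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over sample indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every sample to its nearest neighbor.
    for i in range(n):
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        parent[find(i)] = find(nearest)

    # Relabel component roots as 0..g-1 in order of first appearance.
    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(n)]

    # Average the members of each component to get the initial prototypes.
    dim = len(points[0])
    sums = [[0.0] * dim for _ in roots]
    counts = [0] * len(roots)
    for p, g in zip(points, labels):
        counts[g] += 1
        for d in range(dim):
            sums[g][d] += p[d]
    prototypes = [[s / c for s in row] for row, c in zip(sums, counts)]
    return labels, prototypes


# Two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, prototypes = nn_component_prototypes(data)
print(labels)      # [0, 0, 0, 1, 1, 1]
print(prototypes)  # one mean point per connected component
```

Because the groups come from the graph structure rather than a random draw, repeated runs on the same data give the same initial prototypes, which is the stability property the abstract attributes to the new algorithm. The number of groups is determined by the graph and is not fixed in advance.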
Keywords/Search Tags: multiple-means clustering, clustering center, multi-view, X-ray film analysis, classification of water resources