
Research On Multi-view Multiple Clustering Algorithm Based On Sampling And Inverse Optimization

Posted on: 2024-02-23; Degree: Master; Type: Thesis
Country: China; Candidate: J H Cui
GTID: 2568307106489834; Subject: Computer technology
Abstract/Summary:
To mine the rich information underlying multi-view data, multi-view clustering algorithms that integrate complementary information from multiple heterogeneous views have become a research hotspot in machine learning and data mining. Among them, subspace-based clustering algorithms have achieved great success. As an extension of multi-view clustering, multi-view multiple clustering generates several clustering results for multi-view data at the same time and can therefore mine the hidden information in such data more thoroughly. Applying subspace representation to multi-view multiple clustering is thus of practical significance. In the era of big data, however, multi-view multiple clustering still faces challenges in both efficiency and effectiveness:

(1) Conventional subspace clustering consists of two steps: finding a sparse representation or low-rank structure of the original multi-view data, and performing spectral clustering on the new representation. Both steps take time quadratic or even cubic in the number of original samples, and generating multiple clustering results increases the cost further. This makes subspace-based multi-view multiple clustering difficult to apply to large-scale data.

(2) Anchor-based subspace clustering was proposed to reduce the high complexity of traditional subspace methods and has brought clear improvements. However, because sampling and clustering are performed separately, the selected anchors are only weakly representative. Moreover, each view's graph is constructed independently from the anchors selected from that view's data, so the complementary information across views is not fully exploited. These problems seriously degrade the quality of the clustering results.

(3) Clustering results are commonly evaluated by their quality and their diversity, and multi-view multiple clustering inherently faces the difficulty of balancing the two. Research on multi-view multiple clustering is still scarce, and most existing methods focus on the diversity of the clustering results while neglecting their quality, so they behave much like clustering algorithms designed for single-view data.

To address these problems, this thesis studies multi-view multiple clustering algorithms. The main work is as follows:

1. A large-scale multi-view multiple clustering algorithm is proposed for clustering large-scale multi-view data. Specifically, for each view a small number of key samples, far fewer than the original samples, is selected, and for each view a relationship graph between these key samples and the original samples that preserves the neighborhood structure is learned by solving a convex quadratic programming problem. When the graphs are integrated, the weight of each small graph is assigned adaptively to reflect the different importance of the hidden information in each view, and the underlying structure of the large-scale original data is recovered, which guarantees the quality of the clustering results. Finally, the clustering structure of the integrated graph is explored in each subspace to obtain multiple clustering results. Using the idea of seeking orthogonal subspaces, the various forms of redundancy among the clusterings are considered comprehensively to ensure the diversity of the clustering results. This is the first study of multi-view multiple clustering for large-scale data, and it obtains clustering results on multi-view data with linear time complexity. While ensuring the quality and diversity of the clustering results, it improves clustering efficiency by at least 50%.

2. To better explore the clustering structure of multi-view data and improve the quality of the clustering results, this thesis proposes a unified optimized multi-view multiple clustering algorithm. Specifically, the sampling of a small number of key samples, the construction of the consensus graph, and the deep matrix factorization of the graph are unified into one framework, so that the key samples, the consensus graph, and the deep factorization of the neighborhood structure are optimized jointly. This brings the learned key samples closer to the underlying distribution of the original data and therefore yields a better consensus graph. The deep matrix factorization flexibly assigns a weight to each subspace, and its optimal solution ensures the complementarity of the clustering structures. In addition, sampling substantially alleviates the difficulty of clustering large-scale data. Experimental results show that this algorithm significantly improves the quality of the clustering results: without compromising diversity on the multi-view datasets, the index measuring clustering quality is more than double that of existing multi-view multiple clustering algorithms.

3. To better balance the quality and diversity of the clustering results, this thesis proposes a multi-view multiple clustering algorithm with inverse optimization. Specifically, it seeks both a latent representation shared by all views and a subspace representation specific to each view, which capture the consistency and the diversity of the multi-view data, respectively. The former is driven to approximate the latter as closely as possible through a reverse encoder network. The consensus representation of the multi-view data is recovered by iterative optimization, and its underlying subspace structure is then explored. To reduce the time complexity of the algorithm, sampling is introduced when learning the view-specific subspace representations. Experimental results show that the clustering results of this algorithm achieve both quality and diversity.
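The pipeline of the first contribution (sample key points, learn one anchor graph per view with a convex QP, fuse the graphs, cluster, then move to an orthogonal subspace to obtain a further, non-redundant clustering) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the thesis's implementation: key samples are drawn uniformly at random, each anchor graph uses the closed-form "adaptive neighbour" solution of a simplex-constrained QP, view weights are fixed and uniform instead of being learned adaptively, and diversity comes from a simple orthogonal-projection step (remove the span of the previous clustering's centroids), which is a swapped-in technique. All function names and parameters (`m` anchors, `k` neighbours, `c` clusters) are hypothetical.

```python
import numpy as np

def anchor_graph(X, A, k):
    """n x m anchor graph whose rows lie on the simplex with at most k nonzeros.

    Closed-form solution of the convex QP
        min_{z >= 0, sum(z) = 1}  sum_j d_j z_j + gamma * ||z||^2
    with gamma chosen so only the k nearest anchors get weight (assumed QP).
    """
    d = (X ** 2).sum(1)[:, None] - 2 * X @ A.T + (A ** 2).sum(1)[None, :]
    d = np.maximum(d, 0.0)                        # squared distances, clipped at 0
    order = np.argsort(d, axis=1)
    d_sorted = np.take_along_axis(d, order, axis=1)
    d_next = d_sorted[:, k]                       # distance to the (k+1)-th anchor
    num = d_next[:, None] - d_sorted[:, :k]       # non-negative because of sorting
    den = num.sum(1, keepdims=True) + 1e-12
    Z = np.zeros_like(d)
    np.put_along_axis(Z, order[:, :k], num / den, axis=1)
    return Z

def spectral_embedding(Zs, weights, c):
    """Top-c eigenvectors of S = Z Lambda^{-1} Z^T via the SVD of the n x m
    matrix Z Lambda^{-1/2}; cost O(n m^2), linear in the sample count n."""
    Z = sum(w * Zv for w, Zv in zip(weights, Zs))  # fused anchor graph
    B = Z / np.sqrt(Z.sum(0) + 1e-12)[None, :]
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :c]

def kmeans(E, c, rng, iters=100):
    """Plain Lloyd's k-means on the embedding rows."""
    centers = E[rng.choice(len(E), c, replace=False)]
    for _ in range(iters):
        labels = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        new = np.array([E[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(c)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def orthogonalize(X, labels, c):
    """Project X onto the orthogonal complement of its cluster centroids' span."""
    M = np.stack([X[labels == j].mean(0) for j in range(c) if np.any(labels == j)],
                 axis=1)                          # d x c' centroid matrix
    P = M @ np.linalg.pinv(M)                     # orthogonal projector onto span(M)
    return X - X @ P

def multiple_clusterings(views, c, m, k, n_clusterings, seed=0):
    """Return n_clusterings label vectors (uniform view weights, an assumption)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(views[0].shape[0], size=m, replace=False)  # random key samples
    weights = [1.0 / len(views)] * len(views)
    results = []
    for _ in range(n_clusterings):
        Zs = [anchor_graph(X, X[idx], k) for X in views]
        labels = kmeans(spectral_embedding(Zs, weights, c), c, rng)
        results.append(labels)
        views = [orthogonalize(X, labels, c) for X in views]  # drop found structure
    return results

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    views = [rng.normal(size=(300, 5)), rng.normal(size=(300, 4))]
    for labels in multiple_clusterings(views, c=3, m=30, k=5, n_clusterings=2):
        print(np.bincount(labels, minlength=3))
```

Every step operates on the n × m anchor graphs, so for a fixed number of anchors the cost grows linearly with n and the dense n × n similarity matrix of classical spectral clustering is never formed. This is the property behind the linear time complexity claimed for the first contribution.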
Keywords: Multi-view multiple clusterings, Sampling, Large-scale data, Data mining, Machine learning