
Multi-modal Clustering Analysis Based On Deep Learning

Posted on: 2021-12-27 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Li | Full Text: PDF
GTID: 2518306047487704 | Subject: Communication and Information System
Abstract/Summary:
Multi-modal data are obtained from multiple sources or from subsets of features. For example, a person's identity can be established from multiple sources such as the face, fingerprints, handwriting, or iris, and an image can be represented by its color or texture characteristics. With the advent of the big-data era, it is very difficult to annotate all the data, whereas clustering algorithms can automatically group samples based on the similarity relationships between them. Therefore, clustering algorithms for multi-modal data have gained increasing attention in recent years. The key to multi-modal clustering is to explore the information shared across modalities. Traditional multi-modal clustering algorithms can only extract shallow features of the samples and cannot effectively mine the deeper non-linear features hidden in the data. Deep learning, by simulating the cognitive process of the human brain, can apply powerful non-linear transformations to features and thereby extract deeper features of multi-modal data. Based on this, this paper studies deep multi-modal clustering algorithms. The main contents are as follows:

1. Existing deep multi-modal clustering methods based on auto-encoders cannot guarantee a consistent relationship between the main information of the reconstructed samples and that of the original samples, so they fail to extract multi-modal latent representations that are effective for clustering. To address this problem, this paper proposes a novel Deep Adversarial Multi-modal Clustering Network (DAMC), in which the latent representation learned from each modality can generate the original samples of every modality without losing the main information, yielding a more consistent clustering structure. Specifically, a multi-modal encoder extracts a latent representation from each modality, and a multi-modal generator then produces reconstructed samples for all modalities, which ensures both the specificity and the consistency of each modality's latent representation. A discriminative network together with a mean-squared-error loss ensures that the generated samples preserve the main information of the original samples, guaranteeing the validity of the extracted latent representations. In addition, weighted adaptive learning is used to obtain shared latent representations, and a clustering network is embedded to further improve clustering performance. At the same time, an ℓ1,2 norm makes the shared latent representations discriminative. Experimental results on video, image, and text datasets show that the method is superior to other multi-modal clustering methods.
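To make the cross-modal encode-and-generate idea concrete, the following is a minimal PyTorch sketch. The layer sizes, module names, two-modality setup, and fixed fusion weights are illustrative assumptions, not the exact DAMC architecture.

```python
# Sketch of per-modality encoding, cross-modal generation, adversarial
# scoring, and weighted fusion, in the spirit of DAMC. All dimensions
# and weights below are hypothetical.
import torch
import torch.nn as nn

class ModalEncoder(nn.Module):
    """Maps one modality to a latent representation."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))
    def forward(self, x):
        return self.net(x)

class ModalGenerator(nn.Module):
    """Reconstructs one modality from any latent representation, so a
    latent learned from modality i can regenerate modality j."""
    def __init__(self, latent_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim))
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores whether a sample is real or generated for one modality."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x)

# Two modalities with hypothetical dimensions 784 and 256.
dims, latent_dim = [784, 256], 64
encoders = [ModalEncoder(d, latent_dim) for d in dims]
generators = [ModalGenerator(latent_dim, d) for d in dims]
discs = [Discriminator(d) for d in dims]
xs = [torch.randn(8, d) for d in dims]          # toy batch
zs = [enc(x) for enc, x in zip(encoders, xs)]   # per-modality latents

# Cross-modal reconstruction: every latent regenerates every modality;
# MSE keeps the main information of the original samples.
mse = nn.MSELoss()
recon_loss = sum(mse(generators[j](zs[i]), xs[j])
                 for i in range(2) for j in range(2))

# Adversarial term (generator side): discriminators push generated
# samples toward the real data distribution.
bce = nn.BCEWithLogitsLoss()
adv_loss = sum(bce(discs[j](generators[j](zs[i])), torch.ones(8, 1))
               for i in range(2) for j in range(2))

# Weighted adaptive fusion into a shared representation (weights would
# normally be learned; fixed here for brevity).
w = torch.softmax(torch.zeros(2), dim=0)
z_shared = w[0] * zs[0] + w[1] * zs[1]
print(recon_loss.item(), adv_loss.item(), z_shared.shape)
```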
2. Existing deep multi-modal subspace clustering algorithms do not simultaneously consider the geometric distribution relationships of inter-modal and intra-modal data, which leads to poor clustering performance. To solve this problem, an Adversarial t-SNE for Multi-modal Subspace Clustering (AtSNE) algorithm is proposed. The algorithm uses an adversarial t-SNE network to keep the distribution of each modality's latent representations, as learned by the encoder, consistent with the distribution of the latent representations shared by all modalities, and uses a self-expression layer to learn a consistent clustering structure that captures the geometric distribution relationships both between and within modalities. Finally, a multi-modal convolutional decoder reconstructs the data to ensure that the encoded features retain the information of the original data. Experimental results on four multi-modal image datasets show the advantages of the algorithm.
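As one concrete piece of this pipeline, below is a minimal PyTorch sketch of a self-expression layer of the kind used to learn a shared clustering structure in subspace clustering. The batch size, learning rate, regularization weight, and the plain Frobenius penalty on C are assumptions for illustration, not the exact AtSNE formulation.

```python
# Self-expression layer: learn coefficients C so that each latent code
# is expressed as a combination of the others, Z_hat = C @ Z, with
# diag(C) forced to zero. The affinity |C| + |C|^T then feeds a
# spectral clustering step.
import torch
import torch.nn as nn

class SelfExpression(nn.Module):
    def __init__(self, n_samples):
        super().__init__()
        self.C = nn.Parameter(1e-4 * torch.randn(n_samples, n_samples))
    def forward(self, z):
        c = self.C - torch.diag(torch.diag(self.C))  # no self-loops
        return c @ z, c

n, latent_dim = 32, 64
z_shared = torch.randn(n, latent_dim)   # shared latent representations
layer = SelfExpression(n)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)

for step in range(200):
    z_hat, c = layer(z_shared)
    # Self-expression loss plus a small Frobenius penalty on C.
    loss = ((z_hat - z_shared) ** 2).sum() + 0.1 * (c ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Symmetrized affinity matrix for the final clustering step.
with torch.no_grad():
    affinity = layer.C.abs() + layer.C.abs().t()
print(affinity.shape)
```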
Keywords: Deep Learning, Adversarial Learning, Auto-encoder, Multi-modal Clustering