Research On Multi-Modal Clustering For Incomplete Data

Posted on:2024-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:D J Guo

GTID:2568307172469814

Subject:Computer application technology

Abstract/Summary:

Clustering is a fundamental research topic in machine learning and data science with many practical applications. Multimodal clustering aims to integrate information from different modalities to find correct clusters. Image-text clustering has received widespread attention due to the prevalence of visual media and natural language. However, current research overlooks the impact of missing data on feature learning and clustering in image-text modalities, as well as the significant differences between heterogeneous feature domains. The lack of effective image feature extraction methods results in a large semantic gap between images and texts in image-text clustering. Furthermore, traditional shallow models and linear embedding methods are inadequate for clustering complex real-world data, and existing methods rarely consider the case where each instance contains only one modality and seldom mine high-level semantic information from multimodal data. This thesis aims to address the challenges of incomplete image-text multimodal clustering and to develop matching models and loss functions that yield effective clustering algorithms. Specifically, it proposes the following solutions:

First, because existing image feature extraction methods cannot capture the global semantic information of image instances, this thesis proposes Graph-based Inference for Image Feature Extraction (GIFIE). Inspired by human perception mechanisms, this method extracts a global representation of visual instances, thereby bridging the semantic gap between image and text features.

Second, to address the extreme case where each instance contains only one modality, this thesis proposes an Adversarial Learning-based Modality Pairing Algorithm (ALMP). This algorithm maps incomplete image-text features into a unified latent space to align the different modalities. In the experiments, this approach improves accuracy, normalized mutual information (NMI), F-measure, and purity over existing methods by 2%, 2.87%, 0.21%, and 3.51%, respectively.

Third, to mine high-level semantic information from multimodal data, this thesis proposes a Multi-modal Contrastive Learning Method (MCLM). This method aims to obtain common high-level semantic features and consistent clustering assignments for all modalities. In the experiments, this approach improves accuracy, adjusted Rand index, normalized mutual information, F-measure, and purity over existing methods by 6.17%, 2.1%, 1.1%, 3%, and 5.42%, respectively.
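For readers unfamiliar with the evaluation metrics named above, the following is a minimal Python sketch of two of them, purity and normalized mutual information (NMI). It is an illustration only, not the thesis's evaluation code: the function names `purity`, `entropy`, and `nmi` are hypothetical, and the arithmetic-mean normalization used for NMI is an assumption, since the abstract does not state which variant was used.

```python
from collections import Counter
from math import log


def purity(labels_true, labels_pred):
    """Fraction of samples that fall in the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies.

    Assumption: arithmetic-mean normalization; other variants divide by
    the geometric mean, the minimum, or the maximum of the entropies.
    """
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    # Both partitions trivial (a single group each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    predicted = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary names
    print("purity:", purity(truth, predicted))
    print("NMI:", nmi(truth, predicted))
```

Both metrics depend only on how samples are grouped, not on the cluster ids themselves, which is why they suit clustering, where predicted ids carry no inherent meaning.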

Keywords/Search Tags:

Multi-modal clustering, image feature extraction, incomplete multi-modal clustering, adversarial learning, clustering performance

Related items

1	Incomplete Cross-modal Clustering Analysis
2	Multi-modal Clustering Analysis Based On Deep Learning
3	Image annotation and retrieval based on multi-modal feature clustering and similarity propagation
4	Research On Deep Multi-modal Clustering Algorithms
5	The Research On The Method For Webvideos Clustering Based On Multi-Modal Strategy
6	Research On Multi-modal Learning Based On Shared Subspace
7	The Research On Several Issues Of Clustering And Clustering Validity Indexes
8	Incomplete Multi-view Data Clustering Analysis
9	Research And Implementation Of Incomplete Multi-View Clustering Algorithm
10	Research On Incomplete Multi-view Clustering Methods