
Research On Multi-Modal Clustering For Incomplete Data

Posted on: 2024-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: D J Guo
Full Text: PDF
GTID: 2568307172469814
Subject: Computer application technology
Abstract/Summary:
Clustering is a fundamental research topic in machine learning and data science with many practical applications. Multimodal clustering aims to integrate information from different modalities to find correct clusters. Image-text clustering has received widespread attention because visual media and natural language are so prevalent. However, current research overlooks the impact of missing data on feature learning and clustering in image-text modalities, as well as the significant differences between heterogeneous feature domains. The lack of effective image feature extraction methods leaves a large semantic gap between images and texts in image-text clustering. Furthermore, traditional shallow models and linear embedding methods are inadequate for clustering complex real-world data, and existing methods rarely consider the case where each instance contains only one modality, nor do they sufficiently mine high-level semantic information from multimodal data.

This thesis aims to address the challenges of incomplete image-text multimodal clustering and to develop matching models and loss functions that yield effective clustering algorithms. Specifically, it proposes the following solutions.

First, because existing image feature extraction methods cannot capture the global semantic information of image instances, this thesis proposes Graph-based Inference for Image Feature Extraction (GIFIE). Inspired by human perception mechanisms, this method extracts a global representation of visual instances, thereby bridging the semantic gap between image and text features.

Second, to address the extreme case where each instance contains only one modality, this thesis proposes an Adversarial Learning-based Modality Pairing Algorithm (ALMP). This algorithm transfers missing image-text features to a unified latent space to align the different modalities. In experiments, this approach improves accuracy, NMI, F-measure, and purity over existing methods by 2%, 2.87%, 0.21%, and 3.51% respectively.

Third, to mine high-level semantic information from multimodal data, this thesis proposes a Multi-modal Contrastive Learning Method (MCLM). This method aims to obtain common high-level semantic features and consistent clustering assignments for all modalities. In experiments, this approach improves accuracy, adjusted Rand index, normalized mutual information, F-measure, and purity over existing methods by 6.17%, 2.1%, 1.1%, 3%, and 5.42% respectively.
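Two of the reported metrics, purity and normalized mutual information (NMI), can be illustrated with a short sketch. This is not the evaluation code used in the thesis, which the abstract does not describe. The function names `purity` and `nmi` are hypothetical, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the abstract does not say which variant was used.

```python
import math
from collections import Counter, defaultdict


def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    per_cluster = defaultdict(Counter)
    for t, p in zip(y_true, y_pred):
        per_cluster[p][t] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(y_true)


def _entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """Normalized mutual information.

    Assumption: normalized by the arithmetic mean of the two entropies;
    other variants (geometric mean, min, max) exist.
    """
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    count_true = Counter(y_true)
    count_pred = Counter(y_pred)
    mi = 0.0
    for (t, p), n_tp in joint.items():
        mi += (n_tp / n) * math.log(n * n_tp / (count_true[t] * count_pred[p]))
    denom = 0.5 * (_entropy(y_true) + _entropy(y_pred))
    if denom == 0.0:
        # Both partitions put every sample in a single group: treat as identical.
        return 1.0
    return mi / denom


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    clusters = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary; only the grouping matters
    print("purity:", purity(truth, clusters))
    print("NMI:", nmi(truth, clusters))
```

Both metrics depend only on how samples are grouped, not on the cluster ids themselves, so relabeling the clusters leaves the scores unchanged.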
Keywords/Search Tags: Multi-modal clustering, image feature extraction, incomplete multi-modal clustering, adversarial learning, clustering performance