Research On Structure Based Multi-modal Data Analysis

Posted on:2022-11-02

Degree:Master

Type:Thesis

Country:China

Candidate:T Y Liang

Full Text:PDF

GTID:2518306782452524

Subject:Automation Technology

Abstract/Summary:

The rise of the mobile Internet and self-media has caused multimedia data to grow rapidly, and people are overwhelmed by the explosive growth of data in various forms. How to build automated intelligent systems that understand data content and analyze data relationships has become a critical problem, and research on multimedia data analysis has therefore received wide attention. Cross-modal retrieval and multi-view clustering, two fundamental tasks of multimedia data analysis, aim to model the relatedness of content across heterogeneous data and have become key technologies supporting downstream applications. This thesis focuses on these two tasks and proposes solutions for relatedness modeling of multi-modal data, structure control in data hashing, and the mining of structural information from incompletely observed multi-view data. The main contents of this thesis are summarized as follows.

This thesis proposes a cross-class similarity-aware cross-modal hashing method. First, to address the shortcomings of the similarity measures used in existing methods for modeling between-class relationships, a concept taxonomy is built over the class set of each dataset, and a new, flexible measure based on this taxonomy is proposed to encode the relationships between classes. Second, to make full use of these relationships in guiding hash-function learning and to tackle the pair-similarity imbalance problem, a new objective is proposed for structure control in the Hamming space. This objective can exploit the whole available range of Hamming distances, and it tackles the imbalance problem by optimizing highly relevant and weakly relevant data pairs separately. To validate the efficacy of the proposed method, extensive experiments and analyses are conducted on three widely used datasets with three hash code lengths and two retrieval tasks, comparing the proposed method against several baselines. The results demonstrate the superiority of the proposed approach.

This thesis also proposes a generic augmentation method for incomplete multi-view clustering from a statistical perspective. To address the limited use that existing methods make of the information in unpaired data, a new insight is first drawn from the data pattern: a potential correspondence exists across views at the level of data distributions. Cross-view consistency is then quantified from a statistical perspective, which provides the technical basis for exploring cross-view consistency with unpaired data. Building on this, together with the proposed intra-view smoothness assumption, a corresponding group construction strategy is designed that transforms the mined distributional correspondence into a generic objective. Two showcases of augmenting existing models with the proposed method are provided. Experiments and analyses on three datasets show that the proposed method consistently improves the base models and achieves more robust performance under high data missing rates.
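
The two ideas behind the hashing method, a taxonomy-derived class similarity and a Hamming-space objective that spreads pairs over the full distance range while treating high- and low-relevance pairs separately, can be illustrated with a minimal sketch. The abstract does not give the actual measure or objective, so everything here is an assumption: the toy taxonomy, the Wu-Palmer-style similarity (a standard taxonomy measure swapped in for the thesis's own), the linear mapping from a similarity s to a target distance (1 - s) * K for K-bit codes, and the names `class_similarity`, `structure_loss` and `high_thresh` are all hypothetical. The sketch only scores fixed binary codes; a learning method would instead optimize relaxed, continuous codes.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy concept taxonomy (child -> parent); the root maps to None.
TAXONOMY: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(c: str) -> List[str]:
    """Return [c, parent(c), ..., root]."""
    path = [c]
    parent = TAXONOMY[c]
    while parent is not None:
        path.append(parent)
        parent = TAXONOMY[parent]
    return path


def class_similarity(a: str, b: str) -> float:
    """Wu-Palmer-style similarity 2*depth(LCA) / (depth(a) + depth(b)), root depth 1.

    A stand-in for the thesis's taxonomy-based measure, not that measure itself.
    """
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_b = set(pb)
    lca = next(node for node in pa if node in ancestors_b)  # lowest common ancestor
    return 2 * len(path_to_root(lca)) / (len(pa) + len(pb))


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of differing bits between two equal-length binary codes."""
    return sum(xi != yi for xi, yi in zip(x, y))


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def structure_loss(
    codes: Sequence[Sequence[int]],
    labels: Sequence[str],
    pairs: Sequence[Tuple[int, int]],
    high_thresh: float = 0.5,
) -> float:
    """Hypothetical structure-control loss on binary codes of length K.

    Each pair's target Hamming distance is (1 - s) * K, so targets scale with
    dissimilarity over the whole [0, K] range. High- and low-relevance pairs
    are averaged separately, so whichever group has more pairs cannot
    dominate the loss.
    """
    k_bits = len(codes[0])
    high: List[float] = []
    low: List[float] = []
    for i, j in pairs:
        s = class_similarity(labels[i], labels[j])
        target = (1.0 - s) * k_bits
        err = (hamming(codes[i], codes[j]) - target) ** 2
        (high if s >= high_thresh else low).append(err)
    return _mean(high) + _mean(low)
```

For example, with 4-bit codes, `structure_loss([[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 0]], ["dog", "cat", "car"], [(0, 1), (0, 2), (1, 2)])` evaluates to 19/18 (about 1.056). The single high-relevance pair (dog, cat) keeps its own averaged term, so the two low-relevance pairs cannot outweigh it merely by being more numerous.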

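The clustering idea, measuring cross-view consistency through data distributions rather than through sample pairs and smoothing each view locally before comparing, can likewise be sketched. The abstract names neither the statistic nor the grouping rule, so this sketch assumes that both views are already embedded in a shared space of equal dimension, uses a Gaussian-kernel maximum mean discrepancy (MMD) as a stand-in statistic, and models intra-view smoothness as averaging each point with its k nearest neighbors in the same view. The functions `mmd2` and `knn_groups` and the parameters `sigma` and `k` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical setup: each view's samples are already embedded in a shared
# space of equal dimension; samples need not be paired across views.
Vector = Sequence[float]


def sq_dist(x: Vector, y: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two unpaired sample sets.

    Stand-in statistic for distribution-level cross-view consistency:
    it is near 0 when xs and ys look like draws from the same distribution.
    """
    def mean_k(us: Sequence[Vector], vs: Sequence[Vector]) -> float:
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def knn_groups(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Hypothetical intra-view smoothing: replace each point p by the mean of
    its group, the k + 1 closest points in the same view (p is at distance 0)."""
    smoothed = []
    for p in points:
        group = sorted(points, key=lambda q: sq_dist(p, q))[: k + 1]
        smoothed.append([sum(q[d] for q in group) / len(group) for d in range(len(p))])
    return smoothed
```

Under these assumptions, a smaller value of `mmd2(knn_groups(view_a, k), knn_groups(view_b, k))` indicates that the unpaired samples of the two views are distributed more consistently, so such a term could be added as a penalty to a base clustering model's objective. Kernel MMD is chosen here only because it compares two sample sets without requiring any pairing between them.
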
Keywords/Search Tags:

Multi-modal, Multi-view, Computer Vision, Deep Learning

Related items

1	Multi-view Neural Network Learning Approaches For Cross-modal Retrieval And Classification
2	Research And System Implementation Of Multi-view Stereo Based On Dynamic Edge Flow And Tensor Acceleration
3	Research On Multi-modal Learning For Imbalanced Modal Data
4	Research On The Application Of Multi-instance Learning In Computer Vision
5	Research Of Human-computer Interaction Technology Based On Multi-modal Biopotentials
6	Research On Prediction And Decision-making Methods Based On Multi-source Information Fusion
7	Multi-modal Learning Based On Single-modal And Multi-modal Data
8	Research On Deep Learning Based Multi-view Representation Learning Techniques
9	Multi-stream And Multi-view Deep Learning For Surface Electromyography Based Gesture Recognition
10	Application For Homologous And Heterogeneous Multimodal Data Based On Multiple Deep Learning Blocks