Research On Structure Based Multi-modal Data Analysis

Posted on: 2022-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Liang
Full Text: PDF
GTID: 2518306782452524
Subject: Automation Technology
Abstract/Summary:
The rise of the mobile Internet and self-media has caused multimedia data to grow rapidly, and people are overwhelmed by the explosive growth of data in various forms. How to build an automated intelligent system for understanding data content and analyzing data relationships has become a critical problem, and research on multimedia data analysis has therefore received wide attention. As two fundamental tasks of multimedia data analysis, cross-modal retrieval and multi-view clustering aim to model the relatedness of content between heterogeneous data, and they have become key technologies supporting downstream applications. This thesis focuses on these two tasks, proposing solutions to relatedness modeling of multi-modal data, structure control in data hashing, and the mining of structural information from incompletely observed multi-view data. The main contents of this thesis are summarized as follows.

This thesis proposes a cross-class similarity-aware cross-modal hashing method. First, to make up for the weakness of the similarity measurements used in existing methods in modeling between-class relationships, a concept taxonomy is built for the class set of each dataset, and a new flexible measurement based on the concept taxonomy is proposed to encode the relationships between classes. Second, to make full use of these relationships in guiding hash function learning and to tackle the pair-similarity imbalance problem, a new objective is proposed for structure control in the Hamming space. The objective is capable of exploiting the whole available distance range, and it tackles the imbalance problem by separating the optimization of high-relevance and low-relevance data pairs. To validate the efficacy of the proposed method, extensive experiments and analyses are conducted on three widely used datasets, under three hash code lengths and two retrieval tasks, comparing the proposed method with several baselines. The results demonstrate the superiority of the proposed approach.

This thesis also proposes a generic augmenting method for incomplete multi-view clustering from a statistical perspective. To address the deficiency of existing methods in exploiting the information in unpaired data, a new insight is first drawn from the data pattern, pointing out the potential correspondence across different views from the perspective of data distribution. Then, cross-view consistency is quantified from a statistical perspective, which provides technical support for exploring cross-view consistency with unpaired data. Based on that, together with the proposed intra-view smoothness assumption, a corresponding group construction strategy is built, transforming the mined distributional correspondence into a generic objective. Two showcases of augmenting existing models with the proposal are provided. The results of experiments and analyses on three datasets show that the proposed method consistently improves the base models and achieves more robust performance under high data missing rates.
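
As a rough illustration of the hashing method summarized above, the sketch below shows how a class taxonomy could yield a graded similarity between classes and how that similarity could be mapped to a target distance in the Hamming space. It is not the measurement or objective proposed in the thesis, which the abstract does not specify. The toy taxonomy, the Wu-Palmer-style formula, the linear mapping, and all function names are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); not taken from the thesis.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node: str) -> int:
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def taxonomy_similarity(a: str, b: str) -> float:
    """Wu-Palmer-style similarity (an assumed stand-in measure):
    2 * depth(lowest common ancestor) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b that is also an ancestor of a is the lowest one.
    lca = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lca) / (depth(a) + depth(b))


def hamming_distance(x: List[int], y: List[int]) -> int:
    """Number of positions at which two equal-length binary codes differ."""
    if len(x) != len(y):
        raise ValueError("hash codes must have the same length")
    return sum(p != q for p, q in zip(x, y))


def target_distance(similarity: float, code_length: int) -> float:
    """Assumed linear mapping from similarity in [0, 1] to a Hamming
    distance target in [0, code_length]."""
    return (1.0 - similarity) * code_length


if __name__ == "__main__":
    for pair in [("dog", "dog"), ("dog", "cat"), ("dog", "car")]:
        sim = taxonomy_similarity(*pair)
        print(pair, round(sim, 3), target_distance(sim, code_length=16))
```

In this toy setup, sibling classes such as "dog" and "cat" receive a higher similarity than classes that share only the root, so their hash codes would be pushed toward a smaller Hamming distance than those of unrelated classes, rather than being treated as simply "dissimilar".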
Keywords/Search Tags:Multi-modal, Multi-view, Computer Vision, Deep Learning