
Research On Cross-modal Retrieval Algorithm With Semantic Shared Subspace

Posted on: 2022-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Dai
Full Text: PDF
GTID: 2518306527984279
Subject: Control Science and Engineering
Abstract/Summary:
In the era of the mobile Internet, information data such as images, text, audio, and video have grown explosively. Obtaining valuable information from semantically related cross-modal data has therefore become particularly important, and cross-modal retrieval has attracted much attention. Unlike single-modal retrieval such as image retrieval, cross-modal retrieval returns results across different modalities and can better meet users' growing retrieval needs. However, there is a natural heterogeneity gap between the feature representations of different modalities, which makes it impossible to measure their similarity directly. Moreover, there is a semantic gap between the low-level feature representations of data within a modality and their high-level semantic categories, which makes it difficult to obtain semantically consistent representations. How to mine the matching information among multimodal data so as to bridge the heterogeneity gap and the semantic gap, and thereby improve cross-modal retrieval performance, is thus an urgent problem.

This thesis studies cross-modal retrieval based on a semantically shared common subspace: data of different modalities are mapped into this subspace, where a simple distance function can measure their similarity. Facing the heterogeneity gap and the semantic gap among multimodal data, the thesis investigates how to fully exploit the semantic consistency, recognition differences, and local consistency among the data. The main research results are as follows:

(1) To address the insufficient consideration of the semantic consistency of retrieval-modal data in current task-oriented cross-modal retrieval algorithms, a task-oriented cross-modal retrieval method based on joint linear discriminant analysis and graph regularization is proposed. The approach constructs a separate mapping mechanism for each retrieval task within a joint learning framework and maps multimodal data into common subspaces for similarity measurement. During learning, correlation analysis and single-modal semantic regression are combined to preserve the correlation between paired data and to enhance the semantic accuracy of query-modal data, while linear discriminant analysis ensures the semantic consistency of retrieval-modal samples. The approach also builds local neighbor graphs over the multimodal data to preserve structural information, which further improves retrieval performance. Experimental results on two cross-modal datasets, Wikipedia and Pascal Sentence, show that the average mAP of the proposed method across the different retrieval tasks increases by 1.0%~16.0% and 1.2%~14.0%, respectively, compared with twelve existing methods.
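To make the shared-subspace idea in contribution (1) concrete, below is a minimal NumPy sketch, not the thesis's exact formulation: it learns linear projections into a common subspace by plain gradient descent on a loss combining pairwise correlation, semantic regression onto one-hot labels, and kNN-graph Laplacian regularization, then retrieves by cosine similarity in that subspace. All names (U, V, fit, knn_laplacian), the simple solver, and the replacement of the linear-discriminant term by label regression are illustrative assumptions.

```python
# Sketch only: paired features X (n x dx), Y (n x dy), one-hot labels S (n x c).
import numpy as np

def knn_laplacian(F, k=5):
    """Unnormalized Laplacian of a symmetrized kNN graph on the rows of F."""
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    n = F.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[1:k + 1]] = 1.0   # skip self at index 0
    W = np.maximum(W, W.T)                       # symmetrize
    return np.diag(W.sum(1)) - W

def fit(X, Y, S, dim=10, lam=1.0, mu=0.1, lr=1e-3, iters=500):
    """Gradient descent on: ||XU - YV||^2 (pairwise correlation)
    + lam * (||XU Wc - S||^2 + ||YV Wc - S||^2)  (semantic regression)
    + mu  * (tr(U'X'Lx XU) + tr(V'Y'Ly YV))      (graph regularization)."""
    rng = np.random.default_rng(0)
    U = rng.normal(size=(X.shape[1], dim)) * 0.01
    V = rng.normal(size=(Y.shape[1], dim)) * 0.01
    Wc = rng.normal(size=(dim, S.shape[1])) * 0.01   # subspace -> label map
    Lx, Ly = knn_laplacian(X), knn_laplacian(Y)
    for _ in range(iters):
        Px, Py = X @ U, Y @ V
        gU = X.T @ (2*(Px - Py) + 2*lam*(Px @ Wc - S) @ Wc.T + 2*mu*Lx @ Px)
        gV = Y.T @ (2*(Py - Px) + 2*lam*(Py @ Wc - S) @ Wc.T + 2*mu*Ly @ Py)
        gW = 2*lam*(Px.T @ (Px @ Wc - S) + Py.T @ (Py @ Wc - S))
        U, V, Wc = U - lr*gU, V - lr*gV, Wc - lr*gW  # lr is a placeholder
    return U, V

def cosine(a, b):
    """Retrieval: cosine similarity between projected samples in the subspace."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T
```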
(2) To address the insufficient consideration of the difference in semantic recognizability between samples of different modalities in current deep-learning-based cross-modal retrieval algorithms, a deep cross-modal retrieval method with recognition transfer is proposed; a sketch of its loss composition is given after contribution (3) below. The approach minimizes the semantic loss between text semantic labels and text common representations to fully maintain the semantic consistency of the text common representations, and minimizes the reconstruction loss between the decoded text vectors and the original text features to transfer the high semantic recognizability of text into the common space. At the same time, it removes modality differences between the common representations of multimodal data and maintains inter-modality pairwise correlation and intra-modality local consistency by minimizing a modality invariance loss and a sample correlation loss in the common space. Experimental results on Wikipedia and Pascal Sentence show that the average mAP of the proposed method across the different retrieval tasks increases by 0.7%~40.6% and 3.0%~54.0%, respectively, compared with twelve existing methods.

(3) To address the insufficient mining of intra-modality local consistency in current common-space cross-modal retrieval, a cross-modal retrieval method with graph convolution fusion is proposed; a sketch follows below. The approach uses a nearest-neighbor algorithm to construct a modal graph for each modality and encodes the original features of each modality with symmetric graph convolutional sub-networks and symmetric multi-layer fully connected networks. It then fuses each modality's graph convolutional encoding and fully connected encoding in a common representation learning layer, where an inter-modality modal invariance loss and an intra-modality semantic label loss are jointly optimized to obtain common representations with strong local consistency and semantic consistency. Experimental results on Wikipedia and Pascal Sentence show that the average mAP of the proposed method across the different retrieval tasks increases by 2.3%~42.2% and 2.4%~53.4%, respectively, compared with twelve existing methods.
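As referenced in contribution (2), the following PyTorch sketch shows one plausible composition of the four losses described there: a text semantic loss, a text reconstruction loss (the recognition-transfer term), a modality invariance loss, and a sample correlation loss. The network sizes, loss weights, and the cosine-similarity form of the correlation term are placeholder assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecognitionTransferNet(nn.Module):
    """Encoders into a common space, plus a text decoder and a text semantic head."""
    def __init__(self, d_img, d_txt, d_common, n_classes):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(d_img, 512), nn.ReLU(),
                                     nn.Linear(512, d_common))
        self.txt_enc = nn.Sequential(nn.Linear(d_txt, 512), nn.ReLU(),
                                     nn.Linear(512, d_common))
        self.txt_dec = nn.Sequential(nn.Linear(d_common, 512), nn.ReLU(),
                                     nn.Linear(512, d_txt))    # reconstruction branch
        self.cls = nn.Linear(d_common, n_classes)              # text semantic head

    def forward(self, x_img, x_txt):
        zi, zt = self.img_enc(x_img), self.txt_enc(x_txt)
        return zi, zt, self.txt_dec(zt), self.cls(zt)

def loss_fn(zi, zt, txt_rec, txt_logits, x_txt, labels, a=1.0, b=1.0, c=1.0):
    sem = F.cross_entropy(txt_logits, labels)   # text semantic loss
    rec = F.mse_loss(txt_rec, x_txt)            # recognition transfer via reconstruction
    inv = F.mse_loss(zi, zt)                    # modality invariance (paired samples)
    # Sample correlation: same-class pairs pulled together across the batch.
    same = (labels[:, None] == labels[None, :]).float()
    sim = F.cosine_similarity(zi[:, None, :], zt[None, :, :], dim=-1)
    corr = ((sim - same) ** 2).mean()
    return sem + a * rec + b * inv + c * corr

# Illustrative usage (shapes are assumptions):
# net = RecognitionTransferNet(4096, 300, 256, 10)
# zi, zt, rec, logits = net(img_feats, txt_feats)
# loss = loss_fn(zi, zt, rec, logits, txt_feats, labels)
```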
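And for contribution (3), here is a minimal PyTorch sketch of one modality's encoder: a symmetrically normalized kNN modal graph, a single graph convolutional layer, a parallel fully connected branch, and a fusion layer producing the common representation. The layer widths, the one-layer GCN, and fusion by concatenation are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def knn_adj(X, k=5):
    """Symmetric, self-looped, normalized kNN adjacency: D^-1/2 (A + I) D^-1/2."""
    d = torch.cdist(X, X)
    idx = d.topk(k + 1, largest=False).indices[:, 1:]   # skip self
    A = torch.zeros(X.size(0), X.size(0))
    A.scatter_(1, idx, 1.0)
    A = torch.maximum(A, A.T) + torch.eye(X.size(0))
    deg = A.sum(1).rsqrt()
    return deg[:, None] * A * deg[None, :]

class FusionEncoder(nn.Module):
    """One modality: GCN branch + FC branch, fused into a common representation."""
    def __init__(self, d_in, d_common):
        super().__init__()
        self.gcn = nn.Linear(d_in, d_common)   # GCN weight; propagation is A @ X @ W
        self.fc = nn.Sequential(nn.Linear(d_in, 512), nn.ReLU(),
                                nn.Linear(512, d_common))
        self.fuse = nn.Linear(2 * d_common, d_common)

    def forward(self, X, A):
        g = F.relu(self.gcn(A @ X))            # graph convolutional encoding
        f = self.fc(X)                         # fully connected encoding
        return self.fuse(torch.cat([g, f], dim=1))
```

In a full pipeline, one such encoder per modality would feed a joint objective of the kind the abstract names, for example an MSE modal invariance loss between paired common representations plus a cross-entropy semantic label loss on each modality's common representation.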
Keywords/Search Tags: cross-modal retrieval, common space, linear discriminant analysis, recognition transfer, graph convolutional encoding