
Deep Network For Image-Text Cross-Modal Retrieval

Posted on: 2020-08-17 | Degree: Master | Type: Thesis
Country: China | Candidate: H Y Peng | Full Text: PDF
GTID: 2428330596964240 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Image-text retrieval has gained much attention recently. However, there is a large gap between image features and text features, and querying is time-consuming. To reduce this cross-modal feature gap, we propose an image-text attention block that learns the cross-modal relationship via an elaborately designed attention mechanism.

Image-text hashing has received intensive attention in the image-text retrieval task because of its low computation and storage costs. Most previous cross-modal hashing methods focus on extracting correlated binary codes from pairwise labels but largely ignore the semantic categories of cross-modal data. We propose to embed category information into the hash codes. More specifically, we introduce a semantic prediction loss into our framework to enhance the hash codes with category supervision, preventing the hashing network from linking irrelevant features during retrieval. Our cross-modal network applies the cross-modal attention block to efficiently encode rich, relevant features and learn compact hash codes. Extensive experiments on three challenging benchmarks demonstrate that the proposed method significantly improves retrieval results; on IAPR TC-12, it outperforms the state of the art by a large margin, with a 7.2% increase in MAP. Code sketches of the main components are given after this abstract.

To improve efficiency and reduce the computational cost of inference in deep networks, we further propose to compress the networks with CCP channel pruning. Our results on IAPR TC-12 demonstrate that the parameters of AlexNet can be reduced 20x without loss of performance.
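The attention block described above can be made concrete with a short sketch. The following PyTorch code is a minimal illustration in which a text feature attends over image region features; the dimensions, layer names, and single-query design are assumptions for exposition, not the thesis's exact architecture:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossModalAttention(nn.Module):
        """Illustrative image-text attention: a text feature queries
        image region features (all dimensions are placeholders)."""

        def __init__(self, img_dim=2048, txt_dim=512, common_dim=256):
            super().__init__()
            self.q = nn.Linear(txt_dim, common_dim)  # text -> query
            self.k = nn.Linear(img_dim, common_dim)  # image regions -> keys
            self.v = nn.Linear(img_dim, common_dim)  # image regions -> values

        def forward(self, img_feats, txt_feats):
            # img_feats: (B, R, img_dim) region features; txt_feats: (B, txt_dim)
            q = self.q(txt_feats).unsqueeze(1)             # (B, 1, d)
            k, v = self.k(img_feats), self.v(img_feats)    # (B, R, d)
            attn = F.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
            return (attn @ v).squeeze(1)  # (B, d) text-attended image feature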
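Similarly, the category-supervised hashing objective can be sketched as a pairwise code-similarity loss plus a semantic prediction (classification) loss on relaxed codes. The formulation below is an assumed illustration: the loss weighting, code length, and use of multi-label BCE are our guesses, not the thesis's exact losses:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HashHead(nn.Module):
        """Maps fused features to relaxed hash codes and adds a
        semantic prediction branch for category supervision."""

        def __init__(self, feat_dim=256, code_len=64, num_classes=100):
            super().__init__()
            self.hash = nn.Linear(feat_dim, code_len)
            self.cls = nn.Linear(code_len, num_classes)

        def forward(self, feats):
            h = torch.tanh(self.hash(feats))  # relaxed codes in (-1, 1)
            return h, self.cls(h)             # codes + category logits

    def total_loss(h_img, h_txt, logits_img, logits_txt, sim, labels, alpha=1.0):
        # Pairwise loss: inner products of relaxed codes should match
        # the (B, B) cross-modal similarity matrix sim.
        inner = (h_img @ h_txt.t()) / h_img.size(1)
        pair_loss = F.mse_loss(inner, sim)
        # Semantic prediction loss: multi-label BCE on both modalities.
        sem_loss = (F.binary_cross_entropy_with_logits(logits_img, labels)
                    + F.binary_cross_entropy_with_logits(logits_txt, labels))
        return pair_loss + alpha * sem_loss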
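Finally, channel pruning can be illustrated generically. The sketch below keeps the convolution filters with the largest L1 norms; this magnitude criterion is a stand-in for illustration, not the CCP criterion itself, and in a full network the input channels of the following layer would also have to be sliced accordingly:

    import torch
    import torch.nn as nn

    def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
        """Keep the output channels whose filters have the largest L1 norms."""
        norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one norm per filter
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        keep = torch.topk(norms, n_keep).indices.sort().values
        pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                           stride=conv.stride, padding=conv.padding,
                           bias=conv.bias is not None)
        pruned.weight.data = conv.weight.data[keep].clone()
        if conv.bias is not None:
            pruned.bias.data = conv.bias.data[keep].clone()
        return pruned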
Keywords/Search Tags: Hash code, cross-modal, retrieval, deep learning, feature extraction