Research And Application On Metric Learning Based On Attention Network

Posted on: 2019-07-31  Degree: Master  Type: Thesis
Country: China  Candidate: Z H Wang  Full Text: PDF
GTID: 2428330590492340  Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of the Internet, multimedia data has become the most important source of information in people's lives, and how to retrieve it has become an important research direction. Traditional retrieval relies mainly on keyword matching between manually assigned tags on the media data and the user's input. This approach is coarse, and in many cases it cannot return the result the user expects. Because Internet media data is growing explosively, manually adding accurate tags to all of it has also become extremely costly and difficult to carry out. Against this background, content-based multimedia retrieval has become increasingly important: it lets users retrieve multimedia data conveniently by supplying digital media data or a language description as the query.

The problem addressed in this paper is how to use a metric learning neural network as an embedding function that maps multimedia data into a feature space in which similar samples lie close to each other. Most existing content-based deep learning retrieval models extract features from the entire input sample. In general, however, some parts of the input, such as the background and occlusions, do not help retrieval. Identifying the important part of the input sample and separating it from the noisy regions is a challenging problem for all content-based retrieval systems. In this paper, we use attention models to improve feature extraction and address this problem. Specifically, we construct two content retrieval systems based on deep neural networks trained with metric learning: a clothing retrieval model based on an attention network, which performs well on both same-domain and cross-domain retrieval of clothing images; and a cross-modal retrieval system based on generalized attention, which mainly addresses cross-modal retrieval between images and text.

To help users retrieve the clothes they want by supplying images, we build a high-precision clothing retrieval system based on an attention network. The system uses a self-learning visual attention model to extract attention maps from clothing images and combines them with the intermediate feature maps to reduce the influence of noise and background and to strengthen the feature vector. We propose an Impdrop connection method that connects the attention model to the main network to form an end-to-end network. The Impdrop connection introduces randomness into the attention model during training to make the system more robust. Multiple sets of experiments on different datasets demonstrate the effectiveness of our method. To show the clothing retrieval system in practice, we deploy a clothing image retrieval system on a server, which can be accessed through the URL http://202.120.39.165:9998.

In response to the growing demand for cross-media retrieval, we propose a cross-modal retrieval model based on generalized attention for retrieval between images and Chinese texts. Images and texts belong to different modalities. To bridge the "gap" between modalities, we design separate mapping networks for images and Chinese texts. These networks map the different modalities into a shared metric space, and retrieval is performed according to the distances between samples in that space. For the cross-modal retrieval task, we also design a generalized attention model based on long short-term memory (LSTM) networks, which automatically detects important regions of the input samples and improves the performance of the mapping function. Finally, we implement the system and verify its effectiveness through experiments.
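The abstract does not give the loss function, the attention architecture, or the Impdrop connection in detail, so the following is only a minimal illustrative sketch of the general ideas it names: attention-weighted pooling of region features, a metric learning objective, and retrieval by distance in the embedding space. The triplet margin loss, the softmax weighting, and all function names (`attention_pool`, `triplet_loss`, `retrieve`) are assumptions chosen for illustration, not the thesis's actual method.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def attention_pool(features, scores):
    """Pool region feature vectors using softmax-normalized attention scores.

    features: list of region vectors (all the same length).
    scores:   one unnormalized attention score per region (assumed given).
    Regions with low scores (e.g. background) contribute less to the result.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss (an assumed, commonly used metric learning loss).

    It is zero once the positive is closer to the anchor than the negative
    by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def retrieve(query, gallery, k=1):
    """Return the names of the k gallery embeddings nearest to the query."""
    ranked = sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
    return ranked[:k]
```

In a cross-modal setting of the kind described, `query` could be an embedded text and `gallery` a set of embedded images, provided both mapping networks project into the same space; the sketch leaves the networks themselves out.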
Keywords/Search Tags:Attention model, Deep neural network, Multimedia retrieval, Metric learning, Cross-modal