Research On Multimodal Retrieval Based On Deep Learning

Posted on:2020-12-14

Degree:Master

Type:Thesis

Country:China

Candidate:J J Tang

Full Text:PDF

GTID:2438330611982450

Subject:Software engineering

Abstract/Summary:


In the information age, the development of Internet technology and ever-faster information transmission have produced massive amounts of data waiting to be mined, and the variety of data types keeps growing. In this era of big data, retrieving data of one modality with a query from another modality in massive data collections is therefore an open problem. Cross-modal retrieval based on deep learning is currently the most advanced approach, but existing methods take a long time to train and struggle to balance optimization ability with convergence ability.

This thesis is based on the MS-COCO and Flickr30K datasets. A deep convolutional neural network and a deep recurrent neural network are used to abstract image data and text data, respectively. The high-level semantic features of each modality are then optimized directly into embeddings in a shared Euclidean space, where spatial distance corresponds to the similarity between multimodal data. Matched and mismatched pairs generated by triplet loss mining, together with backpropagation, make corresponding multimodal data more compact in the learned space. The main research contents of this thesis include the following aspects:

(1) This thesis proposes a combined model based on periodic weight change. In triplet loss mining over matched and non-matching pairs, a periodic weight is introduced to combine the small-sample heterogeneous sample pair model with the shortest-distance heterogeneous sample pair model. Because the weight cycles between reasonably bounded values, the model can move closer to a local minimum without extra training cost, which improves accuracy and reduces sensitivity to noise.

(2) This thesis proposes a new network training method based on clustering. It exploits the property that similar data nodes gather in the embedding space while heterogeneous data nodes stay away from each other, and gives training priority to a single branch network. Once that branch network is trained, its parameters are fixed, and the other branch network is trained with the bi-directional retrieval loss function, which improves model performance. Because noise in the training data is reduced in advance, training time is shorter and the training effect is more stable.

Experiments on the above research content show that both the combined model based on periodic weight change and the branch network priority training method based on text clustering outperform the current advanced VSE++ and Embedding Net algorithms.
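The abstract does not give the formula behind contribution (1), so the following is only a rough sketch of how a periodically weighted triplet loss might look. It assumes a hinge-style triplet loss over Euclidean distances, a cosine schedule for the weight, and a blend of the hardest-negative term with the mean over all sampled negatives. The function names, the `margin` and `period` parameters, and the choice of schedule are all hypothetical and are not taken from the thesis.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def periodic_weight(step, period, low=0.0, high=1.0):
    """Cosine weight that cycles between `low` and `high` every `period` steps.

    Assumption: the thesis only says the weight changes periodically within
    reasonable bounds; the cosine shape is chosen here for illustration.
    """
    phase = 2.0 * math.pi * (step % period) / period
    return low + (high - low) * 0.5 * (1.0 + math.cos(phase))


def combined_triplet_loss(anchor, positive, negatives, weight, margin=0.2):
    """Blend a hardest-negative term with a mean-over-negatives term.

    Each hinge is max(0, margin + d(anchor, positive) - d(anchor, negative)).
    Assumption: `weight` interpolates between the two mining strategies.
    """
    d_pos = euclidean(anchor, positive)
    hinges = [max(0.0, margin + d_pos - euclidean(anchor, n)) for n in negatives]
    hardest_term = max(hinges)               # closest (shortest-distance) negative
    mean_term = sum(hinges) / len(hinges)    # average over the sampled negatives
    return weight * hardest_term + (1.0 - weight) * mean_term


# Usage: one image embedding, its matching caption, and two mismatched captions.
image = [0.0, 0.0]
caption = [0.3, 0.4]
wrong_captions = [[1.0, 0.0], [0.6, 0.0]]
for step in range(0, 10, 5):
    w = periodic_weight(step, period=10)
    print(step, w, combined_triplet_loss(image, caption, wrong_captions, w))
```

In this sketch the weight is 1 at the start of each period (only the hardest negative counts) and 0 at the midpoint (all sampled negatives count equally), so the loss alternates between the two mining strategies without adding extra forward passes.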

Keywords/Search Tags:

Multimodal data, Periodic weight, Clustering, Triplet loss, Local minimum, Embedding


Related items

1	Triplet Loss And Manifold Dimensionality Reduction Based Method For Text-independent Speaker Recognition
2	Research On Local Linear Embedding Algorithm And Its Application
3	Speaker Recognition Algorithm Based On Deep Learning
4	Research On Voiceprint Recognition Model Based On End-to-end Neural Network
5	The Stability Analysis For A Class Of Switched System
6	Research On Face Recognition Based On Machine Learning Method
7	Research On Cross-lingual Word Embedding Construction Methods Based On Deep Semantics
8	Computer-aided design and manufacture of minimum-weight structures
9	Multimodal Cycle-consistent Zero-Shot Learning Based On Unbiased Embedding
10	Mining Local Periodic Patterns In A Discrete Sequence