
Research On Multimodal Retrieval Based On Deep Learning

Posted on: 2020-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: J J Tang
GTID: 2438330611982450
Subject: Software engineering
Abstract/Summary:
In the information age, advances in Internet technology and faster information transmission have produced massive volumes of data awaiting mining, and the variety of data types keeps growing. In this era of big data, retrieving data of one modality by querying with data of another modality from a massive collection is therefore an open problem. Cross-modal retrieval based on deep learning is currently the most advanced approach, but existing methods take a long time to train and cannot balance optimization ability against convergence ability. This thesis is based on the MS-COCO and Flickr30K datasets. A deep convolutional neural network and a deep recurrent neural network are used to abstract image data and text data, respectively. The resulting high-level semantic features of each modality are optimized directly into embeddings that form a Euclidean space. Distance in this space corresponds to the similarity between multimodal data, and the matching and non-matching pairs produced by triplet loss mining, together with backpropagation, make corresponding multimodal data more compact in the learned space. The main research contents of this thesis are as follows:

(1) This thesis proposes a combined model based on periodic weight change. In the triplet loss mining method based on matching and non-matching pairs, a periodic weight change is introduced to combine the small-sample heterogeneous sample pair and the shortest-distance heterogeneous sample pair into a single model. Because the weight cycles between reasonably bounded values, the model can get closer to a local minimum without extra training cost, which improves accuracy and reduces sensitivity to noise.

(2) This thesis proposes a new network training method based on clustering. Building on the observation that similar data nodes gather in the embedding space while heterogeneous data nodes stay away from each other, this thesis proposes a priority training method for a single branch network. After the single branch network has been trained, its parameters are fixed, and the other branch network is then trained with the bidirectional retrieval loss function, which improves model performance. Because noise in the training data is reduced in advance, training time is shorter and the training effect is more stable.

Experiments on the above research content show that both the combined model based on periodic weight change and the branch network priority training method based on text clustering outperform the current advanced VSE++ and Embedding Net algorithms.
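The abstract does not give the exact form of the periodically weighted loss, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that the "shortest-distance heterogeneous sample pair" is the hardest (closest) non-matching sample, that the other term is the mean hinge loss over all non-matching samples, and that the weight follows a cosine cycle. The function names `periodic_weight` and `combined_triplet_loss`, the `margin` value, and the toy vectors are all hypothetical.

```python
import math


def periodic_weight(step, period, low=0.0, high=1.0):
    """Cosine-shaped weight cycling between `low` and `high` every `period` steps.

    Assumption: the abstract only says the weight changes periodically
    between bounded values; the cosine shape is illustrative.
    """
    phase = 2.0 * math.pi * (step % period) / period
    return low + (high - low) * 0.5 * (1.0 + math.cos(phase))


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def combined_triplet_loss(anchor, positive, negatives, weight, margin=0.2):
    """Blend two triplet hinge losses with a weight in [0, 1].

    weight = 1 uses only the hardest (closest) negative;
    weight = 0 uses the mean hinge over all negatives.
    """
    d_pos = euclidean(anchor, positive)
    hinges = [max(0.0, margin + d_pos - euclidean(anchor, n)) for n in negatives]
    mean_term = sum(hinges) / len(hinges)
    # The closest negative yields the largest hinge value.
    hardest_term = max(hinges)
    return weight * hardest_term + (1.0 - weight) * mean_term


# Toy embeddings: one image, its matching caption, two non-matching captions.
image = [0.0, 0.0]
caption = [0.1, 0.0]
others = [[0.2, 0.0], [1.0, 0.0]]

for step in (0, 5):
    w = periodic_weight(step, period=10)
    print(step, round(w, 3), round(combined_triplet_loss(image, caption, others, w), 3))
```

In this sketch the loss shifts between the hardest-negative form and the averaged form as the weight cycles, which is one way to read the combination described in (1). A real model would compute these terms over mini-batches of learned image and text embeddings.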
Keywords/Search Tags: Multimodal data, Periodic weight, Gather, Triplet loss, Local minimum, Embedding