Research On Fusion Of Multi-level Image Features For Image-text Matching

Posted on: 2021-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: J F Li
GTID: 2518306470463134
Subject: Computer Science and Technology
Abstract/Summary:
Image-text matching enables computers to interpret information in two modalities, image and text, and to match the two according to their semantic content. In the Internet age, with its abundance of images and text, image-text matching has great practical value. To perform image-text matching, the image must first be represented as features, and pre-training is an efficient way to obtain image representations in deep learning. Pre-trained networks are therefore generally used to represent images in image-text matching. However, existing image-text matching algorithms based on deep-learning pre-training have the following problems: (1) the actual image-text matching task differs from the pre-training task, which introduces negative effects such as noisy features; (2) only single-level features from the pre-trained network are used, so the pre-trained features are not fully and reasonably exploited.

Given these problems, this thesis takes image-text matching data as its research object and studies how to use pre-trained image features more reasonably in the image-text matching task. The main results and contributions are as follows:

(1) This thesis proposes a multi-level image feature fusion algorithm (Fusion of Multi-level Image Features, FMLIF) for image-text matching. Guided by the training objective of the image-text matching task, the multi-level pre-trained image features are reduced in dimensionality and fused through a multi-layer perceptron (MLP). This process generates fused image features that serve as a better representation for the image-text matching algorithm, and it alleviates, to a certain extent, the negative effects of the mismatch between the two tasks. At the same time, extracting features from multiple levels of the pre-trained network uses more of the pre-trained features and gets more value out of them. Comparative experiments are conducted on the Sohu image-text matching competition data set and the Flickr30K data set. The results show that, compared with unprocessed single-level and multi-level pre-trained features, the fused image features generated by FMLIF achieve better image-text matching performance. Further experiments show that dimensionality reduction and fusion perform better with multi-level features than with single-level features.

(2) To improve the dimensionality reduction and fusion of multi-level pre-trained image features in FMLIF, further research is carried out. Inspired by natural language processing and multi-step inference, sequence modeling is applied to the multi-level pre-trained features. The resulting RNN-based FMLIF algorithm uses a recurrent neural network (RNN) structure to fuse the multi-level pre-trained features and reduce their dimensionality. Comparative experiments are conducted on the Flickr30K data set. The results show that the RNN outperforms the MLP at dimensionality reduction and fusion of multi-level pre-trained features. The fused image features generated by the RNN have stronger representational ability in the image-text matching algorithm and yield better matching performance.
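The two fusion strategies described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not give layer sizes, activations, the RNN variant, or the matching score. Everything below is therefore an assumption made for illustration: three feature levels of sizes 8, 16 and 32, a ReLU reduction layer per level, a plain tanh recurrence, cosine similarity as the matching score, and random, untrained weights standing in for parameters that would be learned under the matching objective. All function and variable names are hypothetical.

```python
import math
import random


def init_linear(in_dim, out_dim, rng):
    """Randomly initialised affine layer, returned as (weight rows, bias)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, layer):
    """Apply an affine layer to the vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def mlp_fuse(levels, reducers, fusion_layer):
    """Reduce each level, concatenate the results, fuse with one more layer."""
    reduced = []
    for feat, layer in zip(levels, reducers):
        reduced.extend(relu(linear(feat, layer)))
    return linear(reduced, fusion_layer)


def rnn_fuse(levels, reducers, input_layer, hidden_layer):
    """Feed the reduced levels one step at a time; return the last hidden state."""
    hidden = [0.0] * len(hidden_layer[1])
    for feat, layer in zip(levels, reducers):
        step = relu(linear(feat, layer))
        pre = [a + b for a, b in zip(linear(step, input_layer),
                                     linear(hidden, hidden_layer))]
        hidden = [math.tanh(v) for v in pre]
    return hidden


def cosine(u, v):
    """Cosine similarity, used here as an assumed image-text matching score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


rng = random.Random(0)
level_dims = [8, 16, 32]        # assumed sizes of three pre-trained feature levels
reduced_dim, fused_dim = 6, 4   # assumed reduced and fused sizes

# Random stand-ins for pre-trained image features and a text embedding.
levels = [[rng.gauss(0, 1) for _ in range(d)] for d in level_dims]
text_vec = [rng.gauss(0, 1) for _ in range(fused_dim)]

# Untrained parameters; in practice these would be learned on matching data.
reducers = [init_linear(d, reduced_dim, rng) for d in level_dims]
fusion_layer = init_linear(reduced_dim * len(level_dims), fused_dim, rng)
input_layer = init_linear(reduced_dim, fused_dim, rng)
hidden_layer = init_linear(fused_dim, fused_dim, rng)

fused_mlp = mlp_fuse(levels, reducers, fusion_layer)
fused_rnn = rnn_fuse(levels, reducers, input_layer, hidden_layer)

print("MLP-fused size:", len(fused_mlp), "score:", cosine(fused_mlp, text_vec))
print("RNN-fused size:", len(fused_rnn), "score:", cosine(fused_rnn, text_vec))
```

In this sketch both variants map the same multi-level input to a vector of the same reduced size. The difference is that the MLP variant sees all levels at once through concatenation, while the recurrent variant consumes them in order, so each level is combined with a summary of the levels before it.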
Keywords/Search Tags: image-text matching, pre-trained image features, convolutional neural network, multi-layer perceptron, recurrent neural network