Research On Fusion Of Multi-level Image Features For Image-text Matching

Posted on:2021-07-10

Degree:Master

Type:Thesis

Country:China

Candidate:J F Li

GTID:2518306470463134

Subject:Computer Science and Technology

Abstract/Summary:

Image-text matching lets computers interpret information from two modalities, image and text, and match them according to their semantic content. In the Internet age, with its abundance of images and text, image-text matching has great practical value. Matching first requires a representation of the image, and pre-training is an efficient way to obtain image representations in deep learning, so pre-trained features are generally used to represent images in image-text matching. However, existing image-text matching algorithms based on deep-learning pre-training have two problems: (1) the actual image-text matching task differs from the pre-training task, which introduces negative effects such as noisy features; (2) only a single level of the pre-trained image features is used, so the pre-trained features are not fully and reasonably exploited. Against this background, this thesis takes image-text matching data as its research object and studies how to use pre-trained image features more reasonably in the image-text matching task. The main results and contributions are as follows:

(1) This thesis proposes a multi-level image feature fusion algorithm, Fusion of Multi-level Image Features (FMLIF), for image-text matching. Guided by the training objective of the image-text matching task, the multi-level pre-trained image features are reduced in dimensionality and fused by a multi-layer perceptron (MLP). This produces fused image features that serve as a better representation in the image-text matching algorithm, and it alleviates, to a certain extent, the negative effects of the mismatch between tasks. At the same time, extracting and using multi-level image features from the pre-trained network exploits more of the pre-trained features and extracts more value from them. Comparative experiments are conducted on the Sohu image-text matching competition data set and the Flickr30K data set. The results show that, compared with unprocessed single-level and multi-level pre-trained features, the fused image features generated by FMLIF achieve better image-text matching performance. Related experiments also show that dimensionality reduction and fusion perform better with multi-level features than with single-level features.

(2) To improve the dimensionality reduction and fusion of multi-level pre-trained image features in FMLIF, this thesis carries out further research. Inspired by natural language processing and multi-step inference, it applies sequence modeling to the multi-level pre-trained features and proposes the RNN-based FMLIF algorithm, which uses a recurrent neural network (RNN) structure to fuse the multi-level pre-trained features and reduce their dimensionality. Comparative experiments on the Flickr30K data set show that the RNN outperforms the MLP at dimensionality reduction and fusion of multi-level pre-trained features. The fused image features generated by the RNN have stronger representational ability in the image-text matching algorithm and achieve better matching performance.
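
The two fusion schemes described in the abstract can be illustrated with a rough NumPy sketch. It is not the thesis implementation: the feature sizes, the ReLU projections, the plain tanh recurrent cell, the order in which levels are fed to the RNN, and the cosine similarity score are all assumptions made for illustration, and the weights are random rather than trained under the matching objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def mlp_fuse(levels, w_proj, w_out):
    """Project each level to a shared width, concatenate, then mix with one dense layer."""
    projected = [relu(f @ w) for f, w in zip(levels, w_proj)]
    return np.concatenate(projected, axis=-1) @ w_out

def rnn_fuse(levels, w_proj, w_h, w_x):
    """Project each level, then feed the levels one by one to a plain tanh RNN cell."""
    h = np.zeros((levels[0].shape[0], w_h.shape[0]))
    for f, w in zip(levels, w_proj):
        x = relu(f @ w)
        h = np.tanh(h @ w_h + x @ w_x)
    return h  # final hidden state is the fused, reduced image feature

def cosine_scores(img, txt):
    """Pairwise cosine similarity between image and text embeddings (assumed score)."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    return img @ txt.T

# Hypothetical sizes: three feature levels from a pre-trained CNN, four images.
batch, level_dims, hidden, out_dim = 4, (256, 512, 1024), 64, 32
levels = [rng.normal(size=(batch, d)) for d in level_dims]

# Random stand-ins for weights that would be learned with the matching loss.
w_proj = [rng.normal(scale=d ** -0.5, size=(d, hidden)) for d in level_dims]
w_out = rng.normal(scale=(hidden * len(level_dims)) ** -0.5,
                   size=(hidden * len(level_dims), out_dim))
w_h = rng.normal(scale=out_dim ** -0.5, size=(out_dim, out_dim))
w_x = rng.normal(scale=hidden ** -0.5, size=(hidden, out_dim))

fused_mlp = mlp_fuse(levels, w_proj, w_out)     # MLP-based fusion
fused_rnn = rnn_fuse(levels, w_proj, w_h, w_x)  # RNN-based fusion

# Random stand-in for text embeddings from a text encoder.
text_emb = rng.normal(size=(batch, out_dim))
scores = cosine_scores(fused_rnn, text_emb)     # image-by-text similarity matrix
```

Both functions map several high-dimensional feature levels to one low-dimensional vector per image. The difference is that the MLP variant sees all levels at once through concatenation, while the RNN variant treats the levels as a sequence and accumulates them in its hidden state.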

Keywords/Search Tags:

image-text matching, pre-trained image features, convolutional neural network, multi-layer perceptron, recurrent neural network

Related items

1	Research Of Image Denoising Recurrent Neural Network Based On Multi-layer Convolutional Sparse Coding
2	Short Text Classification Based On Multi-granularity Feature Representation And Recurrent Convolutional Neural Network
3	Research On Image Description Method Based On Multimodal Recurrent Neural Networks
4	Image Matching Via Deep Recurrent Neural Network
5	Research On Key Technologies Of High Performance Accelerator For Convolution And Recurrent Neural Networks
6	Research Of Image Restoration Neural Network Based On Multi-layer Convolutional Sparse Coding
7	The Research On Multi-product Image Classification Based On Convolutional Neural Network
8	Optical Music Recognition Algorithm Combining Multi-scale Residual Convolutional Neural Network And Simple Recurrent Units
9	Convolutional Recurrent Network For Offline Handwritten Text And Scene Text Recognition
10	Research On Image And Text Matching Method Based On Deep Learning