Research On Image Text Matching Algorithm

Posted on: 2022-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: D P Wang
Full Text: PDF
GTID: 2518306542955409
Subject: Software engineering
Abstract/Summary:
With the spread of the Internet and computer technology, human society has accumulated large amounts of data in many modalities, of which image and text data make up a large share. How to correlate data across modalities in order to mine the information hidden in it has become a major research topic. Image-text matching aims to discover the relationship between image data and text data and to build a bridge between them. Through this bridge, search can run in both directions: from text to images and from images to text. The human brain processes multimodal data comprehensively, but existing artificial intelligence systems handle it poorly, so progress in artificial intelligence depends on progress in multimodal techniques. Image-text matching has far-reaching influence on related multimodal fields such as intelligent question answering and image captioning.

Image-text matching research faces two main problems. First, the gap between the image modality and the text modality must be overcome in order to build a high-performance algorithm pipeline. Second, the algorithm should be robust, maintaining its performance across different datasets. Current image-text matching methods fall into two basic types according to how they model the problem. The first is classification-based: matched and unmatched image-text pairs are treated as two classes, and the relationship between image and text is measured by a matching score. The second is embedding-based: the algorithm encodes image data and text data into a common space, and the relationship between image and text is measured by their similarity in that space.

This dissertation designs a fusion layer to bridge the gap between image data and text data and to improve the performance of the image-text matching network. The fusion layer computes a fusion feature from the image features and the text features, then uses the relationships among the three kinds of features (image, text, and fusion) to re-encode the image and text features, producing a better embedding space and higher matching performance. This dissertation also designs a gradient descent method based on gradient interaction, which allows the neural network to be optimized continuously and effectively and improves the robustness of the model. With gradient interaction, the performance of the image-text matching network keeps improving during training on different datasets. The gradient optimization method is simple and effective, and it can also be applied to other tasks. Experimental results on the Flickr30K and MS-COCO datasets show that the proposed method outperforms the algorithms it is compared against in this dissertation. In addition, the retrieval results are visualized and analyzed.
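The embedding-based approach described in the abstract can be illustrated with a minimal sketch: once images and texts are encoded as vectors in one space, matching reduces to ranking by similarity, and the same ranking works in both search directions. This is not the dissertation's network. The embeddings below are hand-made hypothetical values standing in for encoder outputs, cosine similarity is assumed as the similarity measure, and Recall@K is used only as a common retrieval metric; the abstract names neither.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, candidates, k):
    """Fraction of queries whose paired candidate (same index) ranks in the top k."""
    hits = sum(
        1
        for i, q in enumerate(queries)
        if i in rank_by_similarity(q, candidates)[:k]
    )
    return hits / len(queries)


# Hypothetical embeddings (assumption): image i is paired with text i.
image_embeddings = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
text_embeddings = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.0, 0.1, 0.9]]

# Two-way search: image -> text and text -> image use the same ranking routine.
image_to_text = recall_at_k(image_embeddings, text_embeddings, k=1)
text_to_image = recall_at_k(text_embeddings, image_embeddings, k=1)
```

In a trained system the two embedding lists would come from the image and text encoders, and the quality of the shared space determines how often the paired item ranks first.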
Keywords/Search Tags: Image Text Matching, Multi-modality, Retrieval, Neural Network, Deep Learning