
Research On Image-text Matching Method Based On Semantic Reasoning

Posted on: 2022-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Ren
Full Text: PDF
GTID: 2518306494990539
Subject: Computer technology
Abstract/Summary:
With the development of artificial intelligence, deep learning is attracting more and more attention, and image-text matching is one of its tasks. In the image-text matching task, the machine is given an image and a large number of candidate texts, and is required to match image and text on the basis of understanding the image, selecting the passage of text closest to the image. Image-text matching involves semantic understanding, image detection and recognition, knowledge reasoning and other related technologies. It requires machines to understand images in a human way, which is also what is expected of artificial intelligence, so it plays an important role in improving the intelligence of artificial intelligence systems such as robots. Generally speaking, the image-text matching task needs to process the visual information of the image and the text information at the same time, and to map the extracted visual features and text features into the same high-dimensional space by means of feature fusion. This requires the image-text matching model to analyze the semantics of the text correctly, so that it can give the correct answer by combining them with the visual features. For complex images, whose complexity varies, existing models tend to capture relationships between objects in the image but have difficulty inferring the real relationships in the image. The innovation of this thesis lies in the addition of a common sense judgment and reasoning module, which distinguishes the different parts of an image, establishes the relationships between them, and then judges whether each relationship is reliable according to common sense reasoning, which makes the reasoning results more reasonable and more accurate. A sorting optimization module is also added, which brings diversity to the results and provides alternative solutions when the model fails. Two benchmark data sets, MS COCO and Flickr30K, are used to compare our method with existing methods. Compared with the latest results, the accuracy of this method is about 1.3% higher on the MS COCO data set and 1.5% higher on the Flickr30K data set. Experiments show that this method can effectively improve accuracy and practicability.
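
The general idea described in the abstract, mapping image and text features into one shared space and selecting the text closest to the image, can be illustrated with a minimal sketch. This is not the thesis's model: the embeddings below are made-up toy vectors, cosine similarity is an assumed choice of closeness measure, and the names `l2_normalize`, `cosine_similarity` and `rank_texts` are hypothetical helpers introduced only for illustration. The common sense reasoning and sorting optimization modules are not represented.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_texts(image_embedding, text_embeddings):
    """Return (order, scores): text indices sorted from most to least
    similar to the image, and the similarity score of each text."""
    scores = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy embeddings (assumed values): in a real system these would come from
# an image encoder and a text encoder trained to share one space.
image = [0.9, 0.1, 0.0]
texts = [
    [0.0, 1.0, 0.0],  # candidate text 0
    [1.0, 0.2, 0.0],  # candidate text 1
    [0.0, 0.0, 1.0],  # candidate text 2
]

order, scores = rank_texts(image, texts)
best = order[0]  # index of the text judged closest to the image
print("ranking:", order, "best match:", best)
```

In this toy setting the candidate whose vector points in nearly the same direction as the image vector is ranked first. A reranking step such as the sorting optimization module mentioned in the abstract would operate on a ranked list of this kind, though the abstract does not say how.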
Keywords/Search Tags: Image-text matching, Computer vision, Natural language processing, Common sense reasoning