
A Research On Semantic Understanding Of Visual Data

Posted on: 2021-03-05    Degree: Master    Type: Thesis
Country: China    Candidate: X H Chen    Full Text: PDF
GTID: 2428330623967792    Subject: Computer Science and Technology
Abstract/Summary:
Image captioning is a focal problem in computer vision: it requires the computer to understand the semantic information of visual data. The computer must first understand the content of an image and then summarize it in a short sentence, called the image caption. Image captioning is more difficult than traditional computer vision tasks such as image classification, demanding a higher level of abstraction and complexity. In image classification, only the most salient object in an image is chosen as its label, and all other information is ignored. The label in image captioning, by contrast, is a full sentence that describes the content of the image in detail, which makes the computer's understanding task considerably harder.

Thanks to the vigorous development of deep learning in recent years, the successful application of Convolutional Neural Networks (CNNs) has brought great progress to image-understanding tasks. A CNN can encode an image into feature matrices or feature vectors that largely retain the information contained in the original image. In image captioning, researchers therefore commonly use a CNN to distill the content of the image first. On top of these image features, a sentence description must then be generated; this relies on the Long Short-Term Memory (LSTM) network, with its strong text-generation capability, which produces content-related description sentences from the image features. This is the standard encoder-decoder model in the field of image captioning.

Although recent encoder-decoder models have achieved satisfactory performance, they use only the data in standard datasets, while a large amount of unlabeled data on the Internet remains unexploited. In this thesis, I propose a novel approach that enhances the performance of image captioning models using external unpaired images and text. The method uses images and text collected from the Internet to overcome the limits of standard datasets, transferring the knowledge learned from web data to those datasets. I conducted experiments on the MS COCO and Flickr30K datasets, and the results demonstrate the effectiveness of the method: on both datasets, performance improves significantly over several state-of-the-art methods.
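To make the CNN-LSTM encoder-decoder pipeline described above concrete, the following is a minimal sketch in PyTorch. It is not the thesis's actual implementation: the class name, the choice of ResNet-50 as the encoder, the dimensions, and the teacher-forcing setup are all illustrative assumptions.

    # Illustrative sketch of a CNN encoder + LSTM decoder captioning model.
    # All names and hyperparameters here are assumptions for demonstration.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class CaptionModel(nn.Module):
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
            super().__init__()
            # Encoder: a pretrained CNN with its classifier head removed,
            # followed by a linear projection into the embedding space.
            resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
            self.encoder = nn.Sequential(*list(resnet.children())[:-1])
            self.project = nn.Linear(resnet.fc.in_features, embed_dim)
            # Decoder: an LSTM that generates the caption word by word.
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, images, captions):
            # Extract one feature vector per image (CNN kept frozen here).
            with torch.no_grad():
                feats = self.encoder(images).flatten(1)
            feats = self.project(feats).unsqueeze(1)       # (B, 1, E)
            words = self.embed(captions[:, :-1])           # teacher forcing
            inputs = torch.cat([feats, words], dim=1)      # image step first
            hidden, _ = self.lstm(inputs)
            return self.out(hidden)                        # logits over vocab

The image feature is fed to the LSTM as the first time step, after which the ground-truth words drive generation during training; at inference time the model would instead feed back its own predictions step by step.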
Keywords/Search Tags:image captioning, visual data, semantic understanding, computer vision, deep learning