
Research On Image Caption Method Based On Deep Learning

Posted on: 2022-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: D R Zhang
GTID: 2518306539998159
Subject: Computer application technology
Abstract/Summary:
Image captioning is a cross-disciplinary task that combines natural language processing and computer vision, and it is an active research problem in artificial intelligence. It uses machine learning, deep learning, and related techniques to generate text that describes the visual content of an image. The technology is useful in image-to-text conversion, information retrieval, and intelligent human-computer interaction, so it has broad application prospects. With the rise of deep learning, many effective image caption generation models have been proposed.

This thesis addresses three problems. First, existing image caption data sets are mainly in English; few exist in other languages, and none pairs the same images with captions in multiple languages. To address this, the thesis builds a multilingual extension of the Flickr8k caption data set. Second, existing image caption models usually adopt the encoder-decoder framework with a long short-term memory (LSTM) network as the decoder. The LSTM structure is inflexible and captures dependencies over only a narrow range. In addition, existing models generate captions in a single target language, so applications that need captions in several languages must train a separate model for each language, which wastes time and storage. To address this, the thesis proposes a multilingual image caption model based on a hybrid CNN-Transformer structure. Third, even with the multilingual data set in place, the data remain small and the training data insufficient compared with other image caption tasks. To address this, the thesis proposes a method for improving caption quality in low-resource settings.

The specific work is as follows:

(1) Construct a multilingual data set based on the public Flickr8k data set, and a Chinese image caption data set for a specific domain, Winter Olympics scenes. Because the lack of a multilingual image caption corpus has slowed related research, this thesis extends Flickr8k into a multilingual image caption data set covering six languages, including English, Chinese, and Russian, with a total of 240,000 caption sentences. In addition to this general-domain data set, the thesis uses web collection and manual annotation to construct a Chinese image caption data set on Winter Olympic sports events. It contains 11,275 caption sentences and supports caption generation in a specific domain.

(2) Propose a multilingual image caption generation model. To address the low quality of multilingual image captions and the poor results of training each language separately, this thesis proposes a multilingual image caption model based on a hybrid CNN-Transformer structure. The trained model can generate captions in multiple target languages, and its caption quality improves on that of single-language image caption models.

(3) Propose a method that combines multi-feature fusion and data augmentation to improve caption quality under low-resource conditions. To address insufficient training data for a specific task and the difficulty of obtaining data sets, this thesis uses an object recognition model to extract entity information from the training images as an additional feature, fuses it with the image features, and pre-trains on a large data set through transfer learning based on shared parameters. The effectiveness of the method is verified in two settings: multilingual image captioning and domain-specific image captioning.
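The ideas in contributions (2) and (3) can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not say how the target language is selected or how entity information is fused. The sketch assumes two common choices, a language tag token prepended to the decoder target and entity embeddings appended to the image feature sequence. The names `LANG_TAGS`, `tag_caption`, `fuse_features`, and `entity_table` are hypothetical, as are the toy values.

```python
from typing import Dict, List

# Hypothetical language tags (assumption): the abstract does not state
# how the model is told which target language to generate.
LANG_TAGS: Dict[str, str] = {"en": "<2en>", "zh": "<2zh>", "ru": "<2ru>"}


def tag_caption(tokens: List[str], lang: str) -> List[str]:
    """Prepend a target-language tag so one decoder can serve several languages."""
    return [LANG_TAGS[lang], "<bos>"] + tokens + ["<eos>"]


def fuse_features(
    image_feats: List[List[float]],
    entity_labels: List[str],
    entity_table: Dict[str, List[float]],
) -> List[List[float]]:
    """Append one embedding per recognized entity to the image feature sequence.

    Concatenation along the sequence axis is an assumed fusion strategy,
    chosen here only because it is simple to show.
    """
    dim = len(image_feats[0])
    fused = [list(v) for v in image_feats]
    for label in entity_labels:
        vec = entity_table.get(label)
        if vec is None:
            continue  # skip entities that have no embedding
        if len(vec) != dim:
            raise ValueError("entity embedding must match image feature width")
        fused.append(list(vec))
    return fused


# Toy inputs: two 3-dimensional image features and three entity labels,
# one of which ("unknown") has no embedding and is dropped.
image_feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
entity_table = {"skier": [1.0, 0.0, 0.0], "snow": [0.0, 1.0, 0.0]}

encoder_input = fuse_features(image_feats, ["skier", "snow", "unknown"], entity_table)
decoder_target = tag_caption(["a", "skier", "on", "snow"], "en")
```

In a setup like this, the same image features could be paired with a differently tagged target for each language, which is one way a single model might be trained to produce captions in several languages.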
Keywords/Search Tags:Deep learning, image caption, multilingual, Transformer, low resources