
Research On Image Caption Generation Method Based On Deep Learning

Posted on: 2022-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
Full Text: PDF
GTID: 2518306323960199
Subject: Computer application technology
Abstract/Summary:
Today is the era of mobile internet and streaming media, and massive amounts of data are generated every day. The wide variety and sheer volume of these data pose great challenges for data analysis, and mining useful information from large-scale, multimodal data has become a hot research topic. Image caption generation is a cross-modal analysis task that converts data from the image modality to the text modality; its goal is to generate, for a given image, a piece of text that describes the image naturally, which makes it a multidisciplinary research problem. This thesis introduces image captioning in detail, covering the research background and significance and the state of research at home and abroad, and analyzes and studies image caption generation models and methods from different angles. The specific research content is as follows:

(1) To address the ineffective use of visual and semantic information in image captioning and the poor grammatical readability of generated captions, an image caption generation framework based on an attention balance mechanism and a syntax optimization module is designed. First, the model extracts and encodes the visual and semantic information in the image, and uses multi-task learning to obtain the subject of the image. Second, the model computes visual attention and semantic attention separately to obtain the visual and semantic features relevant to the word being generated at the current time step. Third, the visual and semantic attention features are fed into the attention balance mechanism, which weighs the two kinds of attention information according to the information at the current time step. Finally, the balanced attention information is fed into the syntax optimization module, which is composed of a long short-term memory network (LSTM) and an ordered-neuron LSTM (ON-LSTM) and effectively improves the grammatical readability of the generated captions. Experiments show that this method selects the information in the image effectively and reasonably and improves the grammatical readability of the generated captions.

(2) To address the lack of style knowledge in image captions and the difficulty of effectively integrating an image's objective content with style knowledge, an image caption generation framework based on a style attention mechanism and a reverse enhancement module is designed. First, the model captures and encodes visual information from the image and feeds it to the encoder of the style-Transformer, which encodes the image features at both high and low levels. Second, the deeply encoded features are fed into the decoder of the style-Transformer, whose style attention module integrates style knowledge into the generated captions. Third, the generated captions are fed into the reverse enhancement module, which optimizes the caption generation model from both the visual and the style aspects. Finally, the entire model is trained in two stages, pre-training and fine-tuning, so that the generated captions combine style knowledge with the objective content of the image. Experiments show that the model effectively alleviates the lack of style knowledge in objective image captions and the inability of stylized captions to take both objective image content and style knowledge into account.
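To make the attention balance idea in (1) more concrete, the following is a minimal PyTorch-style sketch: visual attention and semantic attention are computed separately, and a gate derived from the decoder's current hidden state weighs the two attended contexts before they are passed on to the syntax optimization module. All class names, layer shapes, and the additive attention form are illustrative assumptions, not the thesis implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBalance(nn.Module):
    """Sketch of an attention balance mechanism (hypothetical, for illustration only)."""
    def __init__(self, feat_dim, hid_dim, att_dim):
        super().__init__()
        # separate additive attentions over visual regions and semantic (attribute) features
        self.vis_att = nn.Linear(feat_dim + hid_dim, att_dim)
        self.vis_score = nn.Linear(att_dim, 1)
        self.sem_att = nn.Linear(feat_dim + hid_dim, att_dim)
        self.sem_score = nn.Linear(att_dim, 1)
        # gate that balances the two attended contexts using the current decoder state
        self.balance_gate = nn.Linear(hid_dim + 2 * feat_dim, 1)

    def _attend(self, feats, h, att, score):
        # feats: (B, N, feat_dim), h: (B, hid_dim)
        h_exp = h.unsqueeze(1).expand(-1, feats.size(1), -1)
        e = score(torch.tanh(att(torch.cat([feats, h_exp], dim=-1)))).squeeze(-1)
        alpha = F.softmax(e, dim=-1)                      # attention weights over the N items
        return (alpha.unsqueeze(-1) * feats).sum(dim=1)   # (B, feat_dim) attended context

    def forward(self, vis_feats, sem_feats, h):
        c_vis = self._attend(vis_feats, h, self.vis_att, self.vis_score)
        c_sem = self._attend(sem_feats, h, self.sem_att, self.sem_score)
        beta = torch.sigmoid(self.balance_gate(torch.cat([h, c_vis, c_sem], dim=-1)))
        return beta * c_vis + (1.0 - beta) * c_sem        # balanced context for the decoder

The sigmoid gate plays the role of the balance mechanism: when the next word is strongly visually grounded the visual context dominates, and when it is more abstract the semantic context dominates; the balanced context would then feed the LSTM/ON-LSTM syntax optimization module described above.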
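Similarly, a minimal sketch of the style attention idea in (2): a Transformer-style decoder layer that, besides attending over the encoded image features, also attends over a small bank of learned style vectors so that style knowledge is injected while the caption is generated. The layer structure, the learned style memory, and all dimensions are assumptions for illustration; the reverse enhancement module and the two-stage training are not shown.

import torch
import torch.nn as nn

class StyleDecoderLayer(nn.Module):
    """Sketch of a decoder layer with an extra style attention step (hypothetical)."""
    def __init__(self, d_model=512, n_heads=8, n_style=16):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.img_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.style_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.style_memory = nn.Parameter(torch.randn(n_style, d_model))  # learned style bank
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, caption_emb, img_feats, caption_mask=None):
        # masked self-attention over the partially generated caption
        x = self.norms[0](caption_emb + self.self_attn(caption_emb, caption_emb, caption_emb,
                                                       attn_mask=caption_mask)[0])
        # cross-attention over the encoded image features (objective image content)
        x = self.norms[1](x + self.img_attn(x, img_feats, img_feats)[0])
        # style attention over the learned style memory (style knowledge)
        style = self.style_memory.unsqueeze(0).expand(x.size(0), -1, -1)
        x = self.norms[2](x + self.style_attn(x, style, style)[0])
        return self.norms[3](x + self.ffn(x))

Keeping the style knowledge in a separate attention step, rather than mixing it into the image features, keeps the objective content and the style contribution separable, which is consistent with combining objective image information and style knowledge through pre-training and fine-tuning as described above.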
Keywords/Search Tags: Image caption, stylized image caption, attention mechanism, neural network, computer vision