
Research And Implementation Of Chinese Image Natural Sentence Generation Technology

Posted on: 2022-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: D S Sun
GTID: 2518306509460164
Subject: Software engineering
Abstract/Summary:
Image captioning is a challenging task that aims to automatically describe the content of an image in natural language. It draws on computer vision (e.g. object recognition, scene classification, and attribute and relationship detection) and on natural language processing (e.g. generating coherent sentences that describe the objects identified in the previous step). The task requires not only recognizing the salient objects in an image and understanding their interactions, but also expressing that semantic knowledge in natural language. With the rapid development of deep learning, many new methods have been applied to computer vision and natural language processing, and image captioning methods have improved as a result. Image captioning also plays a critical role in many real-world applications, such as human-machine interaction, mutual retrieval of images and text, and medical diagnosis and treatment.

However, most researchers have focused on generating image captions in English, and far less research addresses Chinese image captions. Because of the complexity of Chinese expression and syntax, improving the quality of generated Chinese sentences remains a challenge. In existing work on Chinese image captioning, dataset limitations mean that only simple descriptions such as "there are many people on the street" can be generated, whereas descriptions rich in adjectives, such as "a woman sitting with her legs crossed in the classroom, with her arms around a little boy wearing a red scarf", cannot. To address these problems, this thesis designs and implements several Chinese image captioning models and conducts comparison and ablation experiments on them. The main work of the thesis is as follows:

1. This thesis proposes a Chinese image captioning model based on an attention mechanism and a double-layer LSTM. Its BLEU-4, METEOR, ROUGE-L, and CIDEr-D scores on the corresponding test set are 40.3, 35.4, 60.8, and 120.4 respectively. Compared with the baseline model, every metric improves by more than 2%; in particular, the CIDEr-D score rises from 114.2 to 120.4, a relative improvement of about 5.4%.

2. This thesis constructs different Chinese image captioning models by replacing the image encoder, stacking additional LSTM layers in the language model, and adding the attention mechanism. The ablation studies verify that the proposed attention-based double-layer LSTM model generates more accurate Chinese sentences describing the image content, captures more image detail, and has stronger language decoding capability.

The proposed model and its variants were tested on the AIC-ICC Chinese image captioning dataset and evaluated with the BLEU, METEOR, ROUGE-L, and CIDEr-D metrics. The experimental results show that the proposed method outperforms the other methods compared, and the generated captions also show that the model produces more accurate and more diverse Chinese sentences.
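
The relative CIDEr-D improvement quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `relative_gain` is hypothetical and not taken from the thesis, and the only data used are the two CIDEr-D scores stated above (114.2 for the baseline, 120.4 for the proposed model).

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent.

    Hypothetical helper for illustration; not part of the thesis.
    """
    return (improved - baseline) / baseline * 100.0


# CIDEr-D scores reported in the abstract: baseline 114.2, proposed model 120.4.
gain = relative_gain(114.2, 120.4)
print(f"CIDEr-D relative gain: {gain:.1f}%")  # prints "CIDEr-D relative gain: 5.4%"
```

The gain is measured relative to the baseline score, which is why an absolute difference of 6.2 points corresponds to roughly 5.4%.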
Keywords/Search Tags:Chinese image captioning, attention mechanism, LSTM, convolutional neural network, recurrent neural network