
Image Chinese Caption Generation Method Based On Attention Mechanism

Posted on: 2021-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yang
GTID: 2428330611981896
Subject: Engineering
Abstract/Summary:
With the rapid rise of Deep Learning and Artificial Intelligence, the fields of Computer Vision and Natural Language Processing have made great progress in image understanding and in text analysis and generation. Increasingly, researchers are no longer satisfied with extracting a single kind of information from images or text, and instead consider how to make full use of multi-modal data to solve tasks that combine language and vision. In this new wave of language-vision research, Image Caption Generation has become a key task. The task resembles the picture-description writing exercises done in childhood: in essence it is a conversion from machine vision to natural language, and its purpose is to have a model automatically generate natural sentences describing the content of a given image. Compared with other visual tasks, it requires the computer not only to identify the key content in the image and understand the relationships among its elements, but also to express the acquired image information in suitable natural language.

In recent years, inspired by the Encoder-Decoder model from the field of machine translation, research on Image Caption Generation has made breakthroughs. However, owing to the limited data sets available and the particularities of Chinese Natural Language Processing, Image Chinese Caption Generation, although achievable, still suffers from problems such as low accuracy and completeness, poor consistency, and poor readability. In response to these problems, this thesis proposes a new Image Chinese Caption Generation model (the AW-NIC model) that combines text word features with an Attention Mechanism on top of the classic encoder-decoder architecture. The innovation of the method lies mainly in the following two aspects:

1. Word feature module. Given the particularities of Chinese Natural Language Processing and the fact that different words contribute differently to a Chinese text, the word feature module combines part-of-speech, word-frequency, and word-length features in a new method for calculating the contribution of each word in the text. During model training, the calculated contributions are used to assign weights to the output text vectors, so that the correctness of words with a large contribution to the description sentence is given priority.

2. Attention Mechanism module. To reduce the effect that information loss during image encoding has on recognition accuracy, and to improve the quality of the generated text, an Attention Mechanism module is added to the Image Chinese Caption Generation model. Based on the previously generated text, this module determines which image regions are most relevant to the current decoding step, so that the model obtains different image information at different decoding steps and the output accuracy of the language model improves.

The AW-NIC model designed in this thesis uses the word feature module to optimize model performance and make it better suited to the characteristics of Chinese caption generation. It also uses the Attention Mechanism to guide the image encoding conversion process, so that the language generation model pays more attention to the image features relevant to the current decoding step, which effectively improves the output quality of the model. Experimental results on the AIC-ICC data set show that adding the word feature and Attention Mechanism modules greatly improves the accuracy and completeness of the description text the model outputs.
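The abstract does not give the formulas behind the two modules, so the following is only a minimal Python sketch of the general ideas, not the AW-NIC implementation. Everything in it is an assumption made for illustration: the part-of-speech scores in `POS_SCORE`, the equal-weight average in `word_contribution`, the choice that rarer and longer words score higher, and the use of dot-product scoring in `attend` (the thesis may use a different attention form).

```python
import math

# Hypothetical part-of-speech scores; the thesis does not publish its values.
POS_SCORE = {"noun": 1.0, "verb": 0.8, "adjective": 0.6, "other": 0.2}


def word_contribution(pos, frequency, length, max_length=4):
    """Combine three word features into one weight in [0, 1] (assumed form).

    pos       -- part-of-speech tag, looked up in POS_SCORE
    frequency -- relative corpus frequency of the word, in [0, 1]
    length    -- word length in characters
    """
    pos_term = POS_SCORE.get(pos, POS_SCORE["other"])
    freq_term = 1.0 - frequency  # assumption: rarer words matter more
    length_term = min(length, max_length) / max_length
    return (pos_term + freq_term + length_term) / 3.0


def weighted_cross_entropy(target_probs, weights):
    """Negative log-likelihood where each target word is scaled by its weight."""
    total = sum(-w * math.log(p) for p, w in zip(target_probs, weights))
    return total / sum(weights)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def attend(region_features, decoder_state):
    """Soft attention: score each image region against the decoder state,
    normalize the scores, and return the weights and the weighted context."""
    scores = [
        sum(f * h for f, h in zip(region, decoder_state))
        for region in region_features
    ]
    alphas = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(a * region[i] for a, region in zip(alphas, region_features))
        for i in range(dim)
    ]
    return alphas, context


if __name__ == "__main__":
    # A rare two-character noun versus a frequent one-character function word.
    w_noun = word_contribution("noun", frequency=0.01, length=2)
    w_func = word_contribution("other", frequency=0.5, length=1)
    print("weights:", w_noun, w_func)
    print("loss:", weighted_cross_entropy([0.7, 0.9], [w_noun, w_func]))

    # Two image regions; the decoder state is aligned with the first one.
    alphas, context = attend([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
    print("attention:", alphas, "context:", context)
```

In this sketch, a mistake on a high-contribution word raises the training loss more than the same mistake on a low-contribution word, and the context vector fed to the decoder changes whenever the decoder state changes, which is the behavior the two modules are described as providing.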
Keywords/Search Tags:word features, Attention Mechanism, Encoder-Decoder model, Image Caption Generation, AW-NIC model