Research On Image Description Method Based On Attention Mechanism

Posted on: 2021-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Yang
Full Text: PDF
GTID: 2428330623459082
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, with the rapid development of deep learning, the fields of computer vision and natural language processing have attracted the attention of many scholars. The image description task combines the two: given an image, it generates a natural-language sentence that matches the image content and is semantically well formed, which is essentially an encoding-to-decoding process. This thesis improves on the traditional image description method. Its main work and innovations are:

1. Traditional image description methods process feature information poorly and ignore the position information of key image regions. This thesis therefore proposes an image description method based on the attention mechanism, using ResNet-101 as the encoder. The image features are fused with the semantic information, so that when decoding the description the model can select features and focus on the corresponding region of the image.

2. During training, the model takes the ground-truth label of the image as input, whereas during testing the prediction from the previous time step is used as the input at the current time step. This mismatch between training and testing can cause word-level errors to accumulate at test time. This thesis therefore proposes a scheduled sampling method. The prediction from the previous time step is added to the inputs of the training phase, so that the model in training is closer to the model in testing. A coin-flip strategy decides whether the training input at each step is the ground-truth value for the current time step or the predicted value from the previous time step.

3. To generate more accurate descriptions at test time, this thesis uses beam search: at each time step the B most probable words are kept as candidates, and the candidate with the highest cumulative score is output as the result.

4. The attention mechanism model divides the feature map into equal regions and assigns a weight to each region. This approach ignores the question of how to select specific image regions. This thesis proposes a joint attention mechanism that combines bottom-up and top-down attention. For the bottom-up attention in the encoding part of the model, object detection is used to obtain object features from the regions of interest in the image. The language decoding part uses a two-layer LSTM decoder to improve the expressiveness of the language output, consisting of the LSTM of the top-down attention mechanism and the LSTM of the language model. A gating mechanism (Gate) is added to the output of the attention mechanism to filter redundant information in the model after the Attend step, so that the generated descriptions are more reliable.
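The coin-flip input selection in point 2 and the top-B candidate search in point 3 can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the `sampling_prob` parameter, the use of log-probabilities as the cumulative score, and the tiny `toy_model` lookup table that stands in for the LSTM decoder are all assumptions made for illustration.

```python
import math
import random


def choose_decoder_input(ground_truth_token, previous_prediction, sampling_prob, rng):
    """Scheduled-sampling coin flip for one training time step (hypothetical helper).

    With probability `sampling_prob` the model's previous prediction is fed in;
    otherwise the ground-truth token is used, as in ordinary teacher forcing.
    """
    if rng.random() < sampling_prob:
        return previous_prediction
    return ground_truth_token


def beam_search(step_log_probs, beam_width, max_len, start_token, end_token):
    """Keep the `beam_width` best partial captions by cumulative log-probability.

    `step_log_probs(prefix)` is an assumed stand-in for the decoder: it maps a
    token prefix to a dict {next_token: log_probability}.
    """
    beams = [([start_token], 0.0)]  # (tokens so far, cumulative score)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, log_p in step_log_probs(tokens).items():
                candidates.append((tokens + [token], score + log_p))
        if not candidates:
            break
        # Keep only the top-B extensions at this time step.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Prefer completed captions; fall back to unfinished beams otherwise.
    pool = finished or beams
    return max(pool, key=lambda item: item[1])


def toy_model(prefix):
    """Hypothetical decoder: next-word distribution depends only on the last word."""
    table = {
        "<s>": {"a": math.log(0.6), "the": math.log(0.4)},
        "a": {"dog": math.log(0.55), "cat": math.log(0.45)},
        "the": {"dog": math.log(0.9), "cat": math.log(0.1)},
        "dog": {"</s>": 0.0},
        "cat": {"</s>": 0.0},
    }
    return table[prefix[-1]]


if __name__ == "__main__":
    rng = random.Random(0)
    # sampling_prob=0.25: roughly one step in four uses the model's own prediction.
    print(choose_decoder_input("dog", "cat", 0.25, rng))
    print(beam_search(toy_model, beam_width=1, max_len=5, start_token="<s>", end_token="</s>"))
    print(beam_search(toy_model, beam_width=2, max_len=5, start_token="<s>", end_token="</s>"))
```

In this toy setting, a beam width of 1 reduces to greedy decoding and commits to the locally best first word, while a beam width of 2 keeps a second candidate alive and can recover a caption with a higher cumulative score. In scheduled sampling, `sampling_prob` is commonly increased over the course of training; whether and how the thesis schedules it is not stated in the abstract.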
Keywords/Search Tags:deep learning, image description generation, attention mechanism, scheduled sampling, beam search