Research And Implementation Of Image Generation Description Algorithm Based On Feedback LSTM And Attention Mechanism

Posted on:2021-07-29	Degree:Master	Type:Thesis
Country:China	Candidate:B Y Cao	Full Text:PDF
GTID:2518306308973259	Subject:Computer Science and Technology
Abstract/Summary:
Images are an indispensable carrier of digital information, and they are being created at such an explosive rate that no organization can browse them all manually, let alone interpret the semantics of each picture one by one. With the rise of artificial intelligence, computer vision has given machines the ability to "see" pictures through tasks such as image classification and image detection. Beyond that, making machines accurately understand the meaning of pictures at the semantic level is becoming increasingly important. The image description generation algorithm studied in this thesis addresses this problem by performing real-time cross-modal conversion from image to text. The research is forward-looking, widely applicable, and of practical significance. This thesis studies image description generation from two aspects: the accuracy of the subject description and the attention to detail in the subject description.

To address the insufficient accuracy of the subject description, this thesis proposes an encoder-decoder network model based on a feedback LSTM (Long Short-Term Memory) mechanism. Built on the encoder-decoder framework and the LSTM network, the feedback LSTM mechanism compensates for the weakness of RNNs in retaining long-term dependencies in sequential information and exploits the three gates of the LSTM unit in semantic decoding. It does so through a series of modules and algorithms, including a convolutional neural network, a region proposal network, a related field mapping algorithm, and a subject-feature cache dictionary. The model tracks, understands, and expresses the subject information of the picture at the granularity of a single LSTM unit and feeds it back to the next unit, which ultimately produces the description of the target image. The model is compared with three mainstream models on public data sets under unified evaluation standards. The results show that it improves both the evaluation scores and the actual description quality, and thus the accuracy of the subject description.

To address the low attention to detail in the subject description, this thesis proposes a fusion network model based on feedback LSTM and an attention mechanism. Building on the encoder-decoder architecture with the feedback LSTM mechanism, the model first enhances the feature map at the encoding stage by fusing multi-dimensional enhanced expression features with a weight assignment algorithm for multi-dimensional attention focus. Second, it upgrades the feedback LSTM mechanism at the decoding stage with an integrated network that fuses the feedback LSTM unit network with multi-dimensional attention focus, so that the description attends accurately to the details of the subject. The network implements the entire process of multi-dimensional adaptive attention feedback: it locks the focus point according to the output and then uses the focus information to feed back into the output. In the experiments, the model is evaluated on public data sets with two evaluation standards and compared with five models. The results show that the model is significantly superior in attention to detail, enabling the machine to accurately "understand" the picture at the semantic level.
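
The abstract does not give the equations of the feedback LSTM or of the multi-dimensional attention, so the following is only a rough illustration of the general loop it describes: the attention focus is chosen from the previous decoder state, and the attended features then drive the next LSTM update. This sketch assumes standard soft attention and a standard LSTM cell (biases omitted, random weights); every function name, dimension, and parameter here is hypothetical and is not taken from the thesis.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attend(regions, h, W_att):
    # Soft attention: score each region feature against the hidden state,
    # then return the weighted average of the regions (the context vector).
    query = matvec(W_att, h)
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * region[i] for w, region in zip(weights, regions))
               for i in range(dim)]
    return context, weights


def lstm_step(x, h, c, params):
    # Standard LSTM cell with input, forget, and output gates (no biases).
    z = x + h  # list concatenation of the input and the previous hidden state
    i = [sigmoid(a) for a in matvec(params["W_i"], z)]
    f = [sigmoid(a) for a in matvec(params["W_f"], z)]
    o = [sigmoid(a) for a in matvec(params["W_o"], z)]
    g = [math.tanh(a) for a in matvec(params["W_g"], z)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def decode(regions, hidden_dim, steps, seed=0):
    # Hypothetical decoding loop: the previous state selects the focus,
    # and the focused features update the state for the next step.
    rng = random.Random(seed)
    feat_dim = len(regions[0])
    W_att = random_matrix(feat_dim, hidden_dim, rng)
    params = {name: random_matrix(hidden_dim, feat_dim + hidden_dim, rng)
              for name in ("W_i", "W_f", "W_o", "W_g")}
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    trace = []
    for _ in range(steps):
        context, weights = attend(regions, h, W_att)
        h, c = lstm_step(context, h, c, params)
        trace.append(weights)
    return h, trace


if __name__ == "__main__":
    # Three made-up region feature vectors standing in for encoder output.
    regions = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 0.0]]
    h, trace = decode(regions, hidden_dim=4, steps=3)
    for step, weights in enumerate(trace):
        print(step, [round(w, 3) for w in weights])
```

At the first step the hidden state is all zeros, so every region receives the same attention weight; in later steps the weights shift as the state changes. A real captioning model would also feed in the previously generated word and learn the weights by training, both of which are left out here.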
Keywords/Search Tags:	image generation description, LSTM network, feedback mechanism, attention mechanism