
Research On Image Caption Algorithm Based On Attention Mechanism

Posted on: 2021-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Jing
Full Text: PDF
GTID: 2428330602995164
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of artificial intelligence and Internet technology, conveying information through images has become part of daily life. The resulting volume of image data brings both opportunities and challenges for image processing. Image description (image captioning) is one of the most representative and important problems at the intersection of computer vision and natural language processing. By simulating the bottom-up and top-down mechanisms of human visual attention, it combines the two fields to analyze and describe the information in an image efficiently and accurately. Image description is widely used in vision equipment and tasks in areas such as UAVs, medicine, and aviation. Building on the theory of attention mechanisms and on image description algorithms from different periods, this thesis improves two image description models:

(1) The overall framework of a soft attention model for images with complex backgrounds is improved, and each of its parts is described in detail. First, the NIC network model is introduced at the encoding end. This stage mainly extracts the entity target regions in the image and uses one-hot encoding to feed the encoded information into the LSTM. The text information and the image information are then combined, and the network passes the image information to the soft attention model, so that the generated description sentence is better combined with the text information. Experimental results show that on the MSCOCO dataset the algorithm achieves BLEU scores of 71.8%, 49.2%, 34.4%, and 24.3%, and a METEOR score of 23.9%. The resulting descriptions select entity regions better and combine semantic information with image information.

(2) To address the problem that most current methods force visual attention onto every generated word, an image description algorithm based on an adaptive attention mechanism is proposed. First, the input image is processed by a Faster R-CNN detector, and the detected entity words are stored in a data dictionary. Each word is then one-hot encoded into a fixed-length vector and fed into the decoder network. Next, a dual-LSTM network with an embedded adaptive attention mechanism determines which image regions to attend to, so that meaningful information is extracted for generating successive words, and finally the text description of the image is obtained. Experimental results show BLEU scores of 72.4%, 53.2%, 39.6%, and 29.7%, and a METEOR score of 24.9%. The generated descriptions are more logically coherent and closer to how people intuitively understand the image.
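
The difference between the two attention schemes described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the dot-product scoring, the "sentinel" vector used as a non-visual fallback, and the function names `soft_attention` and `adaptive_attention` are all assumptions chosen for illustration, since the abstract does not specify how the adaptive mechanism is built.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i], computed per dimension."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def soft_attention(regions, hidden):
    """Soft attention: every word attends to the image regions.

    Scores are dot products with the decoder state (an assumed scoring
    function); the weights over the regions always sum to 1.
    """
    weights = softmax([dot(r, hidden) for r in regions])
    return weighted_sum(weights, regions), weights


def adaptive_attention(regions, hidden, sentinel):
    """Adaptive attention, sketched with an assumed sentinel vector.

    The sentinel competes with the image regions in the same softmax.
    Its weight beta measures how much the decoder relies on non-visual
    (language) context instead of the image for the current word.
    """
    candidates = regions + [sentinel]
    weights = softmax([dot(c, hidden) for c in candidates])
    beta = weights[-1]
    return weighted_sum(weights, candidates), weights[:-1], beta


# Toy example: two 2-dimensional region features and one decoder state.
regions = [[1.0, 0.0], [0.0, 1.0]]
hidden = [2.0, 0.0]

context, weights = soft_attention(regions, hidden)
_, visual_weights, beta_low = adaptive_attention(regions, hidden, [0.0, 0.0])
_, _, beta_high = adaptive_attention(regions, hidden, [5.0, 0.0])
```

In this sketch, a sentinel that matches the decoder state closely receives a weight `beta` near 1, so the word is generated mostly from language context; a sentinel that matches poorly leaves most of the weight on the image regions, which is the behaviour plain soft attention imposes on every word.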
Keywords/Search Tags:image caption, visual attention mechanism, target extraction, convolutional neural network, LSTM