| Image description is the task of having a machine convert an image into natural language; in other words, it is a textual interpretation of the image. It is an interdisciplinary research topic spanning vision and text, and it is of great significance to the development of artificial intelligence. Traditional image description models based on retrieval and templates suffer from inflexibility, inaccuracy, and other limitations. Models built with deep learning largely overcome these problems and are therefore increasingly favored by researchers. Nevertheless, current deep-learning-based image description methods still have shortcomings, such as slow training, incomplete extraction of key image information, and generated descriptions that do not read naturally. To address these problems, we study how to improve and optimize the image description model. The research content of this thesis is as follows: (1) We propose an image description generation method based on the ResNeXt-101 network and the SE block attention module. In the encoding part, we combine the ResNeXt-101 network with the SE module and the Faster R-CNN object detector to build an encoder that extracts the target regions and image features from the input image, thereby effectively addressing the problem of slow training. (2) We propose an image description generation method based on a two-layer LSTM and two attention mechanisms. The results show that using a two-layer LSTM and integrating multiple attention mechanisms in the decoder improves the quality of the descriptions generated by the language model. In the image feature extraction stage, we extract features using the encoder from the first part above, so that the model attends to the key parts of
the image from the very beginning. To capture more complete information about the image and the objects in it, we construct two separate attention mechanisms, visual attention and semantic attention, so that the generated descriptions are more accurate and richer. Finally, to address the mismatch between training and testing, we use reinforcement learning to optimize the description model: the CIDEr score of the statement generated by greedy decoding is computed, and this score is used as a reward to optimize the constructed model. In summary, we analyze and improve deep-learning-based image description algorithms in computer vision. Experimental results show that the proposed algorithm improves the performance of the image description model and generates richer and more accurate descriptions than the traditional methods. |
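
The SE (Squeeze-and-Excitation) block used in the encoder of part (1) can be illustrated with a minimal sketch. The abstract does not give the thesis configuration, so everything concrete here is an assumption: the function name `se_block`, the toy 2-channel feature map, the reduction ratio of 2, and the weight values are all made up for illustration, and biases are omitted. The sketch only shows the general squeeze, excitation, and scale steps in plain Python, not the authors' implementation.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply one Squeeze-and-Excitation step to a C x H x W feature map.

    feature_map: list of C channels, each a list of H rows of W floats.
    w1: (C/r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C/r) weights of the expansion layer (hypothetical values).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce dimensionality and apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight every channel by its gate, which lies in (0, 1).
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2, reduction ratio r = 2 (all assumed).
features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[0.0, 2.0], [4.0, 2.0]],  # channel 1, mean 2.0
]
w1 = [[0.5, 0.5]]      # 1 x 2 reduction weights (made up)
w2 = [[1.0], [-1.0]]   # 2 x 1 expansion weights (made up)
scaled, gates = se_block(features, w1, w2)
```

With these made-up weights the first channel receives a larger gate than the second, so it is emphasized in the output. In a trained encoder the weights would be learned, which is how the block lets the network reweight channels according to the image content.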