Research On Image Description Generation Algorithm Based On Attention Mechanism

Posted on: 2020-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: L J Chen
GTID: 2438330602952741
Subject: Computer application technology

Abstract/Summary:
With the development of Internet technology, communication between people has expanded from voice and text alone to a wide variety of media such as videos and images. At the same time, the number and size of videos and images are increasing rapidly. Now that smart devices have entered ordinary households, the demand for human-computer interaction is also growing quickly. The automatic retrieval and understanding of image and video content has therefore become one of the research hotspots in artificial intelligence and machine learning.

Image captioning combines an image processing task with a natural language generation task. By establishing an image feature extraction model and a corresponding language model, the content of an image can be automatically recognized and converted into natural language. Image captioning lets computers process massive amounts of image data quickly and efficiently, and it has great application prospects in many fields such as human-computer interaction. It is a technique built on computer vision and natural language processing: deep-learning-based computer vision methods extract the image features, and natural language processing techniques build the language model, so that images become associated with text. Compared with traditional methods, a model built with deep learning can automatically learn image and text features from massive image and text datasets. Mapping the image features to the text features then yields the image caption model.

At present, although the image description task has made progress, the output text is not rich in content, and the generated text captures the details of the image inaccurately and incompletely. The accuracy of the results still has room for improvement.

To solve these problems, this thesis builds a multiple-attention image caption algorithm based on image features and language models, which improves the extraction of image features and makes fuller use of them. Firstly, a target detection model is used to extract coarse-grained and fine-grained features of the image, yielding richer image semantics and detail information and increasing the amount of information in the extracted features. Secondly, different attention structures are added to the language model so that the extracted image features are used at different granularities. Finally, a multi-level language model is constructed. A residual connection mechanism is introduced into the language model, and a highway method is used to transfer data between layers, which improves the computational efficiency of the model and the final quality of the generated descriptions. Combining these aspects, the generated caption describes the overall semantics of the image while also adding descriptions of image details. In the end, this thesis effectively improves the overall accuracy of image description and achieves certain results on the description generation task.

The main research work completed in this thesis includes the following aspects:

(1) To address the insufficient use of image details in traditional image caption models, a method is proposed that extracts different image features with a target detection algorithm. The method uses different residual layers of the target detection model to transform an image into multi-dimensional vectors of different sizes. Depending on the position of the residual layer, the resulting vectors are taken as the coarse-grained or the fine-grained features of the image. This improves the richness of the image features.

(2) To address the insufficient use of image features in traditional image description models, a long short-term memory (LSTM) network is used to build the language model, and an attention module corresponding to the image features is constructed. The resulting image caption algorithm takes into account both the overall semantics of the image and its details. By combining the image attention mechanism with the language generation model, the image description text is generated.

(3) To address the high complexity of deep neural networks and the vanishing gradient problem, residual and dense connection mechanisms are introduced into the language model, and a highway method is added between layers to transmit data, which increases the efficiency of the model.
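
The abstract does not give the formulas used, so the following is only a minimal illustrative sketch of the two ideas it names: attention over image features at two granularities, and a highway-style connection between layers. Everything specific here is an assumption rather than the thesis's method: the function names (`soft_attention`, `highway`), the dot-product scoring (swapped in for simplicity), the random stand-in features, the grid sizes, and the averaging of the two context vectors. The highway mix follows the standard gated form g * T(x) + (1 - g) * x.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_attention(features, hidden):
    """Weight each feature vector by its similarity to the decoder state.

    features: list of region vectors, each the same length as `hidden`
    hidden:   current decoder state (stand-in for an LSTM hidden vector)
    Returns (context_vector, attention_weights).
    Dot-product scoring is an assumption made for brevity.
    """
    weights = softmax([dot(f, hidden) for f in features])
    dim = len(hidden)
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return context, weights


def highway(x, transformed, gate_logits):
    """Highway-style mix: g * transformed + (1 - g) * x, with g = sigmoid(gate_logits)."""
    out = []
    for xi, ti, gi in zip(x, transformed, gate_logits):
        g = 1.0 / (1.0 + math.exp(-gi))
        out.append(g * ti + (1.0 - g) * xi)
    return out


random.seed(0)
dim = 4
# Random stand-ins for features taken from two different residual layers (hypothetical sizes).
coarse = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]   # e.g. a 2x2 grid
fine = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]    # e.g. a 4x4 grid
hidden = [random.gauss(0, 1) for _ in range(dim)]

# One attention pass per granularity.
coarse_ctx, coarse_w = soft_attention(coarse, hidden)
fine_ctx, fine_w = soft_attention(fine, hidden)

# Combine the two contexts by simple averaging (an assumption), then gate with the old state.
combined = [(c + f) / 2 for c, f in zip(coarse_ctx, fine_ctx)]
new_hidden = highway(hidden, combined, gate_logits=[0.0] * dim)
```

In a trained model the gate logits and the attention scores would come from learned parameters. The fixed zero logits here simply give an equal mix of the old state and the attended context, which keeps the sketch easy to check by hand.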
Keywords/Search Tags: image caption, attention mechanism, natural language generation, LSTM, deep learning