Image Caption Technology Based On Deep Semantic Information

Posted on: 2019-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Feng
GTID: 2428330611493351
Subject: Computer technology
Abstract/Summary:
With the advent of the big data era, people browse and process large amounts of data every day. With the popularity of multimedia devices such as mobile phones, ever more image data is being produced, and retrieving and managing it is increasingly difficult. Computer vision and natural language processing techniques can be used to analyze the characteristics of image information and so ease the problems of image understanding and management, and image captioning has become a key technology for this purpose. Image captioning means that a computer learns high-level semantic information from a single image and describes the content of the image in natural language, much as a human would describe a scene. In recent years, with the rapid development of deep learning, image captioning models based on deep learning have become a research hotspot. However, the task still suffers from a “semantic gap” problem, and extracting high-level semantic features from images requires further technical support. In the popular deep learning framework, the semantic representation of image features, which is the core of image captioning, plays a very important role in final performance. Therefore, this thesis starts from the traditional attention mechanism and optimizes the image feature extraction algorithm used in image captioning, so that it automatically captures the key regions within a single image and learns the semantic information in those regions. In addition, an image feature fusion algorithm is used to enhance the understanding of image content. The specific research work of this thesis is as follows:

(1) Attention-based ResNet for image captioning. The classical attention mechanism extracts image features while neglecting three characteristics: spatial, channel, and multi-level. Features extracted this way make it difficult to identify objects accurately and introduce noise when sentences are generated. As a result, the image features extracted by the spatial attention mechanism lack diversity, cannot capture the semantic information of the image comprehensively and correctly, and cannot give full play to the advantages of the attention mechanism in the image captioning model. To alleviate these problems, this thesis proposes an attention-based residual network image captioning model. It uses a new deep stacked network, in which attention modules and residual modules alternate, as the encoder of the model. The model can extract and preserve the key information in the image and provide more complete and richer image semantic features for the decoder. In a comparison with related state-of-the-art models on the MS COCO dataset, the proposed model shows better performance.

(2) An image captioning model based on multi-image feature fusion. In image captioning, accurate and complete image features improve the accuracy of the generated descriptive sentences and support a comprehensive and clear description of the image scene. Fusing different features of the same image can enrich the semantic information of the image. Therefore, building on the attention-based residual network image captioning model, this thesis proposes an image captioning model based on multi-image feature fusion. The experimental results show that, of the two fusion algorithms compared, the multi-feature fusion model based on self-learned weights performs best.
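
The two ideas in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the architecture from the thesis, which the abstract does not specify in detail. The function names, the use of a sigmoid gate for channel attention, the softmax over positions for spatial attention, and the softmax-normalised fusion weights are all assumptions made for illustration. In a trained model the gates and fusion logits would be learned parameters; here they are computed from the input or passed in by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def residual_attention(feature_map):
    """Channel and spatial attention combined with a residual connection.

    feature_map: list of C channels, each a list of N spatial activations
    (a flattened H*W grid). Returns a map of the same shape, computed as
    x * (1 + channel_gate * spatial_weight). Hypothetical formulation.
    """
    num_channels = len(feature_map)
    num_positions = len(feature_map[0])
    # Channel attention (assumed form): sigmoid of each channel's mean activation.
    channel_gates = [sigmoid(sum(ch) / num_positions) for ch in feature_map]
    # Spatial attention (assumed form): softmax over positions of the
    # channel-averaged activation.
    position_scores = [
        sum(ch[i] for ch in feature_map) / num_channels
        for i in range(num_positions)
    ]
    spatial_weights = softmax(position_scores)
    # Residual form: the "1 +" term keeps the original features, so the
    # attention branch can only add emphasis, never erase information.
    return [
        [x * (1.0 + gate * spatial_weights[i]) for i, x in enumerate(ch)]
        for ch, gate in zip(feature_map, channel_gates)
    ]


def fuse_features(feature_vectors, fusion_logits):
    """Weighted sum of several feature vectors of the same image.

    fusion_logits stand in for self-learned weights; they are normalised
    with softmax so the fused vector stays on the scale of its inputs.
    """
    weights = softmax(fusion_logits)
    dim = len(feature_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, feature_vectors))
        for i in range(dim)
    ]


if __name__ == "__main__":
    attended = residual_attention([[1.0, 2.0], [3.0, 4.0]])
    fused = fuse_features([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
    print(attended)
    print(fused)
```

With equal fusion logits the fused vector is the plain average of the inputs; training the logits lets the model favour whichever feature source helps caption quality more, which is the intuition behind the self-learned weighting the abstract reports as the better of the two fusion algorithms.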
Keywords/Search Tags:Deep Learning, Image Caption, Residual Network, Attention Mechanism, Encoder-Decoder, Image Fusion