
Research On Image Description Generation Method Based On Multiscale Features And Attention Fusion

Posted on: 2021-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: X G Zhou
GTID: 2428330629986189  Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet and computer intelligence, the volume of image and video data has increased dramatically. To better meet people's needs for image and video information, technologies for processing images and videos are urgently needed. Automatic generation of image descriptions by machines is a current research hotspot in artificial intelligence. It builds a bridge between computer vision and natural language processing, and is a crucial step toward common application scenarios of machine intelligence. The image description generation task uses computer vision techniques to recognize the targets in an image automatically, and then uses a machine translation model to express the interpreted content as natural language text. Although the task faces many challenges, it has broad application prospects and significant practical value. The main contents of this thesis are as follows:

(1) The thesis first introduces the research background and significance of this research direction, and then the state of research abroad on image description generation. It then reviews the advantages and disadvantages of existing image description generation models, the attention mechanism in deep learning, and the related theories and key technologies of multi-scale images and image description generation.

(2) Most current deep learning models extract image features by taking the last layers of a pre-trained convolutional neural network as global features. The image is therefore represented at a single scale during feature extraction, information at other scales is ignored, and local image information is not captured. As a result, the generated text describes the picture content inaccurately and with vague semantics. To address this problem, this thesis builds on the fact that different feature layers in a convolutional neural network have different scales. Multi-scale features are formed by fusing the network's high-level and low-level features. Extracting features from different convolutional layers and then fusing them captures the image information more fully and improves the accuracy of image description generation.

(3) The image features of different scales obtained in the encoder are selected and fused through the attention mechanism, and the image description is then generated by the decoder, which improves the model's semantic interpretation of the image. An attention map generation module produces an attention map for each layer; each attention map is multiplied with the features of its layer to obtain attention features, and the attention features are then fused across scales to improve the extraction of salient information from the convolutional features of each layer. Attention-based feature fusion extracts image features more effectively, reduces the amount of data, and improves the accuracy of image description generation. The method is trained, tested, and evaluated on the Microsoft COCO dataset and compared with existing benchmark models on evaluation metrics such as BLEU, ROUGE-1, and CIDEr. Experimental results show that the proposed model generates more accurate, complete, and meaningful image description sentences.
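
The attention-based multi-scale fusion described in (2) and (3) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation. The function names, the softmax-over-channel-mean attention map (a real attention module would be learned), nearest-neighbour resizing, and channel-wise concatenation as the fusion step are all assumptions made for illustration; random arrays stand in for features taken from three convolutional layers.

```python
import numpy as np

def upsample_nearest(feat, size):
    """Resize a (C, H, W) feature map to (C, size, size) by nearest-neighbour sampling."""
    _, h, w = feat.shape
    rows = np.arange(size) * h // size   # source row for each target row
    cols = np.arange(size) * w // size   # source column for each target column
    return feat[:, rows[:, None], cols[None, :]]

def spatial_attention(feat):
    """Hypothetical attention map: softmax over positions of the channel-mean activation."""
    energy = feat.mean(axis=0)           # (H, W)
    energy = energy - energy.max()       # shift for numerical stability
    weights = np.exp(energy)
    return weights / weights.sum()       # sums to 1 over all H x W positions

def fuse_multiscale(features, size):
    """Weight each layer by its attention map, resize to a common size, and concatenate."""
    attended = []
    for feat in features:
        attn = spatial_attention(feat)                 # (H, W)
        attn_feat = feat * attn[None, :, :]            # attention features for this layer
        attended.append(upsample_nearest(attn_feat, size))
    return np.concatenate(attended, axis=0)            # stack along the channel axis

# Stand-ins for low-, mid-, and high-level convolutional features (assumed shapes).
rng = np.random.default_rng(0)
low = rng.standard_normal((4, 8, 8))
mid = rng.standard_normal((8, 4, 4))
high = rng.standard_normal((16, 2, 2))

fused = fuse_multiscale([low, mid, high], size=8)
print(fused.shape)  # (28, 8, 8)
```

In an encoder-decoder captioning model, a fused tensor of this kind would be handed to the decoder in place of a single last-layer global feature. Concatenation is only one possible fusion choice; element-wise addition after matching the channel counts is another.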
Keywords/Search Tags:image description generation, deep learning, multiscale, attention