
Research On Image Captioning And Person Re-identification Based On Attention Network

Posted on: 2022-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: D M Zhou
GTID: 2518306485986039
Subject: Software engineering
Abstract/Summary:
In recent years, as hardware computing power has improved, deep learning has developed rapidly within computer vision. When people look at a picture, they can quickly grasp its semantic content from their own knowledge. For a machine to do the same, it must be given a large amount of data to learn from, so that it can extract image features, build a model of the image, understand the image in depth, and then output results. To focus on the image regions that carry the richest visual semantic information, researchers proposed the attention mechanism. Attention mechanisms, which are consistent with the human cognitive system, play a crucial role in natural language processing and deep image understanding tasks. They are widely used in image understanding, including semantic segmentation, image description, object detection and tracking, and pedestrian re-identification.

This thesis studies the application of the attention mechanism to image description and pedestrian re-identification. Image description generates a passage of descriptive text for a given picture, while pedestrian re-identification is image retrieval across cameras. Both rely on a deep understanding of image semantics: the model produces the expected output by understanding the image content. The thesis studies these two sub-tasks of image understanding and describes the basic steps for combining a deep convolutional neural network with an attention network. It then analyzes the key technologies involved, introduces how the attention mechanism is applied in the two sub-tasks, identifies the shortcomings of these methods, and improves on them. The main work of this thesis is as follows:

1) An image description model based on multi-level visual fusion is proposed. In the visual policy network, visual features are transformed into feature sets of visual knowledge through multi-level sub-network modules. The fusion network generates function words that make the description more fluent, and it mediates the interaction between the visual policy network and the language policy network. In the language policy network, a self-critical policy gradient algorithm based on reinforcement learning is used to optimize the visual fusion network end to end. The adaptive attention mechanism designed for the fusion network reduces the interference of gradients from non-visual information with gradients from visual information and accelerates convergence during training. Finally, reinforcement learning is used to mitigate exposure bias and error accumulation in the language policy network. Quantitative and qualitative analysis on multiple public datasets, together with comparisons against other models, demonstrates the effectiveness of the model.

2) A pedestrian re-identification model based on a local attention mechanism and semantic parsing is proposed. Pedestrian re-identification is a challenging task in image understanding because of variations in pedestrian pose, illumination angle, and background. To improve identification accuracy, recent research has extracted local features by dividing the pedestrian images in a dataset into several blocks. However, such methods suffer from problems such as mismatched local human features and the loss of contextual cues from non-human parts. To address these problems, this thesis first applies a semantic segmentation model to segment the image and then divides the segmented feature map into blocks, using accurate local features to improve the ability of human semantic parsing to model arbitrary contours. In addition, considering the importance of occluding objects in local areas to image understanding, a local attention network is used to capture the missing contextual cues of non-human parts. Finally, extensive quantitative and ablation experiments on three mainstream datasets verify the effectiveness of the model and analyze the contribution of the local attention mechanism.
Keywords/Search Tags: attention mechanism, convolutional neural network, reinforcement learning, pedestrian re-identification, pedestrian semantic analysis
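
The self-critical gradient algorithm named in the abstract is commonly formulated as a REINFORCE update whose baseline is the reward of the model's own greedily decoded caption. The sketch below illustrates that common formulation only; the abstract does not give the thesis's exact loss or reward metric. The function name `self_critical_loss`, its parameters, and the scalar rewards are hypothetical and are not taken from the thesis.

```python
import math


def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    """Self-critical REINFORCE loss for one caption (illustrative sketch).

    sample_logprobs: log-probabilities of the words in a sampled caption.
    sample_reward:   score of the sampled caption (assumed: any scalar
                     caption metric; the metric itself is not specified here).
    greedy_reward:   score of the greedily decoded caption, used as baseline.
    """
    # Advantage is positive when the sample beats the greedy baseline.
    advantage = sample_reward - greedy_reward
    # Minimizing this loss raises the log-probability of captions that beat
    # the baseline and lowers it for captions that fall short.
    return -advantage * sum(sample_logprobs)


# Hypothetical numbers: a two-word sampled caption that beats the baseline.
example_loss = self_critical_loss([math.log(0.5), math.log(0.25)], 0.9, 0.6)
```

Because the baseline comes from the model's own test-time decoding rather than from ground-truth prefixes, training of this kind uses the same generation procedure as inference, which is the usual argument for why it reduces exposure bias.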