Research On Conditional Attention Mechanism For Visual Task

Posted on: 2021-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Cao
Full Text: PDF
GTID: 2428330647952743
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, deep learning and the growth of big data have brought major breakthroughs to artificial intelligence, which is now widely used in computer vision, natural language processing, speech recognition, and other fields. In computer vision tasks, images often have complex backgrounds in which the main object is not prominent. The usual remedy is to label the object manually during image preprocessing. However, manual annotation requires a great deal of labor and time, and traditional deep learning models encode the whole image and cannot locate its key regions. How to find the location of an object in an image without relying on manual annotation is therefore a problem worth studying in computer vision. The attention mechanism in deep learning imitates the human visual system: it selectively attends to the region of interest (ROI) in an image and ignores other visible information, which makes it well suited to visual tasks under weak labels. However, the traditional attention mechanism still has limitations in visual tasks such as multiple-object recognition and image captioning. This thesis therefore proposes a conditional attention mechanism for visual tasks and analyzes it experimentally. The main work is as follows:

(1) To address the shortcomings of traditional attention methods in the house number recognition task, a conditional attention mechanism is proposed that computes the attention features of each object by measuring the similarity between a conditional global feature and the local features of the CNN pipeline. The structure of the model is first described in detail from its principles, and the rationale for the design of the conditional global feature is verified. Experiments then show that the model achieves the highest recognition accuracy in the house number recognition task, and the key regions on which attention focuses are visualized.

(2) For multiple-object segmentation under weak labels and for image captioning, a language model is embedded in the conditional attention mechanism, and a bidirectional LSTM structure is used in the decoder to generate high-quality descriptive sentences. The feasibility of the model is first demonstrated theoretically, and its structure and principles are described in detail. Next, the datasets used in the experiments and the objective evaluation criteria for the image captioning task are introduced. Finally, in multiple-object segmentation under weak labels, the model segments the objects in an image according to the nouns in the sentence without relying on annotation information; in the image captioning experiments, the model outperforms the traditional soft attention model on the MSCOCO dataset and achieves good results.
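
The similarity-based attention described in item (1) can be illustrated with a minimal sketch. The abstract does not state which similarity function, feature dimensions, or normalization the thesis uses, so everything below is an assumption made for illustration: dot-product similarity, softmax normalization, a 7x7 feature map with 8 channels, and the hypothetical names `conditional_attention`, `local_feats`, and `cond_global`. It is not the author's implementation.

```python
import math
import random


def conditional_attention(local_feats, cond_global):
    """Attend over local features given a conditional global feature.

    local_feats: list of N local feature vectors (each of length D).
    cond_global: one conditional global feature vector of length D.
    Returns (weights, attended): N attention weights and a length-D vector.
    """
    # Similarity between the global feature and each local feature.
    # Dot product is an assumption; the abstract does not name the measure.
    scores = [sum(l * g for l, g in zip(feat, cond_global)) for feat in local_feats]

    # Softmax over locations, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention feature: weighted sum of the local features.
    dim = len(cond_global)
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, local_feats))
        for d in range(dim)
    ]
    return weights, attended


# Toy input: a 7x7 feature map flattened to 49 locations, 8 channels each.
random.seed(0)
local_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(49)]
cond_global = [random.gauss(0, 1) for _ in range(8)]

weights, attended = conditional_attention(local_feats, cond_global)

# Reshaping the weights back to 7x7 gives a map that could be visualized
# to show which regions receive the most attention.
attention_map = [weights[r * 7:(r + 1) * 7] for r in range(7)]
```

In a setup like this, a different conditional global feature would be supplied for each object (for example, each digit of a house number), so each object gets its own attention map over the same local features.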
Keywords/Search Tags:Attention mechanism, Multiple objects recognition, Weakly supervised segmentation, Image caption