
Visual Attention Models Based On Deep Learning For Scene Classification

Posted on: 2017-12-23  Degree: Master  Type: Thesis
Country: China  Candidate: S X Guo  Full Text: PDF
GTID: 2428330569498783  Subject: Control Science and Engineering
Abstract/Summary:
Scene classification refers to recognizing the scene depicted in an image and assigning it to a predefined scene category. It provides important contextual information for image interpretation and is a key problem in improving image scene understanding in computer vision. Unlike object categories, scene categories exhibit semantic ambiguity and uncertainty, and images within the same scene category show large intra-class variation, which makes scene classification a very challenging task. The human visual system can easily identify the scene categories of images. However, after decades of research, computer vision systems still cannot identify scene categories reliably because of the complexity and diversity of real-world scenes. When processing external visual information, the human visual system works together with higher cognitive mechanisms, such as selective attention, which guide people to analyze and extract the information contained in scenes even from images with cluttered backgrounds. Designing better models and algorithms that simulate the human visual attention mechanism and cognitive abilities, and thereby enhance the automatic image understanding of computers, is therefore important both theoretically and practically.

In this paper, a novel scene classification framework based on the visual attention mechanism and deep convolutional neural networks is proposed. In the proposed pipeline, a spatial transformer module with constraints serves as the attention module, which automatically selects and extracts regions of interest from the input image via visual attention. Convolutional neural networks are then used to extract features from each attention region. Finally, feature descriptors are constructed by multi-feature fusion and fed into a classifier to complete the scene classification task. Considering that global and local features both play important roles in scene classification, this paper further proposes a scene classification method that combines a global model with the local visual attention model on the basis of the above framework. The global feature of the scene image is extracted with a convolutional neural network, while local features are extracted by the above deep convolutional network with the selective visual attention mechanism. The global and local features are complementary and can be effectively fused to further improve scene classification performance.

The performance of the proposed methods is evaluated experimentally. Experiments are conducted on a dataset drawn from Places205, a large-scale scene dataset constructed by MIT. They evaluate several configurations, including fine-tuning the baseline convolutional neural network, training the proposed models with attention windows of different scales and numbers, and training the method with global and local feature fusion. Furthermore, visualization of the network structures and experimental results helps to analyze and understand the models. The experimental results show that the visual attention models based on deep learning proposed in this paper learn informative attention regions that discriminate between scene categories, improve scene classification performance, and enhance scene understanding.
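The following is a minimal PyTorch sketch of the kind of pipeline the abstract describes, not the author's exact architecture: a constrained spatial-transformer attention module selects K regions of interest, a shared CNN backbone extracts a global feature for the whole image and a local feature per region, and the fused descriptor is passed to a scene classifier. The ResNet-18 backbone, the choice of K = 2 attention windows, and the restriction of the transformer to isotropic scale plus translation (as the "constraint") are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class AttentionSceneNet(nn.Module):
    def __init__(self, num_classes=205, num_regions=2):
        super().__init__()
        self.num_regions = num_regions

        # Shared CNN backbone for global and local feature extraction (assumed ResNet-18).
        backbone = models.resnet18(weights=None)
        self.feat_dim = backbone.fc.in_features
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])

        # Localization network: predicts (scale, tx, ty) for each attention window.
        self.loc_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(8),
            nn.Flatten(),
            nn.Linear(3 * 8 * 8, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 3 * num_regions),
        )

        # Classifier over the fused (global + local) descriptor.
        self.classifier = nn.Linear(self.feat_dim * (1 + num_regions), num_classes)

    def forward(self, x):
        b = x.size(0)

        # Global feature of the whole scene image.
        global_feat = self.backbone(x).flatten(1)

        # Constrained affine parameters: isotropic scale and translation only.
        params = self.loc_net(x).view(b, self.num_regions, 3)
        scale = torch.sigmoid(params[..., 0:1])   # scale kept in (0, 1)
        trans = torch.tanh(params[..., 1:3])      # translation kept inside the image

        local_feats = []
        for k in range(self.num_regions):
            # Build a 2x3 affine matrix per sample: [[s, 0, tx], [0, s, ty]].
            theta = torch.zeros(b, 2, 3, device=x.device)
            theta[:, 0, 0] = scale[:, k, 0]
            theta[:, 1, 1] = scale[:, k, 0]
            theta[:, :, 2] = trans[:, k, :]

            # Sample the attention region and extract its local feature.
            grid = F.affine_grid(theta, x.size(), align_corners=False)
            region = F.grid_sample(x, grid, align_corners=False)
            local_feats.append(self.backbone(region).flatten(1))

        # Multi-feature fusion by concatenation, then scene classification.
        fused = torch.cat([global_feat] + local_feats, dim=1)
        return self.classifier(fused)


# Usage sketch: logits = AttentionSceneNet()(torch.randn(4, 3, 224, 224))
```

Concatenation is used here as the simplest fusion choice; the thesis's "multi-feature fusion" step could equally be weighted averaging or another fusion scheme.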
Keywords/Search Tags:Scene Classification, Deep Learning, Visual Attention, Spatial Transformer, Convolutional Neural Network