Deep Attention Networks For Image Classification

Posted on: 2022-06-14   Degree: Doctor   Type: Dissertation
Country: China   Candidate: M H Zhu
GTID: 1488306602493764   Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As a basic research direction in computer vision, image classification is worth studying as a fundamental scientific problem in its own right, and it also provides ideas and tools for other computer vision tasks (e.g., object detection). Beyond its theoretical value, image classification has a wide range of applications, such as fingerprint recognition on mobile phones, face recognition in Alipay and access control systems, and perception of the surrounding environment by driverless cars. Image classification algorithms based on deep convolutional networks have already achieved great success. How can the bottlenecks of existing models be addressed further on this basis? Deep neural networks were proposed to simulate the information processing mechanism of the human brain, so a natural question is whether other human processing mechanisms can be borrowed to improve existing models. The answer is yes, and the human visual attention mechanism is one feasible direction. The human visual system can quickly focus on the most salient or task-related regions of an image, and these regions usually contain most of the information needed for the current task. Convolutional neural networks based on attention mechanisms have also supported this observation. It is therefore a worthwhile research topic to explore how to simulate the human attention mechanism and the visual information processing mechanism, and how to model them jointly. The main research contents and contributions of this thesis are as follows.

1. Convolutional neural networks (CNNs) are among the most popular deep neural networks that simulate the information processing mechanism of the human brain, and they have achieved good performance in many applications. However, CNNs achieve this performance at the cost of high computational complexity. Inspired by the selective attention mechanism, this thesis proposes an attention-based CNN that takes only the task-related regions, rather than the entire input image, as input in order to reduce computational complexity. Saliency detection methods are used to detect the task-related regions, which are then cropped with rectangular boxes. This strategy can also reduce, to some extent, the influence of varying backgrounds and noise on the task. The proposed network is trained with backpropagation and stochastic gradient descent. Experiments on multiple image classification datasets show that the proposed method is effective and outperforms existing traditional methods.

2. Learning task-relevant features from large amounts of unlabeled data in a completely unsupervised manner is a key step in machine learning. However, existing methods consider only the characteristics shared across data and ignore the individual differences of each sample. Based on the selective attention mechanism and the autoencoder model, this thesis proposes a data-driven cost-relevant autoencoder (DCRAE). A new single-hidden-layer network is designed to learn an input-driven weighting vector that weights the reconstruction error loss. Through its shared weights and biases, this network captures the common characteristics of different inputs; because each input sample produces its own output, it also makes full use of sample-specific information. Experimental studies on two visual classification tasks show that the proposed method is effective compared with existing methods.

3. In the past five years, deep learning has been introduced to hyperspectral image (HSI) classification and has demonstrated good performance. In particular, methods based on convolutional neural networks (CNNs) have made great progress. However, because HSIs are high-dimensional and these methods treat all bands equally, their performance is hampered by features learned from bands that are useless for classification. Moreover, patch-based CNN models treat all spatial information in the pixel-centered neighborhood equally, which also limits their performance. This thesis proposes an end-to-end residual spectral-spatial attention network (RSSAN) for HSI classification. RSSAN takes raw 3D cubes as input without additional feature engineering. First, a spectral attention module performs band selection on the raw input by emphasizing bands that are useful for classification and suppressing useless ones. Then, a spatial attention module adaptively selects spatial information in the pixel-centered neighborhood by emphasizing pixels that belong to the same class as the center pixel or are otherwise useful for classification, and suppressing pixels that belong to different classes or are useless. Second, the two attention modules are also used in the subsequent CNN for adaptive feature refinement during spectral-spatial feature learning. Third, a sequential spectral-spatial attention module is embedded into a residual block to avoid overfitting and accelerate the training of the model. Experimental studies show that RSSAN achieves superior classification accuracy compared with state-of-the-art methods on three HSI datasets: Indian Pines (IN), University of Pavia (UP), and Kennedy Space Center (KSC).

4. HSI classification algorithms based on convolutional neural networks usually take a cube formed by a pixel-centered neighborhood patch as the network input when exploiting spatial-spectral features. These models perform a large amount of repeated computation, and treating all spectral bands equally further limits their performance. This thesis proposes a contextual competition attention based residual fully convolutional network (CCA-FCN) for hyperspectral image classification. First, the fully convolutional network (FCN) enables end-to-end training and models the mapping from all pixels in the entire image to their corresponding class labels, which avoids the repeated computation of patch-based convolutional networks; at the same time, convolutions with shared parameters take the relationships between neighboring pixels into account. Second, since the optimal subset of spectral bands for classification differs across categories, a contextual competition attention (CCA) is designed for each pixel in the image, which considers both the commonality among pixels and the characteristics of each individual pixel. Finally, a contextual feature extraction (CFE) module is designed to extract contextual information from multiple neighborhoods of different sizes. Experimental results on three hyperspectral datasets show that the proposed algorithm trains and predicts faster and achieves the best results.
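The crop-then-classify strategy of contribution 1 can be sketched as follows. The abstract does not name the saliency detector, so this sketch assumes a simplified contrast-to-mean-color saliency (in the spirit of frequency-tuned saliency, but without blurring or Lab conversion) and a hypothetical threshold parameter `ratio`; the function names are illustrative and are not the thesis's implementation.

```python
import numpy as np


def contrast_saliency(image: np.ndarray) -> np.ndarray:
    """Saliency of each pixel as its color distance to the mean image color.

    Assumption: a simplified stand-in for the unspecified saliency detector.
    `image` is an H x W x C float array.
    """
    mean_color = image.reshape(-1, image.shape[-1]).mean(axis=0)
    return np.linalg.norm(image - mean_color, axis=-1)


def salient_box(saliency: np.ndarray, ratio: float = 1.0) -> tuple:
    """Tight rectangle (top, bottom, left, right) around pixels whose saliency
    exceeds `ratio` times the mean saliency; falls back to the full image."""
    mask = saliency > ratio * saliency.mean()
    if not mask.any():
        return 0, saliency.shape[0], 0, saliency.shape[1]
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def crop_salient_region(image: np.ndarray, ratio: float = 1.0) -> np.ndarray:
    """Keep only the salient rectangle; this crop, not the full image, would
    then be resized and fed to the CNN."""
    top, bottom, left, right = salient_box(contrast_saliency(image), ratio)
    return image[top:bottom, left:right]


# Toy example: a bright 3 x 4 object on a dark 10 x 10 background.
img = np.zeros((10, 10, 3))
img[2:5, 3:7] = 1.0
print(salient_box(contrast_saliency(img)))  # (2, 5, 3, 7)
print(crop_salient_region(img).shape)       # (3, 4, 3)
```

Because the CNN only sees the cropped rectangle, its cost scales with the crop area rather than the full image area, and most of the background is discarded before classification.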
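A minimal forward-pass sketch of the data-driven weighted reconstruction cost in contribution 2 follows. The abstract only states that a single-hidden-layer network maps each input to a weighting vector for the reconstruction error. The sigmoid autoencoder, the tanh hidden layer of the weighting network, and the normalization of the weights by a softmax scaled by the input dimension (so that they average 1 and cannot all shrink to zero) are assumptions, and every parameter name is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_reconstruction_loss(x, ae, wnet):
    """Data-driven weighted reconstruction cost (sketch, assumed form).

    x    : (N, D) batch of inputs in [0, 1]
    ae   : hypothetical autoencoder params W1 (D, K), b1 (K,), W2 (K, D), b2 (D,)
    wnet : hypothetical weighting-net params V1 (D, M), a1 (M,), V2 (M, D), a2 (D,)
    """
    # Autoencoder: reconstruct x through a K-dimensional code.
    h = sigmoid(x @ ae["W1"] + ae["b1"])
    x_hat = sigmoid(h @ ae["W2"] + ae["b2"])

    # Single-hidden-layer weighting net: its parameters are shared by all
    # samples (common characteristics), but each sample gets its own weights.
    g = np.tanh(x @ wnet["V1"] + wnet["a1"])
    d = x.shape[1]
    w = d * softmax(g @ wnet["V2"] + wnet["a2"])  # (N, D); each row averages 1

    # Per-sample weighted squared error, averaged over dimensions and batch.
    return float(np.mean(np.sum(w * (x - x_hat) ** 2, axis=1) / d))


rng = np.random.default_rng(0)
N, D, K, M = 8, 16, 4, 6
x = rng.random((N, D))
ae = {"W1": rng.normal(0, 0.1, (D, K)), "b1": np.zeros(K),
      "W2": rng.normal(0, 0.1, (K, D)), "b2": np.zeros(D)}
wnet = {"V1": rng.normal(0, 0.1, (D, M)), "a1": np.zeros(M),
        "V2": rng.normal(0, 0.1, (M, D)), "a2": np.zeros(D)}
print(weighted_reconstruction_loss(x, ae, wnet))
```

With `V2` and `a2` set to zero every weight equals 1, and the cost reduces to the ordinary mean squared reconstruction error, which is a convenient sanity check. Training would presumably update both networks with backpropagation and stochastic gradient descent, which this NumPy forward pass does not show.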
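The spectral and spatial attention modules of RSSAN in contribution 3, applied sequentially inside a residual block, can be sketched for a single H x W x B cube as below. The abstract does not give the module internals, so this sketch assumes a squeeze-and-excitation-style spectral attention (global average pooling over space, two small fully connected layers, sigmoid) and a CBAM-style spatial attention reduced to a per-pixel linear map of the band-wise mean and max. The residual branch uses a hypothetical 1 x 1 band-mixing layer `wc` in place of real convolutions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def spectral_attention(x, w1, w2):
    """x: (H, W, B). Rescales every band by a learned weight in (0, 1)."""
    z = x.mean(axis=(0, 1))                        # (B,) band descriptor
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)      # (B,) band weights
    return x * s                                   # emphasize / suppress bands


def spatial_attention(x, alpha, beta, gamma):
    """Weights each pixel of the neighborhood by a value in (0, 1)."""
    avg = x.mean(axis=2)                           # (H, W) band-wise mean
    mx = x.max(axis=2)                             # (H, W) band-wise max
    a = sigmoid(alpha * avg + beta * mx + gamma)   # (H, W) pixel weights
    return x * a[..., None]


def residual_attention_block(x, wc, w1, w2, alpha, beta, gamma):
    """Residual block: x + SpatialAtt(SpectralAtt(ReLU(x @ wc)))."""
    f = np.maximum(x @ wc, 0.0)                    # hypothetical 1x1 band mixing
    f = spectral_attention(f, w1, w2)
    f = spatial_attention(f, alpha, beta, gamma)
    return x + f                                   # identity shortcut


rng = np.random.default_rng(0)
H, W, B, R = 7, 7, 20, 5   # 7 x 7 neighborhood, 20 bands, reduction to 5
cube = rng.random((H, W, B))
out = residual_attention_block(
    cube,
    wc=rng.normal(0, 0.1, (B, B)),
    w1=rng.normal(0, 0.1, (B, R)),
    w2=rng.normal(0, 0.1, (R, B)),
    alpha=1.0, beta=1.0, gamma=0.0,
)
print(out.shape)  # (7, 7, 20)
```

The sigmoid gates keep every band and pixel weight in (0, 1), so the modules can only rescale features, while the identity shortcut passes the input through unchanged when the attended branch contributes little; that pass-through is the usual argument for why residual blocks ease training.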
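The repeated computation that CCA-FCN avoids in contribution 4 can be made concrete with simple arithmetic. Assuming a patchwise model that zero-pads the image and classifies every pixel of an H x W scene from its own s x s patch, the network receives H x W x s^2 pixel positions in total, whereas an FCN processes the H x W image once. The 145 x 145 size is that of the Indian Pines scene; the 9 x 9 patch size is a hypothetical choice, and the count covers only the input to the first layer.

```python
def patchwise_input_pixels(height: int, width: int, patch: int) -> int:
    """Pixel positions fed to a patchwise CNN that classifies every pixel
    from its own zero-padded patch x patch neighborhood."""
    return height * width * patch * patch


def fcn_input_pixels(height: int, width: int) -> int:
    """Pixel positions fed to an FCN that processes the whole image once."""
    return height * width


h, w, s = 145, 145, 9   # Indian Pines scene size; hypothetical 9 x 9 patches
ratio = patchwise_input_pixels(h, w, s) / fcn_input_pixels(h, w)
print(patchwise_input_pixels(h, w, s))  # 1703025
print(ratio)                            # 81.0
```

In this setting each pixel is processed about s^2 = 81 times by the patchwise model, because neighboring patches overlap almost entirely.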
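The per-pixel band competition and multi-neighborhood context of contribution 4 can be sketched on a whole H x W x B image, as an FCN would see it. The abstract does not specify either module, so this sketch assumes that "competition" means a softmax across bands computed separately for every pixel (parameters shared across pixels, outputs pixel-specific), and that contextual features are local means over several square neighborhoods (hypothetical sizes 3, 5 and 7) concatenated along the band axis.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def competition_attention(x, u, c):
    """x: (H, W, B). Per-pixel band weights from a softmax over bands.

    u (B, B) and c (B,) are hypothetical parameters shared by all pixels,
    but each pixel gets its own weights because they depend on its spectrum.
    """
    b = x.shape[-1]
    w = b * softmax(x @ u + c, axis=-1)  # bands compete; weights average 1
    return x * w


def box_mean(x, size):
    """Mean over an odd size x size neighborhood (edge-padded), per band."""
    r = size // 2
    p = np.pad(x, ((r, r), (r, r), (0, 0)), mode="edge")
    h, w = x.shape[:2]
    out = np.zeros_like(x, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += p[dy:dy + h, dx:dx + w]
    return out / (size * size)


def contextual_features(x, sizes=(3, 5, 7)):
    """Concatenate local means from neighborhoods of several sizes."""
    return np.concatenate([box_mean(x, s) for s in sizes], axis=-1)


rng = np.random.default_rng(0)
img = rng.random((12, 10, 8))  # whole image with 8 bands
att = competition_attention(img, rng.normal(0, 0.1, (8, 8)), np.zeros(8))
ctx = contextual_features(att)
print(att.shape, ctx.shape)  # (12, 10, 8) (12, 10, 24)
```

Because the same parameters are applied at every pixel of the full image, nothing is recomputed for overlapping patches; a trained model would learn `u`, `c`, and the convolutions that replace the plain averaging used here.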
Keywords/Search Tags: Image classification, deep neural network, attention mechanism, autoencoder, convolutional neural network, hyperspectral image classification, visual attention, fully convolutional network