Benefiting from the abundant spatial and spectral information in spectral images, they have been widely applied in many fields, such as environmental monitoring, urban planning, and military applications. Spectral images can be divided into two categories: hyperspectral images (high spectral resolution) and multispectral images (high spatial resolution). How to effectively extract features from high-dimensional spectral data and integrate them with spatial features is a vital problem demanding a prompt solution in hyperspectral image classification. How to learn sufficiently discriminative context from high-resolution spatial information plays a vital role in multispectral image classification. This thesis utilizes deep learning techniques to address the above issues and validates the effectiveness of the proposed methods on hyperspectral and multispectral data. The main research contents of this thesis are as follows:

(1) Spectral-spatial long short-term memories (SSLSTMs) for hyperspectral image classification. In hyperspectral images, some objects exhibit dependencies between adjacent and non-adjacent spectral bands. To capture such long-term and short-term dependencies among spectral bands, we propose SSLSTMs. Specifically, the spectral values of each pixel in different channels are successively fed into a spectral LSTM to learn spectral features. Meanwhile, principal component analysis is first used to extract the first principal component from a hyperspectral image, and local image patches centered at each pixel are then selected from it. After that, the row vectors of each image patch are fed into a spatial LSTM one by one to learn the spatial feature of the center pixel. In the classification stage, the spectral and spatial features of each pixel are fed into separate softmax classifiers to derive two different results, and a decision fusion strategy is further used to obtain a joint spectral-spatial result. Experimental results demonstrate that SSLSTMs can effectively extract spectral features and integrate them with spatial features.

(2) Bidirectional-convolutional long short-term memory (Bi-CLSTM) for hyperspectral image classification. To address the issues existing in SSLSTMs (i.e., independent spectral and spatial feature extraction and the loss of spectral information), we propose Bi-CLSTM. In this network, spectral feature extraction is formulated as a sequence learning problem. For better joint spectral-spatial feature extraction, the fully connected operators in the LSTM are replaced with convolution operators. Meanwhile, to sufficiently capture the spectral information, a bidirectional network is adopted; in the classification stage, the learned features are concatenated into a vector and fed to a softmax classifier. Experimental results on three hyperspectral datasets demonstrate that the proposed Bi-CLSTM achieves the best performance.

(3) Class-guided feature decoupling network (CGFDN) for multispectral image classification. Contextual information has been demonstrated to be helpful for multispectral image segmentation. However, most previous works focus on exploiting spatially contextual information, which makes it difficult to segment isolated objects, i.e., objects mainly surrounded by uncorrelated ones. To alleviate this issue, we attempt to take advantage of the co-occurrence relations between different classes of objects in the scene. Specifically, as in other works, convolutional features are first extracted to capture the spatially contextual information. Then, a feature decoupling module is designed to encode the class co-occurrence relations into the convolutional features, so that the most discriminative features can be decoupled. Finally, the segmentation result is inferred from the decoupled features. The whole process is integrated into an end-to-end network, named the class-guided feature decoupling network. Experimental results on two benchmarks demonstrate the effectiveness of CGFDN.

(4) Hierarchical context network (HCNet) for multispectral image classification. To address two issues existing in CGFDN (i.e., the limited ability of pixel-level features to represent objects and the dependence of segmentation performance on the word vector model), we propose HCNet. It derives category relations via unsupervised learning to model pixel-to-pixel and pixel-to-category relations, which are subsequently used to construct contextual information. More concretely, HCNet consists of a P2P (pixel-to-pixel) sub-network and a P2C (pixel-to-category) sub-network. The P2P sub-network learns the pixel-to-pixel relation (detail-grained context) to better preserve the details of objects. Meanwhile, the P2C sub-network models the pixel-to-category relation (semantic-grained context), aiming to improve intra-object semantic consistency. The outputs of these two sub-networks are aggregated to obtain hierarchical contextual information. Experimental results demonstrate that the proposed model achieves competitive performance on three challenging benchmarks.
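The preprocessing and fusion steps described for SSLSTMs in (1) can be sketched in NumPy. The patch size, the reflect padding, and the averaging fusion rule below are illustrative assumptions rather than the thesis's exact choices, and the LSTM feature extractors themselves are omitted:

```python
import numpy as np

def first_principal_component(cube):
    """Project an H x W x B hyperspectral cube onto its first principal component."""
    h, w, b = cube.shape
    x = cube.reshape(-1, b).astype(np.float64)
    x -= x.mean(axis=0)
    cov = x.T @ x / (x.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    pc1 = x @ vecs[:, -1]                # leading eigenvector = first PC
    return pc1.reshape(h, w)

def extract_patch(img, row, col, size=5):
    """Local patch centered at (row, col); its rows are later read as an LSTM sequence."""
    r = size // 2
    padded = np.pad(img, r, mode='reflect')   # padding choice is an assumption
    return padded[row:row + size, col:col + size]

def fuse(p_spectral, p_spatial):
    """Decision fusion of the two softmax outputs (here a simple average)."""
    p = (p_spectral + p_spatial) / 2.0
    return p / p.sum()
```

A typical use would compute `pc = first_principal_component(cube)` once, extract a patch per pixel for the spatial LSTM, and fuse the two classifiers' probability vectors per pixel.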
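The core idea of Bi-CLSTM in (2), replacing the dense LSTM gate products with convolutions and reading the band sequence in both directions, might be sketched as follows. The single-channel setting, 3x3 kernels, and the use of the last hidden state are simplifying assumptions, not the thesis's exact architecture:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' convolution (cross-correlation, as in conv layers)."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clstm_step(x, h, c, kx, kh_):
    """One convolutional LSTM step: each gate uses convolutions instead of dense matrices."""
    gates = [conv2d_same(x, kx[n]) + conv2d_same(h, kh_[n]) for n in range(4)]
    i, f, o = sigmoid(gates[0]), sigmoid(gates[1]), sigmoid(gates[2])
    g = np.tanh(gates[3])                # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def bidirectional(seq, kx, kh_):
    """Run the CLSTM over the band sequence in both directions and concatenate."""
    def run(bands):
        h = c = np.zeros_like(bands[0], dtype=np.float64)
        for x in bands:
            h, c = clstm_step(x, h, c, kx, kh_)
        return h
    return np.concatenate([run(seq), run(seq[::-1])], axis=0)
```

In the full network, the concatenated bidirectional features would be flattened into a vector and passed to the softmax classifier.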
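A minimal sketch of the class co-occurrence statistics that CGFDN in (3) encodes into the convolutional features. Counting per-scene class pairs in this way is one plausible form of the relation, assumed here for illustration, not the thesis's exact module:

```python
import numpy as np

def cooccurrence(label_maps, num_classes):
    """Frequency with which each pair of classes appears together in the same scene."""
    m = np.zeros((num_classes, num_classes))
    for lm in label_maps:
        present = np.unique(lm)          # classes occurring in this scene
        for a in present:
            for b in present:
                m[a, b] += 1
    return m / max(len(label_maps), 1)   # normalize by number of scenes
```

Row `m[c]` then describes which classes tend to surround class `c`, which is the kind of prior that helps with isolated objects surrounded by uncorrelated ones.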
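The pixel-to-category (P2C) relation of HCNet in (4) can be illustrated with soft class centers and a closed-form attention. In the thesis the relation is derived by learning, so the functions `p2c_context` and `softmax` below are hypothetical illustrations of the semantic-grained context, not the proposed sub-network:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def p2c_context(feat, prob):
    """Semantic-grained context: attend from pixels to soft class centers.

    feat: N x D pixel features; prob: N x K soft class assignments.
    """
    # Soft class centers: probability-weighted mean feature per category (K x D).
    centers = prob.T @ feat / (prob.sum(axis=0, keepdims=True).T + 1e-8)
    # Pixel-to-category relation (N x K), then context aggregated from centers.
    attn = softmax(feat @ centers.T)
    return attn @ centers                # N x D contextual features
```

Aggregating this output with a detail-grained pixel-to-pixel term would mirror the hierarchical combination of the P2P and P2C sub-networks.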