
Multi-level Feature Extraction For Image Representation And Its Application

Posted on: 2018-02-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L H Wan
Full Text: PDF
GTID: 1368330590955257
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of the mobile Internet, smartphones, and social networks, the means of acquiring image data are becoming increasingly diverse. Huge amounts of image data emerge in daily life, from ubiquitous smart terminals to Earth-observation satellites, so the demand for automatically obtaining, analyzing, and understanding image content continues to rise. Against this background, developing robust image feature extraction for automatic recognition is particularly important.

Feature extraction is one of the most important goals in image recognition. On the hand-crafted side, feature extraction has for decades progressed from low-level to mid-level features in order to bridge the semantic gap between low-level and high-level features. In recent years, driven by big data, feature extraction has gradually shifted from hand-crafted designs to data-driven feature learning; deep learning, as one of the core feature-learning methods, is currently the most prominent research topic. Additionally, traditional neural networks, including deep learning models, are based on rate coding, whereas biological neurons and their synaptic connections exhibit spike firing and spike-timing-dependent plasticity, respectively. How to develop new feature representations based on spiking networks is therefore a further challenge. Accordingly, in studying image representations from low-level features and mid-level features to high-level feature abstraction with deep learning, and on to spatial-temporal representation with temporal population coding and decoding, this dissertation focuses on the following key issues: (1) low-level feature representations produced by single-layer architectures with convolution, pooling, and normalization, whether supervised or unsupervised, suffer from high-dimensional redundancy; (2) low-level feature description is important for feature encoding, so densely enhanced and affine-invariant descriptions of low-level features, as well as the high-dimensionality problem in feature encoding, need to be studied; (3) in data-driven feature representations with deep learning, most convolutional neural networks (CNNs) are designed and optimized as a single architecture, so subset selection and cascade fusion of multiple CNNs, as well as low-dimensional feature representation, need to be studied; (4) unlike deep networks based on rate coding, exploiting the computational power of spiking neurons requires spatial-temporal representations based on spiking neural population coding and decoding.

The main contributions and innovations are as follows:

(1) Design and learning of low-level features with a single-layer architecture. A hand-crafted Divisive Normalization Feature (DNF) is proposed using linear convolution, feature pooling, and divisive normalization. DNF extraction is simple and contains two types of local description. Linear dimensionality reduction and pooling-weight learning are then introduced to improve DNF when the pooling region is small or large. In addition, two types of feature learning are discussed within the single-layer architecture. The first is unsupervised convolutional kernel learning based on whitening transformation and spherical k-means; the learned kernels can be obtained quickly, generalize well, and require no hyperparameter tuning. The second is supervised discriminative kernel learning based on optimizing within- and between-class scatter; discriminative convolution kernels of small size can be learned effectively on small image patches at low computational complexity. The two types of convolutional kernel learning are then used for feature extraction and further combined with local contrast normalization, feature pooling, and linear or kernel-based dimensionality reduction. Experiments on face datasets demonstrate the effectiveness of the proposed methods.

(2) Feature encoding and mid-level representation.
Two types of mid-level representation, for scene classification and object detection, are proposed. First, an enhanced local description with multi-scale and dense strategies is introduced into feature encoding. The enhanced local description is based on DNF combined with two types of linear filtering; the multi-scale and dense strategies are robust to changes in image scale, position, and lighting. Experiments on two types of remote sensing image classification demonstrate the effectiveness of the proposed method. Second, a novel image representation based on affine-invariant description followed by feature encoding and large-margin dimensionality reduction is presented. Affine-invariant local descriptors based on interest-point detection and second-order moment estimation are first extracted, followed by feature encoding to form a high-dimensional representation. A low-dimensional linear transformation based on a large-margin constraint and stochastic subgradient learning is then introduced to perform dimensionality reduction and similarity learning simultaneously. Experiments on two remote sensing datasets containing aircraft and vehicles demonstrate the effectiveness of the proposed method.

(3) Cascade representation for high-level feature abstraction. A CNN yields different levels of image representation, from local low-level features to high-level abstractions, and shows significant advantages over shallow representations; CNNs with different architectures also differ in representational ability. A novel image representation method based on selective CNNs and cascade classifiers is therefore proposed. First, a comparative study of multiple CNNs is conducted, and a subset of them is selected using weight computation. Fusion of the selected CNNs is then achieved with cascaded linear classifiers. Experiments on three remote sensing datasets demonstrate that the proposed method improves on any single CNN. In addition, inspired by PCANet and LDANet, which use PCA and LDA respectively to learn convolutional kernels for multiple layers, CNN followed by subspace learning is discussed for image representation; subspace learning helps obtain low-dimensional representations and improves feature discriminability. Experiments on two face datasets demonstrate the effectiveness of the proposed method.

(4) Temporal population coding/decoding and spatial-temporal representation. An extended temporal population coding (eTPC) and decoding model without a learning process is proposed to extend image representation from the spatial to the spatial-temporal domain. Compared with TPC, which uses excitatory synaptic connections, eTPC adopts inhibitory synaptic connections that effectively adjust the firing frequency of adjacent neurons in the coding stage. Each neuron in eTPC consists of two parts: one establishes a linear relationship for its firing frequency from the neural stimulus response, while the other determines the synaptic connection strength with adjacent neurons from the neural stimulus response. In the decoding stage, decorrelation is first performed to remove the correlation among the spike firing patterns of eTPC; a novel spatial-temporal feature representation called NLSSP (Normalized Local Spatial Summation Pooling) is then obtained by spatial pooling and spatial-temporal normalization. Experiments on face recognition and remote sensing scene classification demonstrate that the spike firing patterns of eTPC can effectively represent images, and that eTPC clearly outperforms TPC in pattern representation.
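To make the single-layer DNF pipeline of contribution (1) concrete, the following is a minimal sketch of its three stages (linear convolution, feature pooling, divisive normalization). The specific filter bank, pooling size, and normalization constant are assumptions for illustration, not the dissertation's exact settings.

```python
import numpy as np
from scipy.signal import convolve2d

def dnf(image, kernels, pool=4, eps=1e-6):
    """Sketch of a DNF-style pipeline: convolution, average pooling,
    then divisive normalization across feature channels."""
    # 1) Linear convolution of the image with a bank of kernels.
    maps = np.stack([convolve2d(image, k, mode="valid") for k in kernels])
    # 2) Non-overlapping average pooling over pool x pool regions.
    c, h, w = maps.shape
    h, w = h - h % pool, w - w % pool
    pooled = (maps[:, :h, :w]
              .reshape(c, h // pool, pool, w // pool, pool)
              .mean(axis=(2, 4)))
    # 3) Divisive normalization: each response is divided by the
    #    local energy pooled across all channels at that position.
    norm = np.sqrt((pooled ** 2).sum(axis=0, keepdims=True) + eps)
    return pooled / norm
```

After normalization, the channel responses at each spatial position form an (approximately) unit-norm vector, which is what makes the descriptor robust to local contrast changes.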
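The unsupervised kernel learning of contribution (1), whitening followed by spherical k-means, can be sketched as below. The ZCA-style whitening, hard assignment, and iteration count are illustrative assumptions; the dissertation's exact formulation may differ.

```python
import numpy as np

def learn_kernels(patches, k=16, iters=10, eps=1e-5):
    """Sketch of unsupervised kernel learning: ZCA whitening of image
    patches followed by spherical k-means (unit-norm centroids)."""
    # Whitening: decorrelate patch dimensions via the covariance SVD.
    X = patches - patches.mean(axis=0)
    cov = X.T @ X / len(X)
    U, S, _ = np.linalg.svd(cov)
    X = X @ (U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T)
    # Spherical k-means: centroids live on the unit sphere, so the
    # similarity is a dot product (cosine similarity up to patch norm).
    rng = np.random.default_rng(0)
    D = X[rng.choice(len(X), k, replace=False)]
    D /= np.linalg.norm(D, axis=1, keepdims=True) + eps
    for _ in range(iters):
        assign = (X @ D.T).argmax(axis=1)   # hard cluster assignment
        for j in range(k):
            members = X[assign == j]
            if len(members):                # keep old centroid if empty
                D[j] = members.sum(axis=0)
        D /= np.linalg.norm(D, axis=1, keepdims=True) + eps
    return D  # rows are unit-norm, flattened convolution kernels
```

Each returned row can be reshaped into a small convolution kernel; as the abstract notes, this procedure is fast and needs no hyperparameter tuning beyond the number of kernels k.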
Keywords/Search Tags: Low-level feature, mid-level feature, cascade feature representation, deep learning, spiking neural network coding and decoding, recognition and classification