
Research On High Dimensional Data Classification Via Dictionary Learning

Posted on: 2019-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: B Chen
Full Text: PDF
GTID: 2428330566977973
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development and continuous evolution of modern information technology, human society is gradually becoming digital, networked and intelligent, producing high dimensional data in the form of text, voice, images and other modalities. How to effectively mine, classify and make full use of high dimensional data has gradually become a hot research topic both at home and abroad. However, high dimensional data, with its high dimensionality and large volume, easily leads to the "curse of dimensionality", and the key to its successful classification is effective feature extraction and selection. Therefore, building on pattern recognition and machine learning and following the research trend of dictionary learning from the perspective of sparse representation, this paper studies feature extraction and feature selection simultaneously under a single framework, deeply mines the intrinsic structure hidden in high dimensional data, and ultimately achieves efficient and accurate classification of high dimensional data. The main contents of this paper are as follows.

To cope with high feature dimensionality and the difficulty of extracting important information from only a few dimensions, "sparsity" is introduced into traditional PCA, leading to a sparse embedding based dimensionality reduction (SEDR) method. First, a dimensionality reduction model under sparse embedding is constructed by integrating feature transformation and dictionary learning. Then, a corresponding coordinate alternation algorithm is given to solve the proposed model. By optimizing the orthogonal projection matrix during sparse coding, the distribution of the low dimensional subspace is adjusted dynamically, the intrinsic sparse structure hidden in high dimensional data is deeply mined, and dimensionality reduction and dictionary learning promote each other.

To cope with large data volume and the difficulty of extracting important information from only a few features, the hinge loss is introduced into dictionary learning and, under the guidance of the support vector machine, a sparse embedding max margin dictionary learning (SEMMDL) method is proposed. First, a general feature selection strategy is derived based on the weighted summation of the squared distances between all pairs of coding features. Then, based on this strategy, the quadratic hinge loss is introduced into the selection of coding features, and the corresponding discriminant feature learning model and algorithm are given. Consequently, irrelevant and redundant coding features are effectively removed and a strongly discriminative feature subset is selected, which improves the classification performance on high dimensional data.

Finally, the proposed methods are evaluated on a series of benchmark data sets such as Extended Yale B and Caltech-101 and compared with many state-of-the-art methods, which illustrates their effectiveness. The proposed methods simultaneously optimize the projection matrix, the dictionary and the classifier model. The experimental results show that the algorithms achieve high classification accuracy, good stability and high testing efficiency, and can effectively handle the classification of high dimensional data, especially high dimensional sparse data.
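The abstract does not spell out the SEDR objective or its coordinate alternation algorithm, so the following is only a minimal sketch of one plausible formulation, assuming a model of the form min over P, D, A of 0.5*||P X - D A||_F^2 + lam*||A||_1 subject to P P^T = I. The function name sedr_sketch, the ISTA inner loop for sparse coding, and the Procrustes-style projection update are illustrative assumptions, not the thesis's actual algorithm.

```python
import numpy as np

def soft_threshold(Z, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def sedr_sketch(X, d, k, lam=0.1, n_outer=30, n_ista=20, seed=0):
    """Illustrative alternating updates of an orthonormal projection P, a
    dictionary D and sparse codes A for an assumed model of the form
        min_{P,D,A} 0.5*||P X - D A||_F^2 + lam*||A||_1   s.t.  P P^T = I.
    X is an (m, n) data matrix with m features and n samples."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    P = np.linalg.qr(rng.standard_normal((m, d)))[0].T    # (d, m), orthonormal rows
    D = rng.standard_normal((d, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    A = np.zeros((k, n))
    for _ in range(n_outer):
        Y = P @ X                                         # current low-dimensional embedding
        # 1) sparse coding: a few ISTA steps on the codes A
        step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-8)
        for _ in range(n_ista):
            A = soft_threshold(A - step * D.T @ (D @ A - Y), step * lam)
        # 2) dictionary update: ridge-regularised least squares, then renormalise atoms
        D = Y @ A.T @ np.linalg.inv(A @ A.T + 1e-6 * np.eye(k))
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
        # 3) projection update: Procrustes-style alignment of P X with the
        #    reconstruction D A (a simplification; exact only for whitened X)
        U, _, Vt = np.linalg.svd(D @ A @ X.T, full_matrices=False)
        P = U @ Vt                                        # preserves P P^T = I
    return P, D, A
```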
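Likewise, the SEMMDL objective and its pairwise-distance weighting of coding features are only named in the abstract; the snippet below is a generic sketch of the quadratic (squared) hinge loss that the method builds on, evaluated for a hypothetical linear max-margin classifier acting on the coding features, with binary labels assumed for simplicity.

```python
import numpy as np

def squared_hinge_objective(w, b, A, y, C=1.0):
    """Regularised squared hinge loss of a linear classifier on coding features.
    A: (k, n) sparse codes, y: (n,) labels in {-1, +1}, w: (k,) weights, b: bias.
    Returns 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (w^T a_i + b))^2."""
    margins = 1.0 - y * (w @ A + b)        # per-sample margin violations
    hinge = np.maximum(0.0, margins)
    return 0.5 * np.dot(w, w) + C * np.sum(hinge ** 2)
```

In a max-margin dictionary learning setting such as the one described, a loss of this kind would be minimised jointly with the dictionary so that the retained coding features are those that keep the classification margin large; the sketch only shows the loss evaluation itself.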
Keywords/Search Tags: High dimensional data classification, dictionary learning, feature transformation, feature selection, sparse embedding