
Research On Feature Selection Algorithm Based On Kernel Sparse And Principal Component Analysis

Posted on: 2021-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Lv
Full Text: PDF
GTID: 2428330629953123
Subject: Software engineering
Abstract/Summary:
With the vigorous development of big data and artificial intelligence technology, industries across society have accumulated large volumes of high-dimensional feature data. High-dimensional sample data contains many correlated, redundant features; it not only increases the burden on storage space and consumes large amounts of computing resources but, more seriously, greatly increases the difficulty of data mining and knowledge discovery. Extracting valuable feature information from high-dimensional sample data has become quite difficult, and how to extract important features from these massive high-dimensional samples and mine more valuable latent information has become a hot spot in current research. Data preprocessing technology, represented by feature selection algorithms, is the key to solving this problem.

Therefore, this dissertation proposes a feature selection method based on kernel sparse representation, addressing the fact that existing feature selection methods consider only the linear relationship between features and labels and fail to account for their nonlinear relationship. To address the shortcoming of traditional attribute selection algorithms, namely that sample attributes remain highly correlated after selection, this dissertation also proposes a feature selection method based on principal component analysis. Theoretical derivation and extensive experiments show that the two proposed feature selection algorithms are effective and bring considerable improvements in classification accuracy and stability. The details are as follows:

(1) Feature selection based on kernel sparse representation (KSFS): Starting from the nonlinear relationship between sample attributes and class labels, Chapter 3 proposes a new feature selection algorithm that combines kernel functions with sparse learning. Specifically, each feature of the data set
is first mapped into a high-dimensional kernel space by a kernel function, where linear feature selection is performed, thereby achieving nonlinear feature selection in the original low-dimensional space. Next, the feature data in kernel space are reconstructed in a sparse way to obtain a sparse representation of the original data set; meanwhile, an l1-norm regularizer is used to construct the feature selection mechanism and select the optimal feature subset. Finally, the data remaining after feature selection are used in classification experiments. Experimental results on public data sets show that the proposed algorithm performs feature selection effectively and improves classification accuracy by about 3%.

(2) Feature selection based on principal component analysis (PCFS): To address the fact that many redundant attributes remain in the samples after traditional attribute selection, Chapter 4 proposes a novel unsupervised feature selection algorithm that combines principal component analysis with sparse learning. The algorithm selects important features from data samples without class labels and removes redundant ones. Specifically, the sample attributes are first projected into a new space through a projection matrix; using the self-expression property of features, the projected features are linearly represented by the original attributes, while an l2,1-norm sparse regularizer is applied to select features. A principal component analysis regularization term is then embedded to ensure that the selected features preserve the maximum variance of the data samples, so that the main information of the data is retained; an orthogonality constraint is further introduced to ensure that the selected feature attributes are linearly independent, and the resulting sparse matrix is used to construct the feature selection scoring mechanism. Finally, the sample subset obtained after feature selection is used in the classification experiment. The
experimental results show that PCFS performs feature selection well on public data sets, improving classification accuracy by 2.5% over the comparison algorithms.

In summary, because existing feature selection methods consider only the linear relationship between features and labels and neglect the nonlinear relationship, this dissertation proposes a feature selection method based on kernel sparse representation; because traditional feature selection leaves high correlation among the sample features, it also proposes a feature selection method based on principal component analysis. Both new algorithms are derived and proved theoretically. To verify their performance against the comparison algorithms, all algorithms are tested and analyzed in a unified experimental environment. Extensive experimental results show that the two proposed feature selection algorithms outperform the comparison algorithms in classification accuracy and stability. In future research, I will consider combining deep learning techniques to propose a new attribute selection model.
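To make the KSFS idea above concrete, the following is a minimal sketch, not the dissertation's actual implementation: it assumes an RBF kernel per feature and a simple ISTA solver for the l1 problem (the function names `rbf_column_kernel`, `lasso_ista`, and `ksfs_scores` are illustrative only). Each feature column is mapped to its own n-by-n kernel block, l1-regularized regression is run on the stacked blocks, and each original feature is scored by the l1 norm of its coefficient block.

```python
import numpy as np

def rbf_column_kernel(x, gamma=1.0):
    # n x n RBF kernel matrix built from a single feature column
    diff = x[:, None] - x[None, :]
    return np.exp(-gamma * diff ** 2)

def lasso_ista(A, y, alpha=0.01, iters=800):
    # minimal proximal-gradient (ISTA) solver for
    #   min_w (1/2n)||A w - y||^2 + alpha * ||w||_1
    n, p = A.shape
    step = n / (np.linalg.norm(A, 2) ** 2)   # 1/L for the smooth part
    w = np.zeros(p)
    for _ in range(iters):
        w -= step * (A.T @ (A @ w - y)) / n          # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - step * alpha, 0.0)  # soft-threshold
    return w

def ksfs_scores(X, y, gamma=1.0, alpha=0.01):
    # map every feature into kernel space, run l1-regularized regression
    # on the stacked kernel columns, and score each original feature
    # by the l1 norm of its coefficient block
    n, d = X.shape
    Phi = np.hstack([rbf_column_kernel(X[:, j], gamma) for j in range(d)])
    w = lasso_ista(Phi, y, alpha)
    return np.array([np.abs(w[j * n:(j + 1) * n]).sum() for j in range(d)])

# toy check: feature 0 drives the label nonlinearly, feature 1 is pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.sin(X[:, 0])
scores = ksfs_scores(X, y)
print(scores[0] > scores[1])
```

Because sin(x0) is linear in the kernel columns of feature 0 but not in the raw feature, the kernel mapping is what lets an l1-penalized linear selector capture this nonlinear dependence.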
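The PCFS procedure can likewise be sketched under simplifying assumptions: the projection is taken to be the top-k principal directions from an SVD (orthogonal by construction), and the l2,1-penalized self-representation is solved by iteratively reweighted least squares, a standard device for l2,1 minimization; the name `pcfs_scores` and all parameter values are illustrative, not the dissertation's. Features are scored by the row norms of the representation matrix W, so rows driven to zero mark redundant or uninformative features.

```python
import numpy as np

def pcfs_scores(X, k=2, lam=0.1, iters=30, eps=1e-8):
    # project the centered data onto its top-k principal directions,
    # self-represent the projection with the original features under an
    # l2,1 penalty (solved by iteratively reweighted least squares), and
    # score each feature by the l2 norm of its row in W
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:k].T                          # n x k projected data
    d = Xc.shape[1]
    G = Xc.T @ Xc
    W = np.linalg.solve(G + lam * np.eye(d), Xc.T @ T)   # ridge warm start
    for _ in range(iters):
        # reweighting: rows of W with small norm are penalized harder,
        # driving whole rows (i.e. whole features) toward zero
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        W = np.linalg.solve(G + lam * D, Xc.T @ T)
    return np.linalg.norm(W, axis=1)

# toy check: features 0 and 1 carry the variance, feature 2 is tiny noise
rng = np.random.default_rng(1)
X = np.column_stack([3 * rng.normal(size=80),
                     2 * rng.normal(size=80),
                     0.01 * rng.normal(size=80)])
scores = pcfs_scores(X)
print(scores.argmin())
```

The variance-preservation and orthogonality constraints of the full method are captured here only implicitly, through the choice of orthonormal principal directions as the projection target; the dissertation's joint optimization would learn the projection and W together.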
Keywords/Search Tags: Kernel Function, Sparse Learning, Nonlinear, Principal Component Analysis, Feature Selection