
Feature Selection And Its Application In Classification

Posted on: 2021-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Li
Full Text: PDF
GTID: 2428330629453114
Subject: Computer Science and Technology
Abstract/Summary:
With the development of artificial intelligence, all kinds of data are being produced across industries. These data are massive, diverse, and high-dimensional. However, in such big data, many features have little effect: they increase the storage burden on the computer and reduce the efficiency of algorithms. Moreover, noise and outliers in the data can strongly interfere with data mining and reduce model accuracy. Therefore, feature selection and robust learning are particularly important. Feature selection removes redundant features that are useless, or even harmful, to the model, greatly reducing the computation required by subsequent classification or clustering algorithms. Robust learning effectively reduces the influence of noise and outliers on the model, making the algorithm more stable. This paper proposes a new feature selection algorithm and a support vector machine (SVM) classification algorithm; the second algorithm makes up for the limitation that the first can only select features. The core content and original contributions of this paper are as follows:

Addressing the limitation that traditional group lasso can only group features in pairs, this paper first proposes a new feature selection algorithm combining multi-view learning and fuzzy C-means clustering. Specifically, all features are first clustered by fuzzy C-means, and the features in each cluster are treated as a group. Group lasso is then used to sparsify the features within each group, but not between groups, so that redundant features can be identified effectively. Finally, multi-view learning integrates information from multiple views to fully exploit the interactions between them. At the same time, all features are sparsified by the l2,1-norm, which greatly reduces the computational cost of the algorithm.

The first feature selection algorithm removes redundant features from the data, after which the reduced data set is classified, which requires a two-step process. Therefore, this paper also designs a new SVM algorithm that performs feature selection and classification simultaneously. Specifically, a weight is first assigned to each sample through robust statistical learning: the larger the weight, the more important the sample, while noise samples receive relatively small weights, which effectively reduces the impact of noise. Then, a new l12-norm sparse regularization term is proposed to account for the importance of features, with redundant features receiving relatively small weights. Finally, cost-sensitive learning is used to handle class-imbalanced data, while avoiding the limitations of using classification accuracy alone to measure algorithm performance.

This paper takes sparse learning, multi-view learning, and robust statistical learning as its core techniques, and carries out classification experiments on different data. The first algorithm uses an existing SVM to test its performance after feature selection; the second performs feature selection and SVM classification simultaneously. The experiments use medical data sets, text data sets, and artificially simulated data sets. Compared with the baseline algorithms, the proposed algorithms show superior performance.
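The first algorithm's pipeline of "cluster features into groups, then apply a group-wise sparsity penalty" can be sketched as follows. This is a minimal illustration, not the thesis's actual method: `fuzzy_cmeans` is a bare-bones fuzzy C-means run on the feature columns, and `group_lasso_select` applies a standard proximal-gradient group lasso to a least-squares objective; the multi-view integration and l2,1-norm term are omitted, and all function names are this sketch's own.

```python
import numpy as np

def fuzzy_cmeans(points, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy C-means; returns a hard group label per point.
    To group features, pass the transposed data matrix (features as rows)."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships, rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ points) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))         # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U.argmax(axis=1)

def group_lasso_select(X, y, groups, lam=0.5, lr=0.01, iters=500):
    """Proximal-gradient group lasso on least squares.
    Groups whose weights shrink to ~0 are treated as redundant features."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(iters):
        w -= lr * (X.T @ (X @ w - y) / n)      # gradient step on squared loss
        for g in np.unique(groups):            # proximal step: shrink each group's norm
            idx = groups == g
            norm = np.linalg.norm(w[idx])
            w[idx] *= max(0.0, 1 - lr * lam * np.sqrt(idx.sum()) / (norm + 1e-12))
    return w
```

In use, one would cluster the columns of `X` with `fuzzy_cmeans(X.T, c)`, feed the resulting labels to `group_lasso_select`, and keep only the features in groups with nonzero weight before running the downstream classifier.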
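The second algorithm's two weighting ideas, down-weighting noisy samples and correcting for class imbalance, can be illustrated with an off-the-shelf linear SVM. This is a hedged sketch, not the thesis's joint model: `robust_sample_weights` is a simple stand-in for the robust statistical weighting (distance to the class mean), the l12-norm feature regularizer is not reproduced, and cost sensitivity is approximated with scikit-learn's `class_weight="balanced"` option.

```python
import numpy as np
from sklearn.svm import LinearSVC

def robust_sample_weights(X, y, eps=1e-12):
    """Down-weight likely outliers: weight decays with a sample's distance
    from its class mean (a stand-in for the thesis's robust weighting)."""
    w = np.empty(len(y))
    for c in np.unique(y):
        idx = y == c
        d = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
        w[idx] = 1.0 / (1.0 + d / (np.median(d) + eps))  # in (0, 1]
    return w

def fit_weighted_svm(X, y):
    """Cost-sensitive linear SVM: class_weight='balanced' offsets class
    imbalance; sample_weight reduces the influence of suspected noise."""
    clf = LinearSVC(class_weight="balanced", dual=False)
    clf.fit(X, y, sample_weight=robust_sample_weights(X, y))
    return clf
```

This keeps the abstract's division of labor visible: per-sample weights handle noise, while per-class weights handle imbalance, so evaluation should use imbalance-aware metrics rather than plain accuracy.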
Keywords/Search Tags:Feature selection, Multi-view learning, Support vector machine, Robust learning, Cost-sensitive