
Feature Selection And Its Application In Classification

Posted on: 2021-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Li
Full Text: PDF
GTID: 2428330629453114
Subject: Computer Science and Technology
Abstract/Summary:
With the development of artificial intelligence, all kinds of data are being produced across industries. These data are massive, diverse, and high-dimensional. However, in such big data, many features have little effect: they increase the storage burden on the computer and reduce the efficiency of algorithms. Moreover, noise and outliers in the data can strongly interfere with data mining and reduce model accuracy. Therefore, feature selection and robust learning are particularly important. Feature selection removes redundant features that are useless, or even harmful, to the model, greatly reducing the computation required by subsequent classification or clustering algorithms. Robust learning effectively reduces the influence of noise and outliers on the model, making the algorithm more stable. This paper proposes a new feature selection algorithm and a support vector machine (SVM) classification algorithm; the second algorithm makes up for the limitation that the first can only select features. The core content and original contributions of this paper are as follows:

Addressing the limitation that traditional group lasso can only group features in pairs, this paper first proposes a new feature selection algorithm combining multi-view learning and fuzzy C-means clustering. Specifically, all features are first clustered by fuzzy C-means, and the features in each cluster are treated as a group. Group lasso is then used to sparsify the features within each group, but not between groups, so that redundant features can be identified effectively. Finally, multi-view learning integrates information from multiple views to fully exploit the interactions between them. At the same time, all features are sparsified by the l2,1-norm, which greatly reduces the computational cost of the algorithm.

The first feature selection algorithm removes redundant features from the data, after which the reduced data set is classified, which requires a two-step process. Therefore, this paper also designs a new SVM algorithm that performs feature selection and classification simultaneously. Specifically, a weight is first assigned to each sample through robust statistical learning: the larger the weight, the more important the sample, while noise samples receive relatively small weights, which effectively reduces the impact of noise. Then, a new l12-norm sparse regularization term is proposed to account for the importance of features, with redundant features receiving relatively small weights. Finally, cost-sensitive learning is used to handle class-imbalanced data, while avoiding the limitations of using classification accuracy alone to measure algorithm performance.

This paper takes sparse learning, multi-view learning, and robust statistical learning as its core techniques, and carries out classification experiments on different data. The first algorithm uses an existing SVM to test its performance after feature selection; the second performs feature selection and SVM classification simultaneously. The experiments use medical data sets, text data sets, and artificially simulated data sets. Compared with the baseline algorithms, the proposed algorithms show superior performance.
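The first algorithm's pipeline of "cluster features into groups, then apply a group-wise sparsity penalty" can be sketched as follows. This is a minimal illustration, not the thesis's actual method: `fuzzy_cmeans` is a bare-bones fuzzy C-means run on the feature columns, and `group_lasso_select` applies a standard proximal-gradient group lasso to a least-squares objective; the multi-view integration and l2,1-norm term are omitted, and all function names are this sketch's own.

```python
import numpy as np

def fuzzy_cmeans(points, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy C-means; returns a hard group label per point.
    To group features, pass the transposed data matrix (features as rows)."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships, rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ points) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))         # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U.argmax(axis=1)

def group_lasso_select(X, y, groups, lam=0.5, lr=0.01, iters=500):
    """Proximal-gradient group lasso on least squares.
    Groups whose weights shrink to ~0 are treated as redundant features."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(iters):
        w -= lr * (X.T @ (X @ w - y) / n)      # gradient step on squared loss
        for g in np.unique(groups):            # proximal step: shrink each group's norm
            idx = groups == g
            norm = np.linalg.norm(w[idx])
            w[idx] *= max(0.0, 1 - lr * lam * np.sqrt(idx.sum()) / (norm + 1e-12))
    return w
```

In use, one would cluster the columns of `X` with `fuzzy_cmeans(X.T, c)`, feed the resulting labels to `group_lasso_select`, and keep only the features in groups with nonzero weight before running the downstream classifier.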
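The second algorithm's two weighting ideas, down-weighting noisy samples and correcting for class imbalance, can be illustrated with an off-the-shelf linear SVM. This is a hedged sketch, not the thesis's joint model: `robust_sample_weights` is a simple stand-in for the robust statistical weighting (distance to the class mean), the l12-norm feature regularizer is not reproduced, and cost sensitivity is approximated with scikit-learn's `class_weight="balanced"` option.

```python
import numpy as np
from sklearn.svm import LinearSVC

def robust_sample_weights(X, y, eps=1e-12):
    """Down-weight likely outliers: weight decays with a sample's distance
    from its class mean (a stand-in for the thesis's robust weighting)."""
    w = np.empty(len(y))
    for c in np.unique(y):
        idx = y == c
        d = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
        w[idx] = 1.0 / (1.0 + d / (np.median(d) + eps))  # in (0, 1]
    return w

def fit_weighted_svm(X, y):
    """Cost-sensitive linear SVM: class_weight='balanced' offsets class
    imbalance; sample_weight reduces the influence of suspected noise."""
    clf = LinearSVC(class_weight="balanced", dual=False)
    clf.fit(X, y, sample_weight=robust_sample_weights(X, y))
    return clf
```

This keeps the abstract's division of labor visible: per-sample weights handle noise, while per-class weights handle imbalance, so evaluation should use imbalance-aware metrics rather than plain accuracy.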
Keywords/Search Tags:Feature selection, Multi-view learning, Support vector machine, Robust learning, Cost-sensitive