BPSO-SVM Feature Selection And Its Application In Classification

Posted on: 2019-02-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Wei
GTID: 1318330566964596
Subject: Computer application technology
Abstract/Summary:
Applying classification techniques to large-scale data has gradually become an important research field in machine learning and data mining. At the same time, the growing demand for data acquisition and analysis places higher requirements on classification performance. Feature selection is one of the important ways to improve the efficiency of classification methods: it filters out a subset of important features that preserves the classification performance of the original dataset and can improve classification accuracy, which in turn makes the resulting decision analysis more accurate and more instructive. This dissertation studies improvements to feature selection algorithms and their application to classification problems on different types of data. Its contributions include the following:

(1) Building on existing research in feature selection and on the Binary Particle Swarm Optimization (BPSO) algorithm, this dissertation analyzes the factors in BPSO that influence the feature selection result. To improve classification performance while reducing the number of selected features, we propose the Mutation Enhanced BPSO-SVM (ME-BPSO-SVM) algorithm, which adds a memory renewal mechanism and a mutation-enhanced mechanism for feature selection. The modified algorithm avoids premature convergence by detecting when a particle has been trapped in a local optimum and making it jump out. The experimental results demonstrate that ME-BPSO-SVM finds more effective attribute subsets for classification, maintains good classification performance, and clearly reduces premature convergence.

(2) Based on ME-BPSO-SVM, a new method for classifying imbalanced data is proposed. It first modifies the SMOTE method, then redefines the fitness function in ME-BPSO-SVM, yielding the hybrid MSM (Modified SMOTE with ME-BPSO-SVM) algorithm. The MSM method oversamples only the effective instances of the minority class, which reduces the time spent generating irrelevant samples and removes the effect of such samples on the algorithm's complexity. The hybrid MSM method also adapts the classification model to imbalanced data and improves classification performance. The experimental results show that the hybrid MSM algorithm finds more effective attribute subsets for classification and clearly improves classification performance.

(3) High-dimensional, small-sample data presents challenges to traditional machine learning and data mining methods. In particular, as the number of dimensions grows, the data comes to contain a large amount of redundant and irrelevant information, which degrades the performance of machine learning algorithms and leads to the "curse of dimensionality". Real-world applications involve a great deal of high-dimensional, small-sample data; DNA microarray data in bioinformatics, in particular, has been studied widely in recent years. For the classification of such data, this dissertation proposes a new hybrid feature selection method that combines the two algorithms proposed here, ME-BPSO and MSM, with the filter methods RI (SVM-RFE with Information Gain) and RT (SVM-RFE with T-test) to solve the feature selection and classification problems. Public DNA data sets are used to verify the validity and reliability of the algorithm. We then apply DNA microarray data to the pathological diagnosis of autism patients and analyze the experimental results of the three evolutionary methods proposed in the dissertation. The experimental results show that the hybrid algorithms RT-MEB and RT-MSM can effectively solve the classification problem for such high-dimensional, small-sample data.
Keywords/Search Tags: feature selection, BPSO, SVM, imbalanced data, high-dimensional and small-sample data