
The Research Of Feature Selection Algorithms Based On Stochastic Search Strategy

Posted on: 2018-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: J T Wang
Full Text: PDF
GTID: 2348330536460919
Subject: Computer software and theory
Abstract/Summary:
Rapid advances in biological science and technology produce large volumes of complex biological data, which typically have many features. The need to analyze such high-dimensional data has in turn driven the development of data mining and statistical analysis. High-dimensional biological data usually contain noise and irrelevant variables, so extracting informative features and filtering out noise is important for understanding the underlying biological problems. Among data mining techniques, feature selection is an effective way to handle high-dimensional data, and in recent years it has been widely applied to biological data analysis.

This thesis first proposes a modified professional tennis player ranking (MPTPR) method based on random search. MPTPR combines the PTPR algorithm with a roulette wheel mechanism to weight features. PTPR selects features with equal probability from the seed set and from the non-seed set, whereas MPTPR applies roulette wheel selection within each of the two sets, so that good features in either set have a higher probability of taking part in the next round of evaluation. In comparisons on eight public datasets, the features selected by MPTPR give better classification performance than those selected by PTPR in most cases.

The thesis also proposes SU-KNN, a random search feature selection algorithm based on symmetrical uncertainty (SU) and the k-nearest-neighbor (kNN) classifier. The algorithm randomly draws many feature subsets from the feature set. For each subset it performs a forward search that uses kNN classification accuracy as the evaluation criterion, and it retains the subset with the highest accuracy. It then computes each feature's average accuracy and combines it with the feature's symmetrical uncertainty to obtain an overall score for the feature. In tests on eight public datasets, the features selected by SU-KNN give better classification performance than those selected by other common filter feature selection algorithms in most cases.

Both algorithms are random search algorithms, but they differ in how they select and evaluate features. MPTPR uses roulette wheel selection to choose features and decision trees to evaluate them, whereas SU-KNN uses kNN classifiers for evaluation. Each algorithm scores every feature individually and produces a final feature ranking from those scores. The thesis compares the two algorithms on eight public datasets.
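The roulette wheel mechanism that MPTPR applies within the seed and non-seed sets can be illustrated with a minimal sketch. The abstract does not say how feature weights are computed or how many features are drawn per round, so the function name `roulette_select`, the `weights` list, and the draw-without-replacement behavior are all assumptions made for illustration, not the thesis's implementation.

```python
import random


def roulette_select(weights, k, rng=None):
    """Draw up to k distinct indices, each with probability proportional
    to its current weight (hypothetical helper, not from the thesis)."""
    rng = rng or random.Random()
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(min(k, len(candidates))):
        total = sum(weights[i] for i in candidates)
        if total <= 0:
            # All remaining weights are zero: fall back to a uniform draw.
            pick = rng.choice(candidates)
        else:
            # Spin the wheel: find the slice that contains the random point.
            r = rng.random() * total
            acc = 0.0
            pick = candidates[-1]  # guard against floating-point round-off
            for i in candidates:
                acc += weights[i]
                if r < acc:
                    pick = i
                    break
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return chosen


if __name__ == "__main__":
    # Assumed example weights for five features in one set.
    feature_weights = [0.1, 0.4, 0.2, 0.9, 0.3]
    print(roulette_select(feature_weights, k=2, rng=random.Random(42)))
```

With equal weights this reduces to the equal-probability selection attributed to PTPR; unequal weights favor the better-scoring features, which is the behavior the abstract describes for MPTPR.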
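Symmetrical uncertainty, one of the two ingredients of the SU-KNN score, can be sketched as follows. The abstract does not state how SU is computed or how it is combined with the kNN accuracy, so this example assumes the standard definition for discrete variables, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and leaves the combination step out. The function names are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1].

    Assumes x and y are equal-length sequences of discrete values;
    continuous features would need to be discretized first.
    """
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so SU is undefined; report 0.
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(symmetrical_uncertainty([0, 0, 1, 1], labels))  # feature matches labels
    print(symmetrical_uncertainty([0, 1, 0, 1], labels))  # feature independent of labels
```

A feature that determines the class label scores 1 and a feature independent of it scores 0, which is why SU is a common filter criterion for ranking features against the class.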
Keywords/Search Tags:Data Mining, Feature Selection, Random Search, Roulette Mechanism, Symmetrical Uncertainty