
Research On Feature Selection Technology Based On Markov Blanket Representative Set

Posted on: 2021-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: J X Li
Full Text: PDF
GTID: 2518306047488194
Subject: Applied Mathematics
Abstract/Summary:
As data collection systems grow and data structures become more complex, data preprocessing algorithms must improve so that effective prediction models can be built from high-quality data. In practice, however, increasing data dimensionality often leads to poorly chosen feature subsets, which strongly affects classification results. Selecting feature subsets with a Markov blanket raises two problems. First, finding the spouse nodes of a class attribute in a directed graph based on a probabilistic description is NP-hard. Second, when the data set does not satisfy the faithfulness condition, the number of Markov blankets of the target node grows exponentially with the feature dimension. Based on the Markov blanket representative set and a sub-optimal feature subset, this thesis studies feature selection under different search algorithms.

First, for the Alpha search algorithm, which selects features one by one, this thesis proposes an Alpha algorithm based on the Markov blanket representative set (MBRAS). The algorithm begins with relevance analysis: the relevance between each feature and the class attribute is measured with the maximal information coefficient (MIC), and the parent nodes with high relevance are obtained. The algorithm then defines a measure for redundancy analysis between a feature and a subset. Using a threshold, it excludes the non-dominant features, that is, those weakly correlated with the class attribute and strongly correlated with the representative set. This yields the sub-optimal feature subset and reduces the feature search space. Finally, the algorithm defines a penalty function for adding a single feature to the prediction model, based on the correlation of the feature with the class attribute and the influence of the feature on the classification results, and uses the Alpha search algorithm to select the optimal feature subset. Multiple data sets from UCI and ASU are used to verify the classification performance of the algorithm. The results show that, compared with classical feature selection algorithms, MBRAS achieves higher classification accuracy and greatly reduces the dimension of the data set. In other words, classification accuracy can be maintained with fewer features while reducing the loss of information.

Second, in many practical problems, feature dimensions run into the thousands and features depend on one another, so a penalty function that evaluates a single feature at a time during search has certain limitations, particularly regarding flexibility in the number of features finally selected. To improve the performance of feature selection on high-dimensional data, this thesis proposes a particle swarm optimization algorithm based on the Markov blanket representative set (MBRPSO). First, the original feature space is preprocessed using relevance analysis of the features in the input set and redundancy analysis against the representative set. Then particle swarm optimization initializes the particles to obtain multiple candidate feature subsets, and a newly proposed fitness function is used to compute each particle's individual best value and the swarm's best value, so that the optimal subset is obtained through repeated iteration and updating. Experiments show that, compared with other advanced feature selection algorithms and classical Markov blanket feature selection algorithms, MBRPSO selects the fewest features and achieves lower classification error.

Finally, this thesis summarizes the overall study and points out directions for future work and aspects that need improvement.
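The relevance and redundancy filtering that precedes both MBRAS and MBRPSO can be illustrated with a small sketch. It is not the thesis's implementation: absolute Pearson correlation stands in for the maximal information coefficient, the representative set is simply taken as the top-ranked features, and the names `abs_corr`, `filter_features`, `n_representatives`, `relevance_min`, and `redundancy_max`, along with the threshold values, are hypothetical choices made for illustration only.

```python
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation (an assumed stand-in for MIC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no dependence information
    return abs(sxy) / sqrt(sxx * syy)


def filter_features(columns, target, n_representatives=1,
                    relevance_min=0.5, redundancy_max=0.6):
    """Return (representatives, kept) feature names.

    columns: dict mapping feature name -> list of numeric values.
    The thresholds are illustrative values, not taken from the thesis.
    """
    # Relevance analysis: score each feature against the class attribute.
    relevance = {name: abs_corr(col, target) for name, col in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)

    # Assumption: the most relevant features form the representative set.
    representatives = ranked[:n_representatives]
    kept = list(representatives)

    for name in ranked[n_representatives:]:
        # Redundancy analysis: strongest dependence on any representative.
        redundancy = max(
            (abs_corr(columns[name], columns[r]) for r in representatives),
            default=0.0,
        )
        # Exclude non-dominant features: weakly related to the class
        # attribute AND strongly related to the representative set.
        if relevance[name] < relevance_min and redundancy > redundancy_max:
            continue
        kept.append(name)
    return representatives, kept


# Toy data: f1 is relevant, f2 mostly duplicates f1, f3 is weak but not redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "f1": [1, 2, 1, 2, 3, 4, 3, 4],
    "f2": [0, 3, 0, 3, 1, 4, 1, 4],
    "f3": [0, 0, 2, 2, 1, 1, 3, 3],
}
representatives, kept = filter_features(columns, target)
print(representatives, kept)  # ['f1'] ['f1', 'f3']
```

In this toy run `f2` is dropped because it is weakly related to the class yet strongly related to the representative `f1`, while `f3` survives because it is not redundant. The surviving names play the role of the sub-optimal feature subset that a subsequent search (one-by-one selection or particle swarm optimization) would explore.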
Keywords/Search Tags: Feature selection, Maximal information coefficient, Markov blanket representative set, Sub-optimal feature subset, Particle swarm optimization