
Feature Selection Method Based On Rough Set And Runner-root Algorithm

Posted on: 2022-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: D D Xu
GTID: 2518306500955809
Subject: Master of Engineering
Abstract/Summary:
Features and models complement each other in fields such as machine learning and data mining. With too few features, the model cannot complete the decision-making task. Conversely, too many features cause information redundancy, which makes the model difficult to train and raises the training cost. Feature selection is one way to remove redundant features. Because it reduces the dimensionality of the feature space, it is an important data preprocessing step in machine learning, data mining and other fields. Feature selection methods fall into three groups: filter, wrapper and embedded methods. Rough set theory, whose most important component is attribute reduction, is a mathematical tool for dealing with uncertain information. Attribute reduction algorithms that use information entropy, attribute dependence, attribute significance or similar measures to express the correlation between conditional attributes and decision attributes are filter methods. Classical rough sets are only suitable for discrete data. Continuous data must first be discretized, which can lose information. The neighborhood rough set and the concept of a metric space were therefore proposed; they replace the equivalence relation of the classical rough set with a covering relation among information granules in the neighborhood space. A neighborhood rough set can handle continuous data directly and so avoids the information loss caused by discretization. With the continuing emergence of intelligent algorithms, rough sets and intelligent algorithms have been combined for feature selection: the intelligent algorithm searches the feature space for feature subsets, and the rough set serves as the evaluation function that scores each subset. A good search algorithm finds the optimal feature subset in the feature space as quickly and accurately as possible, and the quality of the evaluation function also affects the model that is built.

The main contributions of this thesis are as follows:

(1) An improved version of the forward greedy numerical reduction algorithm based on neighborhood rough sets is proposed. The forward greedy numerical reduction algorithm repeatedly adds the attribute with the highest significance to the reduction set until the significance of the attribute to be added is zero. Experiments show that zero is not a good threshold for this algorithm. A locally maximal choice does not guarantee a globally maximal result, which is the weakness of the greedy strategy. How, then, should an appropriate threshold be set? This thesis uses roulette wheel selection to improve the forward greedy numerical reduction algorithm. Every attribute has a chance of being selected, but attributes with higher significance are more likely to be chosen. Within a reasonable number of draws, an attribute is added to the feature subset as soon as it is selected. Over multiple iterations, a classifier scores the feature subset selected in each iteration, using accuracy as the evaluation criterion. Finally, the algorithm returns the subset with the best classification performance as the best feature subset. As a feature selection method, the improved algorithm combines the filter and wrapper approaches. The experimental results show that the improved forward greedy attribute reduction algorithm gives a better feature selection result.

(2) The neighborhood rough set is combined with the Runner-root algorithm for feature selection. The attribute dependence and attribute significance of the neighborhood rough set measure the correlation between conditional attributes and decision attributes. The Runner-root algorithm, which simulates the reproductive process of stolons, uses a distinctive search strategy that combines global and local search, and it iterates under the guidance of the evaluation function toward the best subsets in the feature space. The evaluation function designed in this thesis has three parts: the dependence of the decision attribute on the selected feature subset, the size of the feature subset, and the significance of the remaining unselected attributes. In this way the evaluation function takes all features into account and does not neglect the possible influence of unselected features on the decision result. The experimental results show that the feature selection algorithm based on the neighborhood rough set and the Runner-root algorithm performs best in the experiments.
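
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The Chebyshev-style neighborhood with radius `delta`, the function names, the toy data, the weights `w`, and in particular the way the three parts of the evaluation function are combined (a weighted sum that penalizes significant attributes left unselected) are all assumptions made for illustration.

```python
import random

def neighborhood(data, i, attrs, delta):
    """Indices of samples within delta of sample i on every attribute in attrs."""
    return [j for j, row in enumerate(data)
            if all(abs(row[a] - data[i][a]) <= delta for a in attrs)]

def dependency(data, labels, attrs, delta):
    """Share of samples whose whole neighborhood carries the same decision label."""
    if not attrs:
        return 0.0  # assumed convention: an empty subset explains nothing
    consistent = sum(
        all(labels[j] == labels[i] for j in neighborhood(data, i, attrs, delta))
        for i in range(len(data))
    )
    return consistent / len(data)

def significance(data, labels, attr, selected, delta):
    """Gain in dependency obtained by adding attr to the selected subset."""
    return (dependency(data, labels, selected + [attr], delta)
            - dependency(data, labels, selected, delta))

def roulette_pick(candidates, weights, rng):
    """Roulette wheel draw: probability proportional to (non-negative) weight."""
    weights = [max(w, 0.0) for w in weights]
    if sum(weights) <= 0:
        return rng.choice(candidates)  # no attribute stands out: draw uniformly
    return rng.choices(candidates, weights=weights, k=1)[0]

def fitness(data, labels, selected, n_attrs, delta, w=(0.8, 0.1, 0.1)):
    """Assumed three-part score: dependency, subset size, leftover significance."""
    remaining = [a for a in range(n_attrs) if a not in selected]
    dep = dependency(data, labels, selected, delta)
    size_term = 1.0 - len(selected) / n_attrs  # smaller subsets score higher
    leftover = max(
        (significance(data, labels, a, selected, delta) for a in remaining),
        default=0.0,
    )
    return w[0] * dep + w[1] * size_term - w[2] * leftover

# Hypothetical toy data: attribute 0 separates the classes, attribute 1 does not.
TOY_DATA = [[0.10, 0.90], [0.15, 0.20], [0.80, 0.85], [0.90, 0.10]]
TOY_LABELS = [0, 0, 1, 1]
```

On this toy data, attribute 0 alone gives a dependency of 1.0 and attribute 1 alone gives 0.0, so a roulette draw weighted by significance from the empty subset always returns attribute 0, and the assumed fitness ranks the subset `[0]` above `[1]`.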
Keywords/Search Tags:Feature selection, Search strategy, Evaluation function, Neighborhood rough set, Runner-root algorithm