
Study On Feature Selection Based On Mutual Information

Posted on: 2020-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2428330596479670
Subject: Computer software and theory
Abstract/Summary:
Feature selection remains one of the hotspots in pattern recognition. From the classification perspective, pattern recognition amounts to classifying data, and classification can be performed either in the original data space or after transforming the data and mapping it into a feature space that better reflects the essence of the classification task. Models trained in such a feature space are superior to models trained directly on the original data in both training time and interpretability, so the study of feature selection is an important task in pattern recognition. This thesis introduces the basic concepts of filter feature selection based on mutual information in detail and, after analyzing the strengths and weaknesses of existing feature selection algorithms, proposes two new ones.

(1) Feature selection based on Minimum Conditional Relevance and Minimum Conditional Redundancy (MCRMCR). An analysis of RelaxFS (Feature Selection based on Relaxing Max-relevance and Min-redundancy) shows that it evaluates each candidate feature against all previously selected features, which costs considerable time when computing the relevance between features and classes and the redundancy between a candidate and the selected feature set. To describe relevance and redundancy more accurately while reducing unnecessary redundant information, MCRMCR evaluates each candidate against only a limited number of features from the selected set. Experiments show that MCRMCR improves the classification accuracy of the classifier.

(2) Feature selection based on Weight Composition of Feature Relevancy (WCFR). Traditional mutual-information-based feature selection algorithms introduce a weight only into the redundancy term to balance the importance of conditional relevance and redundancy. WCFR instead uses variance as the weight and applies it to both the relevancy and redundancy terms: the degree of dispersion of the relevancy and redundancy values is used to weight the importance of the two terms, so WCFR tends to select candidate features that are as low in redundancy with the selected feature set as possible and highly relevant to the class given the selected set. Theory and experiments show that WCFR improves classification accuracy.
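To make the filter framework concrete, the following is a minimal sketch of greedy, mutual-information-based feature selection in the spirit of mRMR/RelaxFS. It is not the thesis's exact MCRMCR or WCFR criterion: the relevance-minus-redundancy score, the limited look-back over recently selected features (which only loosely echoes MCRMCR's use of a subset of the selected set), and the scikit-learn estimators mutual_info_classif and mutual_info_score are all assumptions made for illustration.

```python
# Sketch of greedy, mutual-information-based filter feature selection
# (relevance-minus-redundancy score). This is NOT the thesis's exact
# MCRMCR or WCFR criterion; the limited look-back over recently selected
# features only loosely mirrors the MCRMCR idea of evaluating candidates
# against a subset of the selected set.

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score


def greedy_mi_selection(X, y, n_select, lookback=3):
    """Greedily pick `n_select` feature indices from discrete-valued X."""
    n_features = X.shape[1]
    # Relevance of each feature to the class label.
    relevance = mutual_info_classif(X, y, discrete_features=True)

    selected = []
    remaining = list(range(n_features))
    while len(selected) < n_select and remaining:
        best_idx, best_score = None, -np.inf
        # Only the most recently selected features are used to estimate
        # redundancy (a limited look-back rather than the full selected set).
        recent = selected[-lookback:]
        for j in remaining:
            if recent:
                redundancy = np.mean(
                    [mutual_info_score(X[:, j], X[:, s]) for s in recent]
                )
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best_idx, best_score = j, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy discrete data: 200 samples, 10 integer-valued features.
    X = rng.integers(0, 4, size=(200, 10))
    y = (X[:, 0] + X[:, 3]) % 2  # label depends on features 0 and 3
    print(greedy_mi_selection(X, y, n_select=4))
```

Restricting the redundancy estimate to a small look-back window keeps the per-candidate cost roughly constant instead of growing with the size of the selected set, which is the efficiency concern the abstract raises about RelaxFS.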
Keywords/Search Tags: Mutual information, Conditional relevancy, Conditional redundancy, Feature selection