Research On Feature Selection Algorithm Based On Mutual Information

Posted on: 2022-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Bao
Full Text: PDF
GTID: 2518306482493584
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of computer technology, high-dimensional data continues to accumulate. Such data contains a large number of redundant and irrelevant features, so the required information has to be extracted from a large volume of data. Dimensionality reduction preserves the important information by reducing or mapping high-dimensional data to a low-dimensional space. Its two commonly used forms are feature selection and feature extraction. Feature selection, one of the steps of data preprocessing, chooses from the original feature set the features that maximize a feature evaluation criterion and so generates the optimal feature subset. It removes noise from the data and saves computation time when a data set has hundreds of thousands of features or more. The goal of feature selection is to choose the most discriminative, mutually non-redundant features as the optimal feature subset and thereby improve the classification accuracy of machine learning tasks. However, some partially redundant features are relevant and interdependent, so eliminating them as redundant reduces the classification performance of the algorithm. Although researchers have done a great deal of work on redundancy in order to improve classification accuracy and have achieved a series of results, many problems still deserve in-depth study and discussion.

This thesis uses information theory and feature interaction to distinguish redundancy from correlation between features, and uses feature selection algorithms to screen the data involved. Specifically, the work covers two aspects: improved and optimized feature selection algorithms building on the mRMR algorithm, and the application of the optimized algorithm to facial expression recognition. The main work and innovations are as follows.

(1) ImRMR, an optimized feature selection algorithm based on the Pearson coefficient, is proposed. The mRMR algorithm selects the best features by maximizing the correlation between features and the target variable; it ensures maximum relevance while removing redundant features. However, the redundancy measure of mRMR is the average mutual information between the features in the feature set. When one pair of features is highly redundant, that pair's value matters more than the average over all features. To obtain a better redundancy measure, this thesis takes the maximum value of the correlation coefficient as the redundancy, which yields a new evaluation method for feature subsets. The proposed ImRMR algorithm uses the Pearson coefficient as the redundancy index when processing data sets with correlated variables and selects a more suitable feature subset. The Pearson coefficient separates relevant features from redundant ones to the greatest extent, which effectively improves the classification performance of existing feature selection algorithms based on mutual information.

(2) To address the impact that new features generated by feature interaction have on classification performance, FIRM, a composite feature selection algorithm that minimizes interaction redundancy, is proposed. The algorithm first considers the interaction between all features and extracts new feature information from the features that interact with a third party; the optimal new feature is obtained by maximizing the mutual information between the new feature and the class. After considering the interaction between all the features and the new feature, it groups the interactive features and the redundant features to eliminate related features. It then removes the redundant features from the obtained composite features, and finally minimizes the redundancy between the generated composite features to obtain the optimal composite feature set. Without changing the complexity of the algorithm, FIRM effectively improves the performance of the feature selection algorithm.

(3) The FIRM algorithm is applied to facial expression recognition. On the basis of face recognition, the corresponding expressions and features are selected from a facial expression data set, and the resulting feature data set is screened with the FIRM algorithm to obtain the final feature subset. The experimental results show that the FIRM algorithm largely retains the features carrying the required information while removing redundancy, and that it also improves the accuracy of facial expression recognition.

This thesis mainly studies the feature subset selection problem with inter-class correlation, internal feature interaction and low redundancy. The proposed algorithms enrich the research content in the field of feature selection and promote the further development of feature selection technology, and therefore have theoretical significance and application value.
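The difference between the two redundancy measures described in contribution (1) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give the exact scoring formula, so the sketch assumes a greedy "relevance minus redundancy" score, and the function names (`mutual_information`, `abs_pearson`, `greedy_select`), the `redundancy` switch and the toy data are all hypothetical. Relevance is the empirical mutual information with the class; redundancy is either the mean mutual information with the already selected features (mRMR-style) or the maximum absolute Pearson coefficient (an ImRMR-style assumption).

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def abs_pearson(x, y):
    """Absolute Pearson correlation; 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def greedy_select(features, target, k, redundancy="mean_mi"):
    """Greedy relevance-minus-redundancy selection (assumed scoring rule).

    features:   dict mapping a feature name to a list of discrete numeric values
    redundancy: "mean_mi"     -> mean mutual information with selected features
                "max_pearson" -> maximum |Pearson| with selected features
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected, remaining = [], list(features)

    def score(name):
        if not selected:  # first pick: relevance only
            return relevance[name]
        col = features[name]
        if redundancy == "mean_mi":
            red = sum(
                mutual_information(col, features[s]) for s in selected
            ) / len(selected)
        else:
            red = max(abs_pearson(col, features[s]) for s in selected)
        return relevance[name] - red

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a"; "c" is weaker but less redundant.
y = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 1, 1],
}
picked_mean = greedy_select(toy, y, 2, redundancy="mean_mi")
picked_max = greedy_select(toy, y, 2, redundancy="max_pearson")
```

On this toy data both variants skip the duplicate column `b` in favor of `c`. The two measures only diverge once several features are already selected, where averaging can dilute a single highly redundant pair while the maximum does not.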
Keywords/Search Tags: Feature selection, mutual information, Pearson coefficient, feature interaction, facial expression recognition