
Research On Feature Selection Based On Information Metrics

Posted on: 2024-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: J R Zhai  Full Text: PDF
GTID: 2568307085964609  Subject: Computer Science and Technology
Abstract/Summary:
Throughout human history, the acquisition, processing and utilization of information have been important driving forces behind the development of technology and engineering. With the advent of the modern information society, data has become a key resource that is widely used across industries. However, among the many features in such massive data, only a few are truly useful for predicting the target variable, while most may be redundant or have no effect on the prediction. Feature selection is therefore an indispensable technique: it picks out the subset of features with the most predictive power from a large pool, improving the efficiency and accuracy of subsequent data processing. Feature selection is now widely used in data mining, bioinformatics, image processing and intrusion detection. It can be said that feature selection is an inevitable product of information processing and utilization in the development of human society, and an important support for the continuing progress of information technology.

This thesis centers on feature selection methods based on information metrics, analyzing how such metrics describe the relationships between features and between features and classes. On that basis, an optimal feature set is provided for the prediction of class labels, improving the performance of the learning algorithm across the board. The research contents and innovations of this thesis are as follows:

1. Feature selection algorithms based on three-way interaction information have been widely studied, but most conventional algorithms of this kind consider only class-dependent redundancy, which may lead to redundancy being underestimated. To address this issue, a new information-metric-based feature selection algorithm, Maximum Dynamic Relevance Minimum Redundancy (MDRMR), is proposed. The algorithm first defines a quality factor based on three-way interaction information, which is combined with the already-selected features to evaluate the relevance of candidate features. The underestimation of redundant information is resolved by introducing class-independent redundancy as a separate redundancy term. Adaptive coefficients are further introduced to dynamically adjust the proportions of the relevance and redundancy terms in the evaluation function.

2. In the ideal case, the selected feature subset contains all strongly relevant features and some weakly relevant features. The approximate Markov blanket is usually used to identify strongly relevant features, but it may misclassify redundant features as strongly relevant. To solve this problem, the Enhanced Approximate Markov Blanket (EAMB) is defined. It improves the feature ordering and the discriminative strength of approximate-Markov-blanket tests, eliminating the misclassification of redundant features. On this basis, Enhanced Approximate Markov Blanket Feature Selection (EAMBFS) is proposed. The algorithm introduces an intensity coefficient that, in each round, dynamically adjusts how hard it is for feature scores to satisfy the approximate-Markov-blanket test, and modifies the relevance terms according to whether a Markov blanket has been found.

3. Mutual-information-based feature selection has achieved excellent results owing to its intrinsic properties. However, as research on it has deepened, the performance of information-metric feature selection has reached a bottleneck. To break through it, the idea of ensemble learning is introduced and the algorithm PMAFS is proposed. First, three feature subsets are obtained using three algorithms: Pearson correlation, mRMR and AMB. Then a feature-aggregation approach combining weighted voting with feature ranking is proposed to merge the three subsets into one. Finally, the highest-scoring features are retained as the optimal feature subset.

In summary, this thesis investigates information-metric feature selection, choosing the feature subset most beneficial to subsequent tasks by analyzing the relationships between features and classes and between features themselves, and studies how information metrics evaluate feature relationships, combine with other theories, and compose other forms of feature selection. These studies enrich feature selection techniques and are of practical significance and application value for subsequent research in related fields.
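The abstract does not give MDRMR's exact evaluation function, so the following is only a minimal sketch of a greedy mRMR-style selector extended with a class-dependent redundancy term (derived from three-way interaction information via conditional mutual information) and a hypothetical adaptive coefficient `beta`; the helper names and the form of `beta` are illustrative assumptions, not the thesis's formulas.

```python
import numpy as np

def mutual_info(x, y):
    """I(X;Y) in nats for two discrete 1-D integer arrays."""
    xi = np.unique(x, return_inverse=True)[1]
    yi = np.unique(y, return_inverse=True)[1]
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)            # contingency counts
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = I(X;(Y,Z)) - I(X;Z), pair-encoding (Y,Z) as one variable."""
    yz = np.unique(np.stack([np.asarray(y), np.asarray(z)], axis=1),
                   axis=0, return_inverse=True)[1]
    return mutual_info(x, yz) - mutual_info(x, z)

def mdrmr_select(X, c, k):
    """Greedy forward selection with relevance minus two redundancy terms."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        def score(f):
            rel = mutual_info(X[:, f], c)
            if not selected:
                return rel
            # class-independent redundancy: plain I(f;s)
            red_ci = np.mean([mutual_info(X[:, f], X[:, s]) for s in selected])
            # class-dependent redundancy: I(f;s|C)
            red_cd = np.mean([cond_mutual_info(X[:, f], X[:, s], c)
                              for s in selected])
            # hypothetical adaptive coefficient balancing the two sides
            beta = rel / (rel + red_ci + red_cd + 1e-12)
            return rel - beta * (red_ci + red_cd)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy dataset whose first column equals the class label, the selector picks that column first, since its relevance term dominates before any redundancy is counted.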
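EAMB's exact discrimination rule is likewise not given in the abstract. As a point of reference, below is the classical approximate-Markov-blanket test (symmetric-uncertainty based, as in FCBF-style filters) extended with a hypothetical intensity coefficient `alpha` that scales the removal threshold each candidate must meet; `alpha` and the rule itself are assumptions standing in for the thesis's EAMB/EAMBFS definitions.

```python
import numpy as np

def mutual_info(x, y):
    """I(X;Y) in nats for two discrete 1-D integer arrays."""
    xi = np.unique(x, return_inverse=True)[1]
    yi = np.unique(y, return_inverse=True)[1]
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def entropy(x):
    return mutual_info(x, x)          # H(X) = I(X;X) for discrete X

def symmetric_uncertainty(x, y):
    hx, hy = entropy(x), entropy(y)
    return 0.0 if hx + hy == 0 else 2.0 * mutual_info(x, y) / (hx + hy)

def amb_filter(X, c, alpha=1.0):
    """Drop feature fj if a higher-ranked kept feature fi satisfies
    SU(fi, fj) >= alpha * SU(fj, c), i.e. fi approximately subsumes fj.
    alpha is a hypothetical intensity coefficient (alpha > 1 makes
    removal harder, keeping more features)."""
    su_c = [symmetric_uncertainty(X[:, j], c) for j in range(X.shape[1])]
    order = np.argsort(su_c)[::-1]    # rank by relevance to the class
    kept = []
    for j in order:
        if all(symmetric_uncertainty(X[:, i], X[:, j]) < alpha * su_c[j]
               for i in kept):
            kept.append(int(j))
    return kept
```

With two identical copies of the class label and one noise column, only one of the duplicates survives the blanket test, which is the behavior the approximate Markov blanket is meant to deliver.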
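The aggregation step of PMAFS (weighted voting combined with feature ranking) is only outlined in the abstract; a weighted Borda count over the per-method rankings is one simple way such an aggregation could be realized. The weights, the point scheme, and the example rankings below are illustrative assumptions, not the thesis's scheme.

```python
import numpy as np

def aggregate_rankings(rankings, weights, k):
    """Weighted Borda count: each method awards (len(ranking) - position)
    points to every feature it ranked, scaled by that method's weight;
    the k features with the highest totals form the final subset."""
    n = 1 + max(f for r in rankings for f in r)
    scores = np.zeros(n)
    for ranking, w in zip(rankings, weights):
        for pos, f in enumerate(ranking):
            scores[f] += w * (len(ranking) - pos)
    return [int(f) for f in np.argsort(-scores)[:k]]

# e.g. three rankings as they might come from Pearson correlation,
# mRMR, and an approximate-Markov-blanket filter (hypothetical values)
final = aggregate_rankings([[0, 1, 2], [1, 0, 2], [0, 2, 1]],
                           [1.0, 1.0, 1.0], 2)  # -> [0, 1]
```

Feature 0 tops two of the three rankings, so it accumulates the most points (8 vs. 6 vs. 4) and heads the aggregated subset.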
Keywords/Search Tags:Feature Selection, Mutual Information, Feature Synergy, Enhanced Approximate Markov Blanket, Ensemble Feature Selection