
Research On Filters Based On Sequential Forward Selection Strategy

Posted on: 2020-02-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W F Gao
Full Text: PDF
GTID: 1368330575481197
Subject: Computer system architecture
Abstract/Summary:
With the approach of the Big Data era, large amounts of data have been produced in many areas of human activity. These data offer a great deal of information for computer scientists. However, high-dimensional data contain much noise and redundant information, which can hide the useful information, so feature selection becomes important. Feature selection plays an important role in machine learning and pattern recognition: it removes irrelevant and redundant features while retaining the most informative ones, improving the quality of high-dimensional data. The feature subset obtained by a feature selection method can improve classification performance.

According to the selection strategy, feature selection methods can be roughly divided into three categories: filter models, wrapper models, and embedded models. Filter models are widely used because they are fast, simple, and independent of any classifier. We design our feature selection methods based on the sequential forward selection strategy, and we employ information theory to measure both the relevancy between a feature and the class labels and the correlations among features. Traditional filter models based on information theory can generally be divided into two groups: the criteria in the first group focus on minimizing feature redundancy, whereas those in the second group aim to maximize new classification information. In this dissertation, we propose two kinds of feature selection methods to overcome the limitations of these two groups; additionally, a complementary term is proposed to remedy their disadvantages; finally, we propose a feature selection method that builds on two existing methods. In summary, the main contributions are as follows:

1. A comprehensive overview of the two groups of feature selection methods is given. Furthermore, we propose a hybrid feature selection method named Minimal Redundancy-Maximal New Classification Information (MR-MNCI) that integrates the two groups of criteria. MR-MNCI adopts both class-dependent and class-independent feature redundancy. The experimental results demonstrate the superiority of MR-MNCI. In addition, we point out the disadvantage of our method and outline plans for future work.

2. By analyzing the composition of new classification information, we propose a novel feature selection method named Composition of Feature Relevancy (CFR), in which feature relevancy is redefined. Additionally, CFR can be transformed into a general framework. Finally, CFR outperforms five competing methods in terms of average classification accuracy and highest classification accuracy.

3. A drawback of traditional feature selection methods is that they ignore the dynamic change of already-selected features with the class. To address this issue, we develop a novel feature selection method named Dynamic Change of Selected Features with the class (DCSF), and we redefine the feature relevancy term. The experimental results demonstrate the classification superiority of DCSF.

4. Traditional feature selection methods do not distinguish candidate feature relevancy from selected feature relevancy, and some interdependent features may be regarded as redundant. To address this problem, we propose a feature selection method named Dynamic Relevance and Joint Mutual Information Maximization (DRJMIM). DRJMIM employs the definition of minimum joint mutual information from Joint Mutual Information Maximization (JMIM) and the dynamic weight from Gene Selection via Dynamic Relevance (DRGS), and it distinguishes candidate feature relevancy from selected feature relevancy. DRJMIM is compared with JMIM, DRGS, and three other feature selection methods on an artificial example and twelve real-world data sets. Experimental results demonstrate that DRJMIM outperforms the compared methods in terms of classification performance.

This dissertation focuses on filters based on the sequential forward selection strategy. We propose different filter methods to address the disadvantages of previous filters. These methods remove irrelevant and redundant features while retaining the most informative ones, and therefore have theoretical significance and application value.
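The sequential forward selection strategy underlying these filters can be illustrated with a minimal sketch. The snippet below is not any of the proposed criteria (MR-MNCI, CFR, DCSF, or DRJMIM); it is a generic mRMR-style filter, scoring each candidate by its relevancy to the labels minus its mean redundancy with the already-selected features, both measured with empirical mutual information over discrete data. All function names are illustrative, not from the dissertation.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def forward_select(features, labels, k):
    """Greedy sequential forward selection with an mRMR-style criterion:
    maximize relevancy I(f; labels) minus the mean redundancy I(f; s)
    over the already-selected features s."""
    selected = []
    candidates = list(range(len(features)))
    while candidates and len(selected) < k:
        def score(i):
            relevancy = mutual_information(features[i], labels)
            if not selected:
                return relevancy
            redundancy = sum(mutual_information(features[i], features[s])
                             for s in selected) / len(selected)
            return relevancy - redundancy
        best = max(candidates, key=score)  # pick the highest-scoring candidate
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with a feature that exactly matches the labels and one that is independent of them, the matching feature is chosen first; the redundancy term then penalizes any later candidate that merely duplicates it.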
Keywords/Search Tags:Machine Learning, Feature Selection, Classification, Filters, Sequential Forward Selection Strategy, Information Theory