Research On Online Streaming Feature Selection Algorithms

Posted on: 2019-03-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Zhou
Full Text: PDF
GTID: 1368330602982897
Subject: Software engineering
Abstract/Summary:
Traditional feature selection methods assume that all features in the feature space already exist and that their values are available before learning. However, in many real-world applications the entire feature space cannot be acquired in advance, and features instead arrive as a stream. Motivated by this, research on online feature selection based on streaming features has emerged. Streaming features are features that arrive one by one or group by group, while the information about the entire feature space cannot be known in advance. With the dramatic increase in data volume and dimensionality in the era of big data, traditional batch feature selection methods can no longer meet efficiency demands. Compared with traditional feature selection methods, online streaming feature selection is better suited to high-dimensional, massive data and to problems where the feature space is unknown, so it has important research and application value. In this dissertation, based on the problems and deficiencies of existing online streaming feature selection methods, several new online streaming feature selection algorithms are proposed. The main research works are as follows:

(1) Because most existing online streaming feature selection methods require domain knowledge and parameter values to be specified before learning, a new online streaming feature selection algorithm based on an adaptive neighborhood relation is proposed. To handle different types of data sets, a new Gap neighborhood relation is defined that automatically determines the number of neighbors from the sample distribution. Based on the Gap neighborhood relation, a new online streaming feature selection algorithm, OFS-A3M, is constructed. Because it is built on Neighborhood Rough Set theory, OFS-A3M requires no domain knowledge before learning. Moreover, thanks to the Gap neighborhood relation, OFS-A3M needs no parameters to be set in advance. Following the three criteria of "Maximal-dependence, Maximal-relevance and Maximal-significance", OFS-A3M selects features with high correlation, high dependence and low redundancy. The experimental results show that OFS-A3M outperforms several traditional feature selection methods when the same number of features is selected, and that it outperforms state-of-the-art online streaming feature selection algorithms in an online setting.

(2) A new online streaming group feature selection method that considers feature interaction is proposed. Existing online streaming feature selection methods focus on removing irrelevant and redundant features and selecting the most relevant ones, while ignoring the interaction between features. Interacting features are those that appear irrelevant or only weakly relevant to the class individually, but that may correlate highly with the class when combined with other features. Within the framework of Mutual Information theory, feature relevance, feature redundancy and feature interaction are defined, and a new feature interaction weight factor that measures the degree of interaction between features is proposed. Based on this weight factor, a new online streaming group feature selection algorithm, OSGFS-FI, which can effectively select interacting features, is proposed. Extensive experiments on both synthetic and real-world data sets demonstrate the efficiency of the new method.

(3) A new online streaming feature selection method for high-dimensional class-imbalanced data is proposed. Class imbalance means that some classes have many more instances than others in the same data set. In such cases, existing online streaming feature selection algorithms usually ignore the small classes, which can be the important ones in those applications. Learning from high-dimensional, class-imbalanced data in an online manner is therefore a challenge. Motivated by this, we first formalize the problem of online streaming feature selection for class-imbalanced data, and then present an efficient online feature selection framework based on the dependency between condition features and decision classes. We also propose a new algorithm, Online Feature Selection based on the Dependency in K nearest neighbors, called K-OFSD. Drawing on Neighborhood Rough Set theory, K-OFSD uses information from the nearest neighbors to select relevant features that give higher separability between the majority class and the minority class. Finally, experimental studies show that the algorithm performs better than traditional feature selection methods with the same number of features, and better than state-of-the-art online streaming feature selection algorithms in an online setting.
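
The notion of interacting features in contribution (2) can be illustrated with a small mutual-information computation. The sketch below is only an illustration, not the OSGFS-FI algorithm or its interaction weight factor, whose exact definition is not given in this abstract. The function names `mutual_information` and `interaction_gain` are hypothetical, and the quantity I(F1,F2;C) - I(F1;C) - I(F2;C) is a standard information-theoretic measure assumed here for demonstration. The XOR data is a textbook case in which each feature alone carries no information about the class, while the pair determines it completely.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def interaction_gain(f1, f2, labels):
    """I(F1,F2;C) - I(F1;C) - I(F2;C).

    Illustrative measure only (an assumption, not the dissertation's
    weight factor): positive when the pair is jointly more informative
    about the class than the two features are individually.
    """
    joint = list(zip(f1, f2))  # treat the feature pair as one variable
    return (
        mutual_information(joint, labels)
        - mutual_information(f1, labels)
        - mutual_information(f2, labels)
    )


# XOR: the class is 1 exactly when the two binary features differ.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
labels = [a ^ b for a, b in zip(f1, f2)]

relevance_f1 = mutual_information(f1, labels)  # each feature alone: 0 bits
relevance_f2 = mutual_information(f2, labels)
gain = interaction_gain(f1, f2, labels)        # the pair together: 1 bit
print(relevance_f1, relevance_f2, gain)
```

A selector that scores features only by individual relevance would discard both `f1` and `f2` here, which is the failure mode that motivates accounting for interaction when features arrive in groups.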
Keywords/Search Tags:Feature Selection, Streaming Features, Online Streaming Feature Selection, Neighborhood Rough Set, Adapted Neighborhood Relation, Feature Interaction, High-dimensional class-imbalanced data