Online Streaming Feature Selection Algorithms Of High-dimension And Class-imbalanced Data

Posted on: 2022-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Chen
Full Text: PDF
GTID: 2518306485450164
Subject: Computer application technology
Abstract/Summary:
Traditional feature selection algorithms assume that all features in the feature space already exist and that their values are available before feature selection begins. In many practical applications, however, the entire feature space cannot be known in advance, and features flow into the feature space as a stream. This has motivated research on online feature selection methods for feature streams. A feature stream refers to feature data flowing dynamically into the feature space over time, so that information about the entire feature space cannot be known in advance. With the advent of the big data era, data is increasingly high-dimensional and class-imbalanced, and traditional batch feature selection methods cannot meet the time-complexity requirements. Compared with traditional selection methods, online streaming feature selection is better suited to processing high-dimensional and class-imbalanced data. Research on feature selection algorithms for high-dimensional and class-imbalanced data therefore has important research and application value. Building on existing online streaming feature selection algorithms, this thesis takes high-dimensional and class-imbalanced data as its research object and proposes new online streaming feature selection algorithms for class-imbalanced data that address some of the problems and deficiencies of existing methods. The main research and related work are as follows:

(1) Most existing online streaming feature selection algorithms do not consider class imbalance. A new online streaming feature selection algorithm for high-dimensional and class-imbalanced data is proposed. In many real-world application scenarios, the feature space is dynamic and its features arrive as a stream. When such data is also class-imbalanced, existing online streaming feature selection algorithms usually focus on the large classes and ignore the small classes, which can be important in those applications. Motivated by this, an online feature selection algorithm for high-dimensional and class-imbalanced data based on the neighborhood rough set is proposed. The algorithm is designed around a rough dependency formula based on small-class significance. In addition, three evaluation criteria (online redundancy analysis, online relevance analysis, and online significance analysis) are presented to select features with high separability between large and small classes. Extensive experiments on 7 high-dimensional and class-imbalanced data sets show that OFS achieves better performance than some state-of-the-art online streaming feature selection algorithms in an online manner.

(2) Existing online streaming feature selection algorithms ignore the class imbalance problem in data sets and require domain knowledge and parameter values to be specified before learning. To address this, a new online streaming feature selection algorithm for high-dimensional and class-imbalanced data, M OFS, based on an adaptive neighborhood relation is proposed. To handle different types of data sets, a new adaptive neighborhood relation is defined on the neighborhood rough set model; it automatically determines the number of neighbors from the sample distribution. In addition, taking the effect of boundary samples into account, three online feature subset evaluation metrics are proposed to select features with strong discriminability between large and small classes. Extensive experiments demonstrate the efficiency of the new method M OFS.
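
The abstract does not give the thesis's formulas, so the following Python sketch only illustrates the general idea and is not the OFS or M OFS algorithm. It assumes the standard neighborhood rough set dependency (the share of samples whose delta-neighborhood contains a single class) and, as a further assumption, restricts that share to the small class. The names `neighborhood_dependency` and `stream_select`, the fixed radius `delta`, and the simple accept and prune rules are all hypothetical choices made for illustration.

```python
import numpy as np


def neighborhood_dependency(X, y, delta=0.15, target_class=None):
    """Share of samples whose delta-neighborhood contains only one class.

    X is an (n_samples, n_features) array, assumed scaled to [0, 1].
    If target_class is given, only samples of that class are counted;
    this is an assumed stand-in for a small-class dependency measure.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances in the current feature subspace.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta
    same_label = y[:, None] == y[None, :]
    # A sample is consistent if every neighbor shares its label.
    consistent = np.all(~neighbors | same_label, axis=1)
    if target_class is None:
        return float(consistent.mean())
    mask = y == target_class
    return float(consistent[mask].mean()) if mask.any() else 0.0


def stream_select(feature_stream, y, minority, delta=0.15, eps=1e-9):
    """Consume (name, column) pairs one at a time and keep a small subset."""
    selected, columns = [], []
    best = 0.0
    for name, col in feature_stream:
        trial = np.column_stack(columns + [col])
        score = neighborhood_dependency(trial, y, delta, minority)
        if score <= best + eps:
            continue  # significance check: no gain for the small class
        selected.append(name)
        columns.append(col)
        best = score
        # Redundancy check: drop earlier features that are no longer needed.
        for i in range(len(selected) - 2, -1, -1):
            rest = np.column_stack(columns[:i] + columns[i + 1:])
            if neighborhood_dependency(rest, y, delta, minority) >= best - eps:
                del selected[i]
                del columns[i]
    return selected


# Toy data: six large-class samples (0) and two small-class samples (1).
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
noise = np.array([0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.5, 0.3])
good = np.array([0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 0.9, 1.0])
chosen = stream_select([("noise", noise), ("good", good)], y, minority=1)
```

On this toy data the `noise` feature leaves both small-class samples overlapping large-class samples, while `good` separates them, so only `good` is retained. A fixed `delta` is exactly the kind of parameter that an adaptive neighborhood relation is meant to avoid; the sketch keeps it only for brevity.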
Keywords/Search Tags: online streaming feature selection, neighborhood rough set, rough dependence, class imbalance, adaptive neighborhood