
Online Streaming Feature Selection Method Based On Neighborhood Dependence

Posted on: 2022-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lv
Full Text: PDF
GTID: 2518306773467994
Subject: Automation Technology
Abstract/Summary:
Feature selection, as an important data preprocessing method, can effectively improve the efficiency and effectiveness of a model. However, with the acceleration of data generation and collection in the era of big data, traditional feature selection algorithms face severe challenges: (1) Rapidly growing data are often high-dimensional with few samples. (2) In practical applications, it is impossible to obtain the entire feature space in advance, and features arrive as dynamic streams. Therefore, the traditional batch-processing mode of feature selection cannot meet the efficiency requirements of the big data era. Online feature selection for high-dimensional data whose feature space is not known in advance has important research and application potential. Building on a study of existing online streaming feature selection algorithms, this thesis proposes two novel online streaming feature selection algorithms that address their shortcomings. The main work is as follows:

(1) Joint Neighborhood Boundary for Online Streaming Feature Selection. In many real-world applications, features flow into the feature space dynamically, one by one, over time. However, current algorithms based on neighborhood rough set theory treat only the information contained in the positive region as valid. As a result, the useful information contained in the boundary region is ignored, even when the data contain little noise. Based on the neighborhood rough set, this thesis redefines the dependency function by combining the positive region with boundary information. On this basis, three feature evaluation criteria are proposed. This thesis then designs a joint neighborhood boundary online streaming feature selection algorithm, OFS-JNB (Joint Neighborhood Boundary for Online Streaming Feature Selection). Finally, comparison with other online streaming feature selection algorithms shows that the feature set selected by this algorithm performs better.

(2) Online Streaming Feature Selection Based on Feature Interaction. In many practical applications, features flow into the feature space dynamically, one by one, over time. At the same time, for practical reasons, the number of samples available for the object under study is very small, so the data are high-dimensional with small sample sizes. However, traditional online streaming feature selection methods focus on identifying relevant, irrelevant, or redundant features and ignore the interaction between features, so they cannot efficiently handle high-dimensional small-sample data. First, based on the principle of feature interaction, this thesis proposes a definition of feature interaction based on neighborhood rough sets. Second, the online importance analysis and online redundancy analysis strategies are redefined based on feature interaction, and a new online streaming feature selection algorithm based on feature interaction (OFSI) is proposed. Experimental results with 6 algorithms on 11 datasets show that the proposed algorithm significantly outperforms other state-of-the-art online streaming feature selection methods.
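
For orientation, the following is a minimal sketch of the standard neighborhood rough set dependency (the fraction of samples in the positive region) combined with a simple greedy streaming loop. It is not OFS-JNB or OFSI: the abstract does not give the redefined dependency function, the three evaluation criteria, or the feature interaction definition, so none of them are reproduced here. The function names, the neighborhood radius `delta`, the threshold `min_gain`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def neighborhood_dependency(X, y, delta=0.15):
    """Standard neighborhood dependency: |positive region| / |U|.

    A sample is in the positive region when every sample within
    Euclidean distance `delta` of it carries the same class label.
    (`delta` is an assumed radius, not a value from the thesis.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Pairwise Euclidean distances over the given feature subset.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta               # neighbors[i, j]: x_j near x_i
    same_label = y[:, None] == y[None, :]
    # Consistent sample: no neighbor has a different label.
    consistent = np.all(~neighbors | same_label, axis=1)
    return float(consistent.mean())

def stream_select(X, y, delta=0.15, min_gain=1e-9):
    """Greedy stand-in for online selection: features arrive one at a
    time and are kept only if they raise the dependency.
    (Hypothetical acceptance rule, not the thesis's criteria.)
    """
    X = np.asarray(X, dtype=float)
    selected, best = [], 0.0
    for j in range(X.shape[1]):             # feature j arrives at step j
        candidate = selected + [j]
        dep = neighborhood_dependency(X[:, candidate], y, delta)
        if dep > best + min_gain:
            selected, best = candidate, dep
    return selected, best

# Toy data (made up): feature 0 is constant, feature 1 separates the
# classes, feature 2 duplicates feature 1 and is therefore redundant.
X_demo = np.array([
    [0.5, 0.0, 0.0],
    [0.5, 0.1, 0.1],
    [0.5, 0.9, 0.9],
    [0.5, 1.0, 1.0],
])
y_demo = np.array([0, 0, 1, 1])

if __name__ == "__main__":
    print(stream_select(X_demo, y_demo))
```

In this sketch, samples outside the positive region (the boundary region) contribute nothing to the score, which is the limitation the abstract attributes to existing neighborhood rough set methods.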
Keywords/Search Tags: Online Feature Selection, Streaming Feature, Neighborhood Rough Set, Feature Interaction