
Research On Online Feature Selection With Streaming Features For Classification

Posted on: 2021-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: S Guo
Full Text: PDF
GTID: 2568306104964449
Subject: Engineering
Abstract/Summary:
In recent years, online feature selection with streaming features has become a research hot spot in data mining. Its goal is to reduce data dimensionality by filtering out irrelevant and redundant features in real time while preserving prediction accuracy. Existing classification-oriented online streaming feature selection methods suffer from low classification accuracy, a large number of selected features, or long running time. This thesis approaches the problem by mining the approximate Markov blanket of the target label and proposes online streaming feature selection algorithms for both binary classification and multi-classification. The features selected on different datasets are fed to Decision Tree, KNN, SVM and Ensemble classifiers to analyze the performance of the algorithms and verify their effectiveness. The algorithms are also applied in real application scenarios to verify their applicability.

First, existing algorithms for binary-classification online streaming feature selection have several limitations: low classification accuracy, a large number of selected features, and long running time when there are many features with low redundancy and high correlation. To address these, a technical framework and an algorithm, OSFIC, are proposed for classification-oriented online streaming feature selection. OSFIC follows a two-stage process: handling each newly arrived feature, and analyzing the candidate feature set. Across these stages it filters irrelevant new features by null-conditional independence, filters redundant new features and redundant features in the candidate feature set by single-conditional mutual information, and finally filters the remaining redundancy in the candidate feature set by multi-conditional independence. The features selected by OSFIC and by the Alpha-investing, OSFS and SAOLA algorithms on 14 datasets are fed to the selected classifiers, and the experimental results are compared in terms of classification accuracy, number of selected features and running time to verify the effectiveness of the algorithm.

Secondly, to deal with the low classification accuracy caused by imbalanced class distribution in multi-classification under streaming features, MCFS, a multi-classification online feature selection algorithm for streaming features, is proposed. The algorithm is based on the "one-against-all" strategy, which transforms one multi-classification problem into multiple binary-classification problems. MCFS counts the categories of the target label and treats each category in turn as the positive class, with the remaining categories as the negative class. OSFIC is run once for each of these binary problems, and the final result is the union of the selected feature sets with duplicates removed. The MCFS, Fast-OSFS and SAOLA algorithms are applied to 12 multi-classification datasets, and the selected features are fed to the selected classifiers. The effectiveness of the algorithm is verified by comparing classification accuracy and recall.

Finally, OSFIC and MCFS are applied to real protein profile and gene expression scenarios respectively. For each selected classifier, the changes in classification accuracy at different stages of the feature stream are compared to verify the applicability of the algorithms.
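The one-against-all decomposition described for MCFS can be illustrated with a minimal Python sketch. It is not the thesis implementation: `select_binary` is a hypothetical stand-in that keeps a feature when its empirical mutual information with the binary label exceeds an assumed threshold, whereas OSFIC also performs the conditional redundancy checks. The function names, the threshold value and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_binary(stream, labels, threshold=0.05):
    """Hypothetical binary selector (relevance check only, NOT OSFIC).

    Features arrive one at a time as (name, values) pairs; a feature is kept
    when its mutual information with the binary label exceeds the threshold.
    """
    selected = []
    for name, values in stream:
        if mutual_information(values, labels) > threshold:
            selected.append(name)
    return selected


def one_vs_rest_select(stream, labels, selector=select_binary):
    """Run a binary selector once per class and return the de-duplicated union."""
    stream = list(stream)  # the stream is replayed once for every class
    union = []
    for cls in sorted(set(labels)):
        # Current class is the positive example, all other classes are negative.
        binary = [1 if y == cls else 0 for y in labels]
        for name in selector(stream, binary):
            if name not in union:
                union.append(name)
    return union


if __name__ == "__main__":
    # Toy data (assumed): three classes, two informative features, one noise feature.
    labels = [0, 0, 1, 1, 2, 2]
    stream = [
        ("f1", [1, 1, 0, 0, 0, 0]),
        ("f2", [0, 0, 0, 0, 1, 1]),
        ("f3", [0, 1, 0, 1, 0, 1]),
    ]
    print(one_vs_rest_select(stream, labels))
```

Passing the selector as a parameter keeps the per-class decomposition separate from the binary selection step, so a fuller binary selector could be substituted without changing the union logic.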
Keywords/Search Tags:streaming features, online feature selection, binary-classification, multi-classification, approximate Markov blanket