Research On Online Multi-Label Streaming Feature Selection With Label Correlation Method

Posted on: 2023-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2568306848467474
Subject: Engineering
Abstract/Summary:
Feature selection, as an effective method of data dimension reduction, has received much attention in the field of big data. However, traditional feature selection methods cannot effectively handle the real-world scenario in which the feature space is unknown and features arrive as a stream over time. Moreover, they focus on the single-label problem and ignore the correlation between labels in multi-label data. Therefore, two key issues urgently need to be solved: how to perform multi-label streaming feature selection while the feature space changes dynamically, and how to use the correlation between labels to mine more valuable features. To address these issues, the main research work is as follows.

Firstly, because labels differ in their importance to feature selection, the LWC algorithm is proposed to calculate label weights from the correlation between labels. The algorithm analyzes the correlation between labels, constructs a weighted undirected graph from it, and calculates the weight of each label on that graph, so that important labels contribute more. Unlike high-order strategies, LWC uses mutual information to calculate the correlation between labels instead of complex instance-level calculation, which reduces time complexity.

Secondly, an online multi-label streaming feature selection algorithm with label correlation, OMSFSLC, is established based on label correlation and mutual information theory. Building on the analysis of label correlation and label weight, the processing of online streaming features consists of three stages: significance analysis, relevance analysis, and redundancy analysis. The most valuable features are mined by determining whether a newly arrived feature improves the mean correlation degree between features and labels. Relevance analysis and redundancy analysis ensure that irrelevant and redundant features are filtered out.

Thirdly, the performance of the OMSFSLC algorithm is evaluated by comparison with benchmark algorithms. OMSFSLC is compared with online multi-label streaming feature selection, online multi-label group feature selection, and offline multi-label feature selection methods on seven evaluation metrics using three classifiers. To verify the effectiveness of OMSFSLC, comparative experiments were carried out in terms of classification accuracy, stability, number of selected features, and running time.

Finally, OMSFSLC and the online multi-label streaming feature selection methods OM-NRS and MSFS are each applied to a protein subcellular localization scenario, and comparative experiments are conducted to verify the applicability and effectiveness of the algorithm in a real application scenario.
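The abstract does not give the formulas behind LWC or the three OMSFSLC stages, so the following Python sketch is only an illustration of the general idea, not the thesis's method. It assumes discrete features and labels, assumes a label's weight is its normalised node strength in a graph whose edges carry pairwise mutual information, and assumes simple threshold rules for the significance, relevance, and redundancy checks. All function names, the tolerance `tol`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def label_weights(labels):
    """Assumed weighting: each label's summed MI with all other labels
    (its strength in a weighted undirected label graph), normalised to 1."""
    q = len(labels)
    strength = [
        sum(mutual_information(labels[i], labels[j]) for j in range(q) if j != i)
        for i in range(q)
    ]
    total = sum(strength)
    if total <= 1e-12:  # labels look independent: fall back to equal weights
        return [1.0 / q] * q
    return [s / total for s in strength]


def weighted_relevance(feature, labels, weights):
    """Label-weighted mutual information between one feature and all labels."""
    return sum(w * mutual_information(feature, lab) for w, lab in zip(weights, labels))


def stream_select(feature_stream, labels, weights, tol=1e-9):
    """Process features one at a time through three assumed checks."""
    selected = []  # entries are (name, values, relevance)
    for name, values in feature_stream:
        rel = weighted_relevance(values, labels, weights)
        mean_rel = sum(r for _, _, r in selected) / len(selected) if selected else 0.0
        # Significance (assumed rule): keep a feature that raises the mean relevance.
        if rel > mean_rel:
            selected.append((name, values, rel))
            continue
        # Relevance (assumed rule): drop a feature that carries no label information.
        if rel <= tol:
            continue
        # Redundancy (assumed rule): drop a feature if an equally or more relevant
        # selected feature shares at least as much information with it as it
        # shares with the labels.
        redundant = any(
            g_rel >= rel and mutual_information(values, g_values) >= rel
            for _, g_values, g_rel in selected
        )
        if not redundant:
            selected.append((name, values, rel))
    return [name for name, _, _ in selected]


# Toy data (hypothetical): two correlated binary labels and three streamed features.
labels = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1, 1, 0],
]
stream = [
    ("f1", [0, 0, 1, 1, 0, 0, 1, 1]),  # informative
    ("f2", [0, 0, 0, 0, 0, 0, 0, 0]),  # constant, hence irrelevant
    ("f3", [0, 0, 1, 1, 0, 0, 1, 1]),  # duplicate of f1, hence redundant
]
weights = label_weights(labels)
kept = stream_select(stream, labels, weights)
print(weights, kept)
```

In this toy run the constant feature is dropped at the relevance check and the duplicate at the redundancy check. Real-valued features would need discretisation or a different mutual information estimator before a sketch like this could be applied.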
Keywords/Search Tags: multi-label, feature selection, streaming features, label correlation, label weight