
Research On Information-theoretical-based Multi-label Feature Selection Approach

Posted on: 2022-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: L B Gao
Full Text: PDF
GTID: 2518306758491484
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, enormous volumes of multi-label data have given rise to the "curse of dimensionality", which has attracted widespread international attention. For high-dimensional data, classification becomes easier to handle after dimensionality reduction, and feature selection plays an indispensable role in that reduction. Feature selection chooses a subset of the original feature set while preserving the physical meaning of the original features. It reduces the storage space the data requires while improving the classification efficiency and prediction accuracy of the algorithm. A large number of information-theoretical-based multi-label feature selection approaches built on the Filter model have been proposed. This thesis also draws on information theory and proposes two multi-label feature selection approaches.

1. We combine three key aspects that can affect feature relevance: candidate features, selected features, and label correlations. Traditional multi-label feature selection approaches do not consider these three aspects comprehensively enough, yet examining all of them thoroughly when evaluating feature relevance is more conducive to capturing the optimal features. Therefore, a novel feature relevance term, FR, is designed; it adopts three incremental information terms to represent three types of conditional relevance, thereby accounting for all three aspects. Furthermore, we use label-related feature redundancy as a novel feature redundancy term, LR, to reduce redundancy as far as possible. Together, these yield Feature Selection combining Three types of Conditional Relevance (TCRFS). Extensive experiments show that TCRFS achieves superior classification performance on 13 multi-label benchmark data sets from four domains.

2. In previous information-theoretic multi-label feature selection approaches, feature relevance is generally evaluated according to the amount of information that the selected features or candidate features provide to the label set. Although considering informativeness is vital, these approaches underestimate the importance of the changed ratio of informativeness in evaluating feature relevance. To this end, we evaluate feature relevance based on the changed ratio of undetermined informativeness and the changed ratio of established informativeness, and design a new feature relevance term, RW. Based on RW, Relevance based on Weight Feature Selection (RWFS) is proposed. To verify its classification effectiveness, RWFS is compared with eight state-of-the-art multi-label feature selection approaches on 13 real-world data sets. The experimental results show that RWFS obtains the best classification results; that is, considering the two types of changed ratio when evaluating feature relevance can effectively improve classification performance.

In summary, this thesis studies information-theoretical-based multi-label feature selection. By devising new feature selection strategies that address different issues in existing approaches, and by designing multi-label feature selection approaches based on those strategies, it obtains better classification results than the existing approaches.
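To make the general idea of a Filter-model, information-theoretic selector concrete, the following is a minimal sketch of a generic greedy "relevance minus redundancy" criterion on discrete data. It is not TCRFS or RWFS: the FR, LR, and RW terms are not defined in this abstract, so the scoring rule here (summed mutual information with the labels, minus average mutual information with already selected features) is an assumed stand-in, and the names `mutual_information` and `greedy_select` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_select(features, labels, k):
    """Greedily pick k feature indices (assumed generic criterion, not the thesis's).

    features: list of feature columns, each a list of discrete values
    labels:   list of label columns, each a list of discrete values
    """
    # Relevance: mutual information with each label, summed over the label set.
    relevance = [
        sum(mutual_information(f, y) for y in labels) for f in features
    ]
    selected = []
    candidates = set(range(len(features)))
    while candidates and len(selected) < k:

        def score(i):
            if not selected:
                return relevance[i]
            # Redundancy: average information shared with selected features.
            redundancy = sum(
                mutual_information(features[i], features[j]) for j in selected
            ) / len(selected)
            return relevance[i] - redundancy

        # Iterating in index order makes ties resolve to the lowest index.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    features = [
        [0, 0, 1, 1, 0, 0, 1, 1],  # informative about the first label
        [0, 0, 1, 1, 0, 0, 1, 1],  # exact duplicate of feature 0
        [0, 1, 0, 1, 0, 1, 0, 1],  # informative about the second label
        [0, 0, 0, 0, 0, 0, 0, 0],  # constant, carries no information
    ]
    labels = [
        [0, 0, 1, 1, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1],
    ]
    print(greedy_select(features, labels, 2))  # [0, 2]
```

In this toy data the duplicate feature is skipped in favor of the one that explains the second label, which is the behavior a redundancy term is meant to produce. Conditioning relevance on selected features and on label correlations, as the abstract describes, would replace the simple `score` function above.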
Keywords/Search Tags: Feature Selection, Information Theory, Conditional Relevance, the Changed Ratio of Informativeness, Relevance based on Weight