
Research On Feature Selection Based On F-neighborhood Rough Sets

Posted on: 2021-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Deng
Full Text: PDF
GTID: 2428330611490820
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, production practice yields more and more features. Some attributes may be redundant or unrelated to the classification task, and they need to be deleted before further data processing. Feature selection (also known as attribute reduction) is a technique for reducing the feature set in order to find the best subset of features for predicting the sample category. Whether for single-label or multi-label data, a key problem in the feature selection process is feature evaluation. For multi-label data, existing work does not adequately consider the relationships among labels, which has a negative impact on the performance of multi-label feature selection and on the effect of multi-label learning. To solve these problems, this thesis combines the advantages of neighborhood rough sets and F-rough sets and proposes a new rough set model, called the F-neighborhood rough set. F-neighborhood rough sets are then applied to single-label and multi-label feature selection. The main research contents are as follows.

First, F-neighborhood rough sets are proposed by combining the advantages of neighborhood rough sets and F-rough sets. The neighborhood relationship in F-neighborhood rough sets is defined, neighborhood decision subsystems are used to represent different situations, and their properties are discussed. Further, F-attribute dependency and attribute significance matrices are constructed for feature evaluation. A feature selection algorithm based on two evaluation criteria is designed. Experimental results show that the proposed algorithm has clear advantages over state-of-the-art algorithms.

Second, the F-neighborhood rough set model is extended from single-label learning to multi-label learning. F-neighborhood rough sets decompose multi-label data into a family of single-label decision tables. The attribute dependency of this family of single-label decision tables is then constructed for information fusion, in which the relationships among labels are fully considered. Multi-label feature selection is performed with the attribute dependency of multiple decision tables and an attribute significance matrix. Experimental results show that the proposed algorithm has clear advantages over state-of-the-art algorithms in both text and image multi-label learning tasks.

The main innovations of this thesis are as follows:
(1) The F-neighborhood rough set model is proposed. This model has the advantages of both neighborhood rough sets and F-rough sets.
(2) A feature selection algorithm (NPRMS) based on the attribute significance matrix is proposed. NPRMS is suitable for both discrete and continuous data, and for both static and dynamic data. NPRMS has good robustness.
(3) For multi-label learning, a feature selection algorithm (FNPRMS) based on attribute significance matrices is proposed. FNPRMS inherits the advantages of NPRMS and fully considers the relationships among labels. It does not need to perform spatial conversion and has good comprehensibility.
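The abstract does not give the formulas behind F-attribute dependency, NPRMS, or FNPRMS, so the following is only a rough sketch of the general idea, built on the classical neighborhood rough set dependency (the fraction of samples whose neighborhood contains a single decision class). The function names, the Euclidean distance, the radius `delta`, the greedy forward search, and the plain averaging over per-label decision tables are all assumptions made for illustration, not the thesis's definitions.

```python
import numpy as np

def neighborhoods(X, delta):
    """Boolean matrix N with N[i, j] True when sample j is within delta of sample i."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)) <= delta

def dependency(X, y, features, delta):
    """Classical neighborhood dependency: share of samples whose
    delta-neighborhood (measured on `features`) holds one label only."""
    if len(features) == 0:
        return 0.0
    N = neighborhoods(X[:, list(features)], delta)
    same = y[:, None] == y[None, :]
    # A sample is consistent when every neighbor shares its label.
    consistent = np.all(~N | same, axis=1)
    return float(consistent.mean())

def family_dependency(X, Y, features, delta):
    """Assumed fusion rule: mean dependency over one single-label
    decision table per label column of Y."""
    return float(np.mean([dependency(X, Y[:, k], features, delta)
                          for k in range(Y.shape[1])]))

def greedy_select(X, y, delta, tol=1e-9):
    """Forward selection: add the feature that raises dependency most,
    stop when no feature improves it by more than tol."""
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        scored = [(dependency(X, y, selected + [f], delta), f) for f in remaining]
        score, f = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best <= tol:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is constant.
X = np.array([[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]])
y = np.array([0, 0, 1, 1])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # two labels per sample

selected, score = greedy_select(X, y, delta=0.2)
```

On this toy data the search keeps only feature 0, because adding the constant feature 1 does not change any neighborhood. The second label column of `Y` is not separable by feature 0, so the averaged dependency over both label tables falls below the single-label value, which illustrates why the way per-label tables are fused matters in the multi-label setting.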
Keywords/Search Tags: Feature selection, neighborhood rough sets, F-rough sets, multi-label learning, attribute significance matrix