The Research On Feature Selection Method Based On Maximal Consistent Block Neighborhood Rough Set

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Cheng
Full Text: PDF
GTID: 2428330620463333
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, data from many fields is growing massively, which means the era of big data has arrived. High feature dimensionality is one of the important characteristics of big data, and it poses a severe challenge to data mining. Feature selection and feature extraction are the two main approaches to dimensionality reduction. Compared with feature extraction, feature selection preserves the original semantic information of the data, which helps in interpreting data mining results. Rough set theory is a powerful tool for feature selection. The neighborhood rough set model is one of the important extensions of the classical rough set model and is suitable for feature selection on high-dimensional data in a distance space. Aiming at numerical data, this thesis combines the concept of the maximal consistent block with the neighborhood rough set and establishes a novel neighborhood rough set model based on maximal consistent blocks. The main contents and conclusions of the thesis are as follows:

(1) A single-label feature selection method based on the maximal consistent block neighborhood rough set

The existing extensions of the neighborhood rough set focus only on the consistent situation in which all samples in a neighborhood belong to a single decision class, so the information contained in boundary samples is underused. To address this limitation of the neighborhood rough set model, we combine the concept of the maximal consistent block of a tolerance relation with the neighborhood rough set model and select the largest equivalent block in the neighborhood of a sample as the minimum information granule. The resulting model, called the neighborhood rough set model based on maximal consistent blocks, redefines concepts such as the upper and lower approximations and the significance of an attribute. This model can enlarge the positive region by transforming boundary samples into consistent samples in smaller information granules. In addition, we design the corresponding feature selection algorithm using a forward greedy strategy. The effectiveness of the proposed model is validated by experiments on seven public UCI data sets.

(2) A multi-label feature selection method based on the maximal consistent block neighborhood rough set

Unlike single-label data, each sample of multi-label data may be associated with a set of labels simultaneously. From the viewpoint of information granulation, granulating the decision space is computationally expensive: when an equivalence relation is used to obtain the granulation, it produces many distinct equivalence classes with few samples in each, which leads to poor performance of the multi-label rough set model. Therefore, this thesis obtains the information granulation of samples in the decision space from the viewpoint of individual labels. We establish a new model by redefining concepts such as the upper and lower approximations and the significance of an attribute. In addition, we propose a new feature selection algorithm using a forward greedy strategy. The effectiveness of the proposed algorithm is validated by experiments on five public Mulan data sets.
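
As a rough illustration of the forward greedy strategy mentioned above, the following Python sketch uses the classical neighborhood rough set dependency (the fraction of samples whose neighborhood is pure in one decision class) as the selection criterion. It is not the maximal consistent block model proposed in the thesis, whose exact definitions are not given in this abstract. The function names, the Euclidean distance, the neighborhood radius `delta` and the stopping threshold `eps` are all assumptions made for the example.

```python
import numpy as np

def positive_region(X, y, features, delta):
    """Indices of samples whose delta-neighborhood over `features` is label-pure.

    Classical neighborhood rough set definition (an assumption here, not the
    thesis's maximal consistent block variant).
    """
    if not features:
        return np.array([], dtype=int)
    sub = X[:, features]
    # Pairwise Euclidean distances restricted to the chosen features.
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    neighbors = dist <= delta
    same_label = y[:, None] == y[None, :]
    # A sample is consistent if every neighbor shares its label.
    consistent = np.all(~neighbors | same_label, axis=1)
    return np.flatnonzero(consistent)

def dependency(X, y, features, delta):
    """Fraction of samples that fall in the positive region."""
    return len(positive_region(X, y, features, delta)) / len(y)

def forward_greedy_select(X, y, delta=0.2, eps=1e-9):
    """Add the feature with the largest dependency gain until no gain remains."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = 0.0
    while remaining:
        scored = [(dependency(X, y, selected + [f], delta), f) for f in remaining]
        # Highest dependency wins; ties go to the lowest feature index.
        score, f = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best <= eps:  # significance of the best candidate is ~0
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is constant.
    X = np.array([[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]])
    y = np.array([0, 0, 1, 1])
    print(forward_greedy_select(X, y, delta=0.2))
```

In this sketch the significance of a candidate feature is simply the increase in dependency it brings to the current subset; features are usually normalized to a common range before a single `delta` is applied.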
Keywords/Search Tags: Neighborhood rough set, Maximal consistent blocks, Feature selection, Multi-label learning