
Feature Selection For Data With A Hierarchical Structure

Posted on: 2020-01-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Zhao
GTID: 1488306131467654
Subject: Software engineering
Abstract/Summary:
In real machine learning tasks, feature selection is an indispensable data preprocessing step, because it benefits tasks such as classification by speeding up learning and improving the generalization capability of the model. Although existing feature selection approaches have made great progress in recent years, most of this research focuses on flat data, in which all data are treated as a single whole. However, the era of big data has brought not only rapid growth in the numbers of samples, feature dimensions, and categories, but also structural relationships among data, such as hierarchical structure. Making full use of the hierarchical information in big data to select features is an important challenge for machine learning and data mining. This thesis focuses on feature selection for data with a hierarchical structure and proposes the following three methods.
(1) Feature selection based on adaptive neighborhood granularity for hierarchical features. Most neighborhood rough set models consider only a fixed neighborhood granularity. To address this, we design an adaptive neighborhood rough set model based on the 3σ rule of statistics, which adapts to data precision as described by the hierarchical confidence of the feature subsets. Moreover, we develop a fast backtracking algorithm for adaptive neighborhood rough set based feature selection that accounts for the trade-off between test costs and misclassification costs.
(2) Fuzzy rough set based feature selection with hierarchical classification. The main limitation of existing fuzzy rough set based feature selection methods is that they ignore the hierarchical structure of classes. To address this, we design a fuzzy rough set based feature selection strategy for hierarchical classification. A fuzzy rough set model with hierarchical structures is developed to compute the lower and upper approximations of classes organized in a class hierarchy. The model considers both the inclusion relationship and the sibling relationship in the class hierarchy, and a corresponding feature selection algorithm is designed. Compared with the classical fuzzy rough set model, this effectively reduces the search space over samples from different classes.
(3) Recursive regularization based feature selection for hierarchical classification. Most feature selection methods ignore the semantic hyponymy (is-a relations) in the class hierarchy and select a single, uniform feature subset for all classes. To address this, we propose a new feature selection framework with recursive regularization for hierarchical classification, based on a divide-and-conquer strategy. The framework takes the hierarchical information of the class structure into account: it uses parent-child, sibling, and family relationships for hierarchical regularization to learn a sparse matrix for feature ranking in each sub-classification task.
The three proposed methods show that mining data hierarchy is an effective way to improve feature selection in big data classification tasks.
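Method (1) rests on neighborhood rough sets. The sketch below shows, under loudly stated assumptions, how a 3σ rule could fix the neighborhood radius and how the resulting dependency degree (size of the positive region divided by the number of samples) can score feature subsets. The per-feature noise levels `sigma`, all helper names, and the greedy forward search are hypothetical stand-ins; the thesis's cost-sensitive backtracking algorithm and its hierarchical confidence measure are not reproduced here.

```python
# Minimal sketch of a neighborhood rough set with a 3-sigma neighborhood radius.
# ASSUMPTION (not taken from the thesis): each feature f has a known
# measurement-noise standard deviation sigma[f], and two samples are treated
# as indistinguishable on f when their values differ by at most 3 * sigma[f].


def neighborhood(X, i, features, sigma):
    """Indices of samples within 3 * sigma[f] of sample i on every selected feature."""
    return [
        j
        for j in range(len(X))
        if all(abs(X[i][f] - X[j][f]) <= 3 * sigma[f] for f in features)
    ]


def dependency(X, y, features, sigma):
    """Size of the positive region divided by the number of samples.

    A sample is in the positive region when every neighbor shares its label.
    """
    positive = sum(
        1
        for i in range(len(X))
        if all(y[j] == y[i] for j in neighborhood(X, i, features, sigma))
    )
    return positive / len(X)


def greedy_select(X, y, sigma):
    """Forward selection (a stand-in for the thesis's backtracking search):
    add the feature that most increases the dependency, and stop when no
    remaining feature improves it."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        score, f = max((dependency(X, y, selected + [f], sigma), f) for f in remaining)
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.2], [1.1, 0.9]]
y = [0, 0, 1, 1]
sigma = [0.1, 0.1]
print(greedy_select(X, y, sigma))  # ([0], 1.0)
```

A smaller `sigma` (higher data precision) shrinks every neighborhood, so the same feature subset can reach a higher dependency; this is the sense in which the granularity adapts to precision in this sketch.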
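For method (2), one plausible reading of the sibling-based search-space reduction is to compute the fuzzy rough lower approximation of a sample's class only against samples from sibling classes (classes with the same parent), rather than against every other class. The following sketch illustrates that idea with a common fuzzy similarity and the Kleene-Dienes implicator; these choices, the toy class tree, and all function names are assumptions, not the thesis's exact model.

```python
# Minimal sketch: fuzzy rough lower approximation with negatives drawn from
# sibling classes only. ASSUMPTIONS (not taken from the thesis): features are
# scaled to [0, 1]; similarity is R(a, b) = min over features of 1 - |a_f - b_f|;
# the lower approximation uses the Kleene-Dienes implicator, so a sample's
# membership in its own class is min over negative samples b of 1 - R(a, b).


def similarity(a, b, features):
    return min((1 - abs(a[f] - b[f]) for f in features), default=1.0)


def negatives(y, i, parent, hierarchical):
    """Samples compared against sample i: all other classes (flat) or only
    classes that share i's parent in the class tree (hierarchical)."""
    return [
        j
        for j in range(len(y))
        if y[j] != y[i] and (not hierarchical or parent[y[j]] == parent[y[i]])
    ]


def lower_membership(X, i, features, neg):
    """Degree to which sample i certainly belongs to its own class."""
    return min((1 - similarity(X[i], X[j], features) for j in neg), default=1.0)


def dependency(X, y, features, parent, hierarchical=True):
    """Mean lower-approximation membership over all samples."""
    total = sum(
        lower_membership(X, i, features, negatives(y, i, parent, hierarchical))
        for i in range(len(X))
    )
    return total / len(X)


# Class tree: root -> animal -> {cat, dog}, root -> vehicle -> {car, bus}.
parent = {"cat": "animal", "dog": "animal", "car": "vehicle", "bus": "vehicle"}
X = [[0.0], [0.5], [0.25], [1.0]]
y = ["cat", "dog", "car", "bus"]
print(dependency(X, y, [0], parent, hierarchical=True))   # 0.625
print(dependency(X, y, [0], parent, hierarchical=False))  # 0.3125
```

In this toy tree the hierarchical variant compares each sample with one sibling-class sample instead of three, and it disregards the nearby `car` sample when scoring `cat`, because the two are already separated at the `animal`/`vehicle` level.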
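Method (3) learns a sparse weight matrix for each sub-classification task and ranks features by it. A minimal sketch of one such subtask follows, assuming a least-squares loss, an L2,1 sparsity penalty, and only the parent-child term of the hierarchical regularizer (sibling and family terms are omitted). The objective, the solver (proximal gradient descent), and every name in the code are illustrative assumptions rather than the thesis's formulation.

```python
import math

# Minimal sketch of one sub-classification task with sparse, parent-coupled
# weights. ASSUMPTIONS (not taken from the thesis): the subtask objective is
#   ||X W - Y||_F^2 + beta * ||W - P||_F^2 + alpha * ||W||_{2,1},
# where P is the parent node's weight matrix (same shape as W); sibling and
# family terms are omitted. It is solved by proximal gradient descent.


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def fit_subtask(X, Y, P, alpha, beta, iters=300):
    d, k = len(X[0]), len(Y[0])
    W = [[0.0] * k for _ in range(d)]
    # Step size 1/L, with L = 2 * (||X||_F^2 + beta) bounding the Lipschitz
    # constant of the gradient of the smooth terms.
    L = 2 * (sum(v * v for row in X for v in row) + beta)
    Xt = transpose(X)
    for _ in range(iters):
        residual = [
            [p - t for p, t in zip(prow, trow)] for prow, trow in zip(matmul(X, W), Y)
        ]
        G = matmul(Xt, residual)  # X^T (X W - Y)
        for f in range(d):
            # Gradient step on row f of W.
            row = [
                W[f][c] - (2 * G[f][c] + 2 * beta * (W[f][c] - P[f][c])) / L
                for c in range(k)
            ]
            # Group soft-thresholding: proximal operator of (alpha / L) * ||row||_2,
            # which can zero out a whole row (i.e. discard a feature).
            norm = math.sqrt(sum(v * v for v in row))
            scale = max(0.0, 1 - alpha / (L * norm)) if norm > 0 else 0.0
            W[f] = [scale * v for v in row]
    return W


def rank_features(W):
    """Order features by the L2 norm of their row in W (largest first)."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in W]
    return sorted(range(len(W)), key=lambda f: -norms[f]), norms


# Toy subtask: feature 0 carries the label, features 1 and 2 are noise.
X = [[1, 0.5, 0.5], [1, -0.5, -0.5], [-1, 0.5, -0.5], [-1, -0.5, 0.5]]
Y = [[1, -1], [1, -1], [-1, 1], [-1, 1]]
P = [[0.0, 0.0] for _ in range(3)]  # root node: no parent, so pull toward zero
W = fit_subtask(X, Y, P, alpha=1.0, beta=0.1)
order, norms = rank_features(W)
print(order)  # [0, 1, 2]
```

To mimic the divide-and-conquer scheme, one could fit the root subtask first and then pass each fitted `W` as `P` to its child subtasks. Because the L2,1 penalty drives whole rows of `W` to zero, row norms serve naturally as per-subtask feature scores.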
Keywords/Search Tags: Feature selection, fuzzy rough set, hierarchical structure, neighborhood rough set, regularization