| As an indispensable data pre-processing method, feature selection aims to remove irrelevant or redundant features from the original feature set while retaining learning effectiveness. However, as data development enters a new era, data is characterized by high-dimensional features, complex structure, and diverse label types, which bring new problems and challenges to feature selection: (1) Rapidly growing data faces the curse of dimensionality in both the feature space and the label space. (2) In multi-label learning, the correlation between labels is often ignored, resulting in poor classification performance. (3) In label distribution learning, it is unclear how to handle quantified category labels during feature selection. To solve these problems, this thesis proposes two new feature selection algorithms based on neighborhood rough sets, one modeled for multi-label learning and one for label distribution learning. The main research contents are as follows. (1) A feature selection algorithm with label correlation based on the neighborhood rough set in a multi-label environment. By introducing label correlation and maximum nearest distance, this thesis proposes an effective algorithm, NRS-LC (Neighborhood rough set based multi-label feature selection with label correlation). The algorithm not only uses label correlation to reduce the computational complexity of the neighborhood rough set but also adapts the neighborhood range. The experimental results demonstrate the effectiveness of NRS-LC for multi-label learning. The algorithm addresses both the curse of dimensionality and the failure of existing multi-label neighborhood rough sets to exploit the correlation between labels. (2) A feature selection algorithm based on the neighborhood rough set in a label distribution environment. To address the problem that the existing neighborhood rough set cannot be effectively applied to label distribution data, this thesis proposes the feature selection algorithm NRS-LD (Neighborhood rough set with label distribution). The algorithm extends the neighborhood rough set to the label space and combines the resulting label-space neighborhood rough set with information from the sample feature space to select features. In addition, the significance of the labels is introduced to make the feature subset more representative. The experimental results demonstrate the effectiveness of NRS-LD for label distribution learning. The algorithm addresses both the curse of dimensionality and the inability of existing neighborhood rough sets to handle label distributions. Both algorithms are tested and analyzed on publicly available multi-label datasets and label distribution datasets. The experimental results indicate that the neighborhood rough set model with label correlation proposed in this thesis can effectively obtain feature subsets, and that the proposed neighborhood rough set model with label distribution handles label distribution data well and achieves better performance across different classifiers. |
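
For readers unfamiliar with the underlying model, the following is a minimal sketch of feature selection driven by a classical neighborhood rough set dependency measure. It is not NRS-LC or NRS-LD: it uses neither label correlation, an adaptive neighborhood range, nor label significance. The function names, the fixed radius `delta`, the Euclidean distance, and the rule that a sample is consistent only when every neighbor has an identical label vector are all assumptions made for illustration.

```python
import numpy as np

def neighborhood_dependency(X, Y, features, delta):
    """Fraction of samples whose delta-neighborhood is label-consistent.

    Assumption: a sample belongs to the positive region when every sample
    within distance delta (measured on the chosen features) has exactly
    the same label vector.
    """
    if len(features) == 0:
        return 0.0
    sub = X[:, features]
    # Pairwise Euclidean distances restricted to the chosen features.
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta  # n x n boolean matrix, includes each sample itself
    # same_label[i, j] is True when samples i and j share all labels.
    same_label = (Y[:, None, :] == Y[None, :, :]).all(axis=2)
    # A sample is consistent if every neighbor shares its label vector.
    consistent = (~neighbors | same_label).all(axis=1)
    return float(consistent.mean())

def greedy_select(X, Y, delta=0.2, tol=1e-9):
    """Forward selection: add the feature that raises dependency the most."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        scores = [neighborhood_dependency(X, Y, selected + [f], delta)
                  for f in remaining]
        idx = int(np.argmax(scores))
        if scores[idx] - best <= tol:
            break  # no remaining feature improves the dependency
        best = scores[idx]
        selected.append(remaining.pop(idx))
    return selected, best
```

Calling `greedy_select(X, Y)` with an `n x d` feature matrix and an `n x q` binary label matrix returns the indices of the chosen features and the dependency they reach. The pairwise distance matrix makes each evaluation quadratic in the number of samples, which is the kind of cost that motivates reducing the computational complexity of the neighborhood rough set.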