Research On Feature Selection With Fuzzy Rough Sets

Posted on: 2020-04-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y W Li
GTID: 1488305720975449
Subject: Systems Engineering
Abstract/Summary:
In the era of big data, data sets are large in scale, mixed in structure, non-unique in labels, and high in dimension. These characteristics make rapid, timely, and accurate data mining very challenging, so how to select features from such data effectively has become one of the hot topics in machine learning. The purpose of feature selection is to remove the many irrelevant and redundant features from the original feature set without sacrificing learning performance. The selected feature subset should contain all or most of the classification information of the original feature space, which reduces the impact of the "curse of dimensionality" and improves learning performance. Fuzzy rough set theory is both an objective and effective mathematical tool for dealing with incomplete and uncertain information and a powerful computing paradigm for feature selection. Building on existing work on feature selection with fuzzy rough sets, this dissertation therefore constructs robust fuzzy rough set models for single-label data and multi-label fuzzy rough set models for multi-label data, broadening the range of application. The proposed models address the sensitivity of existing fuzzy rough sets to noise and extend research on fuzzy rough set theory. The main innovative work includes the following aspects:

1. The classical fuzzy rough set model is extremely sensitive to noise in feature selection. To address this, the dissertation identifies noise samples directly by defining the different classes' ratio of a sample, and proposes a robust fuzzy rough set model called the Different Classes' ratio Fuzzy Rough Set (DC_ratio FRS) model. The model reduces the influence of noise samples on its upper and lower approximations and achieves robustness by ignoring the noise samples. The related properties of the DC_ratio FRS model are discussed, and the sample pair selection (SPS) algorithm based on the DC_ratio FRS model is used for feature selection.

2. Existing fuzzy rough set models all assume that the decision attribute divides the sample set into several crisp decision classes when performing feature selection, which makes them sensitive to noise. To solve this problem, a robust fuzzy rough set model with representative sample (RS-FRS) is proposed. Defining the fuzzy membership of a sample captures the fuzziness and uncertainty of that membership. Based on this definition, the RS-FRS model is constructed, which reduces the influence of noise samples. Because the model considers the fuzziness of the sample membership degree, it can approximate other subsets of the domain space more precisely with the fuzzy equivalence approximation space. The RS-FRS model does not require parameters to be set in advance, which effectively reduces model complexity and human intervention. On this basis, the related properties of the RS-FRS model are studied, and a structured feature selection algorithm based on RS-FRS with sample pair selection is designed.

3. Existing multi-label feature selection algorithms ignore the intrinsic correlation between the feature space and the label space. To address this, the dissertation combines fuzzy rough sets with multiple kernel learning. Kernel information is extracted from the feature space and the label space, and a kernel fusion space is constructed for multi-label feature selection. A multi-label kernel fuzzy rough set model, called RMFRS, is then built, and its properties are discussed with theoretical analysis. Based on the RMFRS model, a feature selection algorithm for the multi-label kernelized fuzzy rough set model is designed, which selects features by evaluating their importance.

4. Existing multi-label feature selection algorithms also ignore label correlations. To mine the correlation among labels in multi-label data, an improved multi-label feature selection method based on fuzzy rough sets with both global and local label correlations (MFFLC) is proposed. From the global and local label correlations, a weight matrix is constructed from the label information, combining the inherent weights of labels with the joint weights of labels. A multi-label fuzzy rough set model with label correlations is then built on this matrix, and a forward greedy feature selection algorithm is designed to identify and select the most relevant features.

Experiments and analyses for the four research topics above are conducted on open single-label and multi-label data sets. The experimental results demonstrate that the proposed robust fuzzy rough set models can effectively select the most relevant features from single-label data and are robust against noise. The proposed multi-label fuzzy rough set models are well suited to multi-label data processing and can effectively improve the performance of multi-label feature selection. This dissertation addresses the noise sensitivity of existing fuzzy rough sets, fills a gap in integrating fuzzy rough set theory with multi-label feature selection, and extends research on the theory and application of fuzzy rough sets.
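
As a point of reference for the models summarized above, the following is a minimal sketch of forward greedy feature selection driven by the classical fuzzy rough dependency degree. It is not the DC_ratio FRS, RS-FRS, RMFRS, or MFFLC model from the dissertation, whose definitions are not given in this abstract. The similarity relation 1 - |a(x) - a(y)|, the minimum used to combine features, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def feature_similarity(column):
    """Fuzzy similarity 1 - |a(x) - a(y)| for one feature scaled to [0, 1] (assumed relation)."""
    return 1.0 - np.abs(column[:, None] - column[None, :])


def dependency(X, y, subset):
    """Classical fuzzy rough dependency of crisp labels y on a feature subset."""
    n = X.shape[0]
    # Relation on a subset: element-wise minimum of the per-feature relations.
    R = np.ones((n, n))
    for j in subset:
        R = np.minimum(R, feature_similarity(X[:, j]))
    different = y[:, None] != y[None, :]
    # Lower approximation of each sample's own class:
    # infimum of 1 - R(x, z) over samples z from a different class.
    lower = np.where(different, 1.0 - R, 1.0).min(axis=1)
    # Dependency degree: mean membership in the fuzzy positive region.
    return float(lower.mean())


def forward_greedy_selection(X, y, tol=1e-9):
    """Add the feature that most increases the dependency until no feature helps."""
    remaining = list(range(X.shape[1]))
    selected = []
    best = dependency(X, y, selected)
    while remaining:
        scores = [dependency(X, y, selected + [j]) for j in remaining]
        k = int(np.argmax(scores))
        if scores[k] <= best + tol:
            break  # no remaining feature improves the dependency
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best


# Toy data (hypothetical): feature 0 separates the two classes, feature 1 does not.
X = np.array([
    [0.0, 0.5, 0.2],
    [0.1, 0.4, 0.8],
    [0.9, 0.5, 0.3],
    [1.0, 0.4, 0.7],
])
y = np.array([0, 0, 1, 1])
selected, best = forward_greedy_selection(X, y)
print(selected, best)
```

Because the lower approximation takes an infimum over every sample from another class, a single mislabeled sample lying close to x can pull its membership toward zero. This is the noise sensitivity that the robust models in items 1 and 2 are designed to reduce.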
Keywords: Fuzzy Rough Set, Feature Selection, Multi-label Learning, Label Correlation, Multiple Kernel Learning