With the rapid development of Internet technology, large volumes of data awaiting processing are produced in every field, and how to process and analyze these data has become a hot research topic. Rough set theory gives data mining an effective means of handling redundant data, but numerical attributes must be discretized before they can be processed. Fuzzy rough sets can process numerical attributes directly, which reduces the information loss caused by discretization and avoids destroying the structure of the data. They process data effectively and have been applied successfully in data mining, medical diagnosis and other fields. To reduce the influence of sample distribution and noisy data on classification models based on fuzzy rough sets, this dissertation proposes improved fuzzy rough set models. In addition, a parallel attribute reduction method for the multi-kernel granulated fuzzy rough set model is presented. The main research work and innovations of this dissertation are as follows:

1. In a fuzzy rough set, uncertainty in the sample distribution affects the approximation sets of an object and therefore the acquisition of an effective attribute reduction. To define the approximation sets effectively, a fuzzy rough set based on the distance ratio scale is proposed. The model introduces a definition of the sample set based on the distance ratio scale and, by controlling that scale, avoids the influence of sample distribution uncertainty on the approximation sets. The basic properties of the model are given, a new dependency function is defined, and an attribute reduction algorithm is designed. SVM, NaiveBayes and J48 are used as test classifiers to evaluate the performance of the proposed algorithm on UCI data sets. Experimental results show that the proposed attribute reduction algorithm obtains reductions effectively and improves classification accuracy. (Chapter 3)

2. The classical fuzzy rough set model is
extremely sensitive to noisy data, which limits its practical application. To reduce the influence of noisy data on the model, a novel robust fuzzy rough set attribute reduction method based on the INS algorithm is proposed. The INS algorithm is a robust outlier detection algorithm that can effectively identify abnormal samples. In this dissertation, the INS algorithm is first improved so that it applies to multi-class standard data, and a constraint condition is established to improve the accuracy with which noisy samples are selected. Attribute reduction is then carried out by combining the result with an existing fuzzy rough set model. Experimental results demonstrate the effectiveness of the proposed method. (Chapter 4)

3. Compared with a single-kernel learning model, a multi-kernel learning model is more flexible and adaptable, and replacing a single kernel with multiple kernels can achieve better performance. A parallel attribute reduction algorithm for the multi-kernel granulated fuzzy rough set model is implemented with the MapReduce model. The parallel algorithm is tested on UCI data sets and the experimental results are analyzed. The results show that the algorithm performs attribute reduction effectively on big data and has good parallel performance. (Chapter 5) ...
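
To make the notion of dependency-based attribute reduction concrete, the following is a minimal Python sketch of the classical fuzzy rough set approach that the contributions above build on. It is not the distance-ratio-scale model, the INS-based method, or the MapReduce algorithm of this dissertation. The Gaussian similarity relation, the parameter `sigma`, the stopping threshold `eps`, and the function names `fuzzy_similarity`, `dependency` and `greedy_reduct` are all assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_similarity(X, sigma=0.5):
    """Gaussian fuzzy similarity between all pairs of samples (assumed relation)."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-sq_dist / (2 * sigma ** 2))

def dependency(X, y, attrs, sigma=0.5):
    """Classical fuzzy rough dependency of the decision y on the attributes attrs."""
    if not attrs:
        return 0.0
    R = fuzzy_similarity(X[:, attrs], sigma)
    different = y[:, None] != y[None, :]
    # Lower-approximation membership of each sample in its own class:
    # the minimum of 1 - R(x, z) over all samples z from other classes.
    lower = np.where(different, 1.0 - R, 1.0).min(axis=1)
    return float(lower.mean())

def greedy_reduct(X, y, sigma=0.5, eps=1e-6):
    """Forward selection: add the attribute that raises dependency the most."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        scores = [dependency(X, y, selected + [a], sigma) for a in remaining]
        i = int(np.argmax(scores))
        if scores[i] - best <= eps:
            break  # no remaining attribute improves the dependency
        best = scores[i]
        selected.append(remaining.pop(i))
    return selected, best
```

Because the lower approximation takes a minimum over all samples of other classes, a single mislabeled or outlying sample can pull the dependency toward zero. This sensitivity is what motivates the robust variants summarized above.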