
Research on Attribute Reduction Algorithms for Fuzzy Rough Sets under the MapReduce Framework

Posted on: 2018-06-15  Degree: Master  Type: Thesis
Country: China  Candidate: D J Liu  Full Text: PDF
GTID: 2348330521450737  Subject: Software engineering
Abstract/Summary:
In recent years the Internet has developed extremely quickly, and more and more data needs to be computed and analyzed. How to extract useful information from huge amounts of data has attracted wide attention, and knowledge discovery has become an important research topic.

Attribute reduction (feature selection) is an important way to acquire knowledge and eliminate noise. A data set or knowledge base contains many attributes of differing importance. Some are necessary for decision making, while others may be irrelevant, redundant or useless. A lot of time and space is spent on this useless information when knowledge is acquired from data sets. Attribute reduction removes it, so knowledge discovery becomes quicker and easier afterwards. Attribute reduction is one of the most important applications of rough set theory and has been studied comprehensively. Classical rough set theory cannot deal with numerical data directly: the data must be discretized first, and information may be lost in the course of discretization, which can reduce the effectiveness of knowledge acquisition.

To overcome this problem, fuzzy rough set theory was proposed. It can handle numerical data directly, without discretization. However, attribute reduction based on attribute dependence under fuzzy rough sets also has some shortcomings. In this thesis, particle swarm optimization is combined with fuzzy rough set theory. Furthermore, parallel computing under the MapReduce framework is employed to develop attribute reduction algorithms for processing large-scale data. A series of studies has been carried out on fuzzy rough set theory and robust fuzzy rough set theory. The major work of this thesis is as follows:

1. An attribute reduction algorithm based on particle swarm optimization for Gaussian kernel based fuzzy rough sets (PSO-GKFRA) is presented. Because of the characteristics of Gaussian kernel fuzzy rough sets, the traditional heuristic attribute reduction algorithm based on attribute dependence may fail to find the best combination of attributes, and in some cases cannot obtain a reduct at all. This defect is overcome by combining particle swarm optimization with Gaussian kernel based fuzzy rough sets in attribute reduction. Moreover, different attribute reducts can be obtained by selecting different parameters, so as to satisfy the needs of classification. Experiments on UCI public data sets show that the proposed algorithm performs well in reduction. (Chapter 3)

2. The principles of computing approximation sets and attribute dependence in the Gaussian kernel based fuzzy rough set model are analyzed first. A parallel algorithm for computing approximations and attribute dependence in Gaussian kernel based fuzzy rough sets is then proposed using the MapReduce framework. Exploiting the characteristics of MapReduce, the minimum distance between a selected object and the objects of different classes is calculated in the Map procedure, so the relation between every pair of objects is not written to HDFS. This strategy reduces the frequency of disk access and saves computation time, so attribute dependence in fuzzy rough sets can be calculated on large-scale data. This algorithm is combined with PSO-GKFRA to give a parallel attribute reduction algorithm based on particle swarm optimization for Gaussian kernel based fuzzy rough sets. Experiments on UCI public data sets and artificial data sets show that this algorithm has good parallel performance and is effective in reduction. (Chapter 4)

3. Based on the robust fuzzy rough set model and the MapReduce framework, a parallel attribute reduction algorithm for Gaussian kernel based robust fuzzy rough sets is proposed. In this algorithm, the k-nearest neighbours of a given object are first found within each data split. After every data split has been processed in parallel, the k-nearest neighbours of each object in the whole data set are found. The lower approximation is then calculated using the RNN operator. Furthermore, the attribute dependences of all candidate reducts are computed at the same time, which reduces the time overhead of resource scheduling caused by Hadoop. Experiments are carried out on UCI public data sets, and the performance of the algorithm is analyzed for different parameters of the RNN operator. The experimental results verify the parallel performance of the algorithm and show that it can handle noisy data effectively. (Chapter 5)...
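
The Map-side minimum-distance idea described for Chapter 4 can be illustrated with a small in-memory sketch. This is not the thesis's implementation: it simulates the map and reduce phases in plain Python instead of Hadoop, and it assumes a kernel of the form exp(-d²/(2σ²)) and a lower-approximation membership of sqrt(1 - k²) taken at the nearest object of a different class, a form commonly used for Gaussian kernel fuzzy rough sets. The names `map_split`, `reduce_min`, `dependency`, `sigma` and `n_splits` are hypothetical.

```python
import math

def sq_dist(x, y, attrs):
    """Squared Euclidean distance restricted to the chosen attribute indices."""
    return sum((x[a] - y[a]) ** 2 for a in attrs)

def map_split(split, queries, attrs):
    """Map phase (simulated): for every query object, emit the smallest squared
    distance to any object in this split that carries a different class label.
    Only one number per query leaves the mapper, not the pairwise relation."""
    for qid, (qx, qlabel) in enumerate(queries):
        best = math.inf
        for x, label in split:
            if label != qlabel:
                best = min(best, sq_dist(qx, x, attrs))
        yield qid, best

def reduce_min(pairs):
    """Reduce phase (simulated): keep the global minimum per query object."""
    merged = {}
    for qid, d2 in pairs:
        merged[qid] = min(d2, merged.get(qid, math.inf))
    return merged

def dependency(data, attrs, sigma=0.5, n_splits=2):
    """Attribute dependence of the class label on attrs.
    Assumption: kernel k = exp(-d^2 / (2 * sigma^2)) and membership
    sqrt(1 - k^2) evaluated at the nearest different-class object."""
    splits = [data[i::n_splits] for i in range(n_splits)]
    pairs = (p for s in splits for p in map_split(s, data, attrs))
    total = 0.0
    for d2 in reduce_min(pairs).values():
        k = math.exp(-d2 / (2 * sigma ** 2))  # 0.0 when no other class exists
        total += math.sqrt(1.0 - k * k)
    return total / len(data)

# Toy data (made up for illustration): attribute 0 separates the two classes,
# attribute 1 does not.
data = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
        ((1.0, 0.0), "b"), ((1.0, 1.0), "b")]
print(dependency(data, [0]))  # high dependence
print(dependency(data, [1]))  # 0.0: attribute 1 carries no class information
```

Because the membership decreases monotonically as the kernel value grows, only the nearest different-class object matters, which is why each mapper can emit a single partial minimum per object and the reducer only needs to take a minimum.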
Keywords/Search Tags:Fuzzy rough set, Attribute reduction, Particle swarm optimization, MapReduce