An Applied Research on MapReduce in Rough Set Parallel Attribute Reduction

Posted on: 2017-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: T C Li
Full Text: PDF
GTID: 2348330512968175
Subject: Engineering
Abstract/Summary:
Rough set theory is a mathematical theory for dealing with uncertain and incomplete information. It is often used in pattern recognition, machine learning, and other fields. Attribute reduction is an important research direction in rough set theory; its significance lies in deleting redundant attributes and in data mining. In current technology industries, however, the data sets to be processed are large in scale. Simplifying them through attribute reduction is necessary, but it becomes extremely difficult at that scale. To solve this problem, this study presents a novel parallel programming framework based on MapReduce for attribute reduction in rough sets. MapReduce was developed specifically to provide a parallel programming model for large-scale data, in which the computation is distributed across multiple nodes in a network. This thesis proposes a parallel rough set attribute multi-reduction algorithm based on the binary discernibility matrix, which is derived from the discernibility matrix. The algorithm is intuitive and highly parallel, which makes it well suited to parallel computing. Although the space complexity of the matrix is high, the parallel computing model compensates for it. Unlike traditional attribute reduction algorithms, ours first partitions the decision table, which may be incompatible (inconsistent), into equivalence classes to form a "simplified decision table", and then generates the binary discernibility matrix with equivalence classes rather than objects as the unit. The matrix therefore has fewer rows, which helps reduce both space and time complexity. To divide the equivalence classes, we combine MapReduce with the radix sort algorithm, which has lower complexity for this task, and implement it in the Shuffle process. In addition, to provide more comprehensive and extensive knowledge and rules, we further put forward a parallel multi-reduct algorithm. We carried out several experiments to evaluate the proposed algorithms on UCI data sets and on multiple custom random data sets. The reduction results showed that the algorithms are correct. We also recorded the running time and further analyzed the degree of parallelism, speedup, scalability, and performance. The proposed algorithms show good performance and good scalability, especially on large data sets.
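The pipeline described in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the thesis implementation: the function names, the toy decision table, and the backward-elimination step used to extract one reduct are assumptions made for illustration, and a dictionary grouping stands in for the radix sort performed in the Shuffle process. The sketch groups objects into equivalence classes (the "simplified decision table"), builds one binary row per pair of classes whose decision sets differ, and then keeps a set of attributes such that every row has a 1 in at least one kept column.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(table):
    """Map step: emit (condition values, decision) for every object."""
    for *conditions, decision in table:
        yield tuple(conditions), decision


def shuffle_and_reduce(pairs):
    """Shuffle groups pairs by key; reduce collects the decision set per class.

    Assumption: a dict stands in for the radix-sort-based Shuffle of the thesis.
    """
    classes = defaultdict(set)
    for key, decision in pairs:
        classes[key].add(decision)
    return sorted(classes.items())  # the simplified decision table


def binary_discernibility_matrix(classes):
    """One 0/1 row per pair of classes with different decision sets.

    Entry j is 1 when the two classes differ on condition attribute j.
    """
    rows = []
    for (c1, d1), (c2, d2) in combinations(classes, 2):
        if d1 != d2:
            rows.append(tuple(int(a != b) for a, b in zip(c1, c2)))
    return rows


def covers(matrix, attrs):
    """True when every matrix row has a 1 in at least one chosen column."""
    return all(any(row[j] for j in attrs) for row in matrix)


def one_reduct(matrix, n_attrs):
    """Backward elimination (illustrative choice): drop each attribute
    whose removal still leaves every row covered."""
    kept = list(range(n_attrs))
    for j in range(n_attrs):
        trial = [k for k in kept if k != j]
        if covers(matrix, trial):
            kept = trial
    return kept


# Hypothetical decision table: three condition attributes, then the decision.
table = [
    (1, 0, 0, "yes"),
    (1, 0, 0, "yes"),
    (1, 1, 0, "no"),
    (0, 1, 0, "no"),
    (0, 1, 1, "yes"),
    (0, 1, 1, "no"),  # same conditions, different decision: inconsistent class
]

classes = shuffle_and_reduce(map_phase(table))
matrix = binary_discernibility_matrix(classes)
print(len(table), "objects ->", len(classes), "equivalence classes")
print(one_reduct(matrix, 3))  # -> [1, 2]
```

Working on the four equivalence classes instead of the six objects means fewer pairs to compare, which is the row-count saving the abstract refers to. In a real MapReduce job the pairwise comparison could also be split across nodes; trying the attributes in different elimination orders is one simple way to obtain several reducts, though the thesis's multi-reduct algorithm may work differently.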
Keywords/Search Tags: MapReduce, Rough set, Attribute reduction, Parallel algorithm