
Parallel Attribute Multi-reduction And Rules Extraction Based On Rough Set Theory

Posted on: 2019-01-17    Degree: Master    Type: Thesis
Country: China    Candidate: Z Wu    Full Text: PDF
GTID: 2348330542489046    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of science and network technology, the relationships among data have become increasingly complicated, and extracting useful value and information from such complex data is a pressing problem. Rough set theory is an effective tool for data mining: it obtains hidden knowledge by partitioning the data set according to certain rules rather than relying on prior knowledge from outside the data set. Attribute reduction is the key step of knowledge acquisition in rough set theory; it uncovers hidden rules by deleting redundant attributes and reducing the dimensionality of the decision table, and it is widely used in pattern recognition, machine learning, and related fields. However, a traditional attribute reduction algorithm yields only a single reduct, and performing attribute reduction on massive data remains a challenging task in the big data era.

This thesis proposes a parallel attribute multi-reduction algorithm based on the positive region. It overcomes the singleness of the reduction results and uses MapReduce to handle massive data. The key to improving reduction efficiency is the efficient computation of equivalence classes and attribute significance. Multiple reduction results are obtained by cycling over the non-core attributes instead of following the traditional single-reduct approach. In addition, the parallel random forest classifier in Mahout is used to compute the classification accuracy of each reduction result, and the reduct with the highest classification accuracy is selected for knowledge acquisition from the decision table. In the knowledge extraction step, a CART decision tree model is employed to obtain the final rules.

Finally, UCI data sets are used to verify the correctness of the algorithm: the small data sets verify the correctness of the reduction results, while the larger data sets measure the running time of the algorithm, which is compared with that of another parallel algorithm to verify its efficiency. In addition, several artificial data sets are designed to evaluate the parallel performance of the algorithm, including speedup and scalability, which demonstrates that the algorithm is well suited to processing massive data.
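To make the positive-region notions underlying the proposed reduction concrete, the sketch below shows how equivalence classes, the positive region, and a simple positive-region-based attribute significance measure can be computed for a small decision table. This is a minimal sequential illustration of standard rough set definitions, not the thesis's MapReduce/Mahout implementation; the function names, the column layout, and the toy table are assumptions made for the example.

```python
# Minimal sketch of positive-region computations from rough set theory.
# Sequential toy version; the parallel algorithm in the thesis distributes
# these computations with MapReduce, which is not reproduced here.

from collections import defaultdict

def equivalence_classes(table, attrs):
    """Group object indices by their values on the given condition attributes."""
    classes = defaultdict(list)
    for i, row in enumerate(table):
        key = tuple(row[a] for a in attrs)
        classes[key].append(i)
    return list(classes.values())

def positive_region(table, cond_attrs, decision):
    """Objects whose equivalence class (w.r.t. cond_attrs) is consistent on the
    decision attribute, i.e. objects that can be classified with certainty."""
    pos = set()
    for cls in equivalence_classes(table, cond_attrs):
        if len({table[i][decision] for i in cls}) == 1:
            pos.update(cls)
    return pos

def significance(table, cond_attrs, attr, decision):
    """Drop in positive-region size when `attr` is removed: a simple
    attribute significance measure used to rank candidate attributes."""
    full = len(positive_region(table, cond_attrs, decision))
    rest = [a for a in cond_attrs if a != attr]
    return full - len(positive_region(table, rest, decision))

if __name__ == "__main__":
    # Hypothetical decision table: columns 0-2 are condition attributes,
    # column 3 is the decision attribute.
    table = [
        (1, 0, 1, "yes"),
        (1, 0, 0, "yes"),
        (0, 1, 1, "no"),
        (0, 1, 0, "no"),
        (1, 1, 1, "yes"),
    ]
    conds = [0, 1, 2]
    print(len(positive_region(table, conds, 3)))   # 5: the table is consistent
    for a in conds:
        print(a, significance(table, conds, a, 3)) # attribute 0 is indispensable here
```

In this toy table, removing attribute 0 shrinks the positive region, so its significance is positive (it belongs to the core), while attributes 1 and 2 have zero significance and are candidates for removal; a multi-reduction procedure of the kind the abstract describes would explore different orderings of such non-core attributes to obtain several distinct reducts.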
Keywords/Search Tags:Parallel Algorithm, Attribute Multi-reduction, Rough Set, Decision Tree, Random Forest