
Parallel Attribute Multi-reduction And Rules Extraction Based On Rough Set Theory

Posted on: 2019-01-17    Degree: Master    Type: Thesis
Country: China    Candidate: Z Wu    Full Text: PDF
GTID: 2348330542489046    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of science and network technology, the relationships among data have become increasingly complicated, and extracting useful value and information from such complex data is a pressing problem. Rough set theory is an effective tool for data mining: it obtains hidden knowledge by partitioning the data set according to certain rules rather than relying on prior knowledge from outside the data set. Attribute reduction is the key step of knowledge acquisition in rough set theory; it uncovers hidden rules by deleting redundant attributes and reducing the dimensionality of the decision table, and it is widely used in pattern recognition, machine learning, and related fields. However, a traditional attribute reduction algorithm yields only a single reduct, and performing attribute reduction on massive data remains a challenging task in the big data era.

This thesis proposes a parallel attribute multi-reduction algorithm based on the positive region. It overcomes the singleness of the reduction results and uses MapReduce to handle massive data. The key to improving reduction efficiency is the efficient computation of equivalence classes and attribute significance. Multiple reduction results are obtained by cycling over the non-core attributes instead of following the traditional single-reduct approach. In addition, the parallel random forest classifier in Mahout is used to compute the classification accuracy of each reduction result, and the reduct with the highest classification accuracy is selected for knowledge acquisition from the decision table. In the knowledge extraction step, a CART decision tree model is employed to obtain the final rules.

Finally, UCI data sets are used to verify the correctness of the algorithm: the small data sets verify the correctness of the reduction results, while the larger data sets measure the running time of the algorithm, which is compared with that of another parallel algorithm to verify its efficiency. In addition, several artificial data sets are designed to evaluate the parallel performance of the algorithm, including speedup and scalability, which demonstrates that the algorithm is well suited to processing massive data.
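To make the positive-region notions underlying the proposed reduction concrete, the sketch below shows how equivalence classes, the positive region, and a simple positive-region-based attribute significance measure can be computed for a small decision table. This is a minimal sequential illustration of standard rough set definitions, not the thesis's MapReduce/Mahout implementation; the function names, the column layout, and the toy table are assumptions made for the example.

```python
# Minimal sketch of positive-region computations from rough set theory.
# Sequential toy version; the parallel algorithm in the thesis distributes
# these computations with MapReduce, which is not reproduced here.

from collections import defaultdict

def equivalence_classes(table, attrs):
    """Group object indices by their values on the given condition attributes."""
    classes = defaultdict(list)
    for i, row in enumerate(table):
        key = tuple(row[a] for a in attrs)
        classes[key].append(i)
    return list(classes.values())

def positive_region(table, cond_attrs, decision):
    """Objects whose equivalence class (w.r.t. cond_attrs) is consistent on the
    decision attribute, i.e. objects that can be classified with certainty."""
    pos = set()
    for cls in equivalence_classes(table, cond_attrs):
        if len({table[i][decision] for i in cls}) == 1:
            pos.update(cls)
    return pos

def significance(table, cond_attrs, attr, decision):
    """Drop in positive-region size when `attr` is removed: a simple
    attribute significance measure used to rank candidate attributes."""
    full = len(positive_region(table, cond_attrs, decision))
    rest = [a for a in cond_attrs if a != attr]
    return full - len(positive_region(table, rest, decision))

if __name__ == "__main__":
    # Hypothetical decision table: columns 0-2 are condition attributes,
    # column 3 is the decision attribute.
    table = [
        (1, 0, 1, "yes"),
        (1, 0, 0, "yes"),
        (0, 1, 1, "no"),
        (0, 1, 0, "no"),
        (1, 1, 1, "yes"),
    ]
    conds = [0, 1, 2]
    print(len(positive_region(table, conds, 3)))   # 5: the table is consistent
    for a in conds:
        print(a, significance(table, conds, a, 3)) # attribute 0 is indispensable here
```

In this toy table, removing attribute 0 shrinks the positive region, so its significance is positive (it belongs to the core), while attributes 1 and 2 have zero significance and are candidates for removal; a multi-reduction procedure of the kind the abstract describes would explore different orderings of such non-core attributes to obtain several distinct reducts.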
Keywords/Search Tags:Parallel Algorithm, Attribute Multi-reduction, Rough Set, Decision Tree, Random Forest