Research On Decision Tree Algorithm Based On Rough Set Theory

Posted on: 2017-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z Yin
GTID: 2308330482479872
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the "Internet+" era, the data generated by production and daily life is growing explosively. Data has become an important strategic resource, and how to mine and use it has become a hot research topic in the field of data mining. Decision tree algorithms are widely used because of their clear structure and efficiency. However, decision tables are often complex, which considerably affects the efficiency and accuracy of classification algorithms. In addition, classic decision tree algorithms struggle to meet the demands of data mining in a big data environment. This thesis therefore combines attribute reduction with the decision tree algorithm, a combination that has both theoretical significance and practical value. The main contents of the thesis are divided into the following parts.

(1) Rough set theory. Starting from the attribute reduction of candidate attributes and building on an existing attribute reduction algorithm, the thesis adds an attribute relevance index to the process that selects candidate attributes for inclusion in the minimal reduct. When selecting a candidate attribute, the algorithm preferentially adds the maximum core to the candidate reduct and removes irrelevant attributes. This approach maximizes the information retained by attribute reduction. Comparative experiments show that the algorithm effectively reduces the blindness in choosing candidate attributes and lowers the time cost.

(2) Decision tree algorithm. Existing serial decision tree algorithms cannot meet the needs of big data mining, and parallel decision tree algorithms incur a large I/O cost in parallel frameworks. To address this, the thesis uses a new data structure that simplifies the Map and Reduce processes in order to reduce the number of round trips between nodes and the I/O cost. Experiments show that the distributed parallel decision tree algorithm achieves better efficiency while maintaining classification accuracy.
Keywords/Search Tags: Data Mining, Decision Tree, Attribute Reduction, Parallel and Distributed Framework
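
To make the idea of core-first attribute reduction in part (1) concrete, the following is a minimal Python sketch. It is not the thesis's algorithm: the abstract does not define its attribute relevance index, so the sketch assumes the standard rough-set dependency degree (the share of rows in the positive region) as a stand-in. The function names, the toy decision table `ROWS`, and its column names are all hypothetical.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Dependency degree: fraction of rows whose equivalence class
    under `attrs` maps to a single decision value (positive region)."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return consistent / len(rows)


def core(rows, attrs, decision):
    """Attributes whose removal lowers the dependency degree."""
    full = dependency(rows, attrs, decision)
    return [a for a in attrs
            if dependency(rows, [b for b in attrs if b != a], decision) < full]


def greedy_reduct(rows, attrs, decision):
    """Start from the core, then add the attribute that raises the
    dependency degree most until it matches the full attribute set."""
    full = dependency(rows, attrs, decision)
    reduct = core(rows, attrs, decision)
    while dependency(rows, reduct, decision) < full:
        candidates = [a for a in attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Hypothetical toy decision table: "play" depends only on outlook and windy.
ATTRS = ["outlook", "windy", "temp"]
ROWS = [
    {"outlook": "sunny", "windy": "no",  "temp": "hot",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "temp": "hot",  "play": "no"},
    {"outlook": "rainy", "windy": "no",  "temp": "mild", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "temp": "mild", "play": "no"},
    {"outlook": "sunny", "windy": "no",  "temp": "mild", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "temp": "mild", "play": "no"},
]

if __name__ == "__main__":
    print(core(ROWS, ATTRS, "play"))
    print(greedy_reduct(ROWS, ATTRS, "play"))
```

In this toy table, dropping `temp` leaves every row classifiable, so the reduct consists of the two core attributes alone. A decision tree would then be built on the reduced attribute set.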