Attribute Reduction Based On Rough Set Theory And Research On Classification Algorithm Of Decision Tree

Posted on: 2015-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: K Shi
Full Text: PDF
GTID: 2268330428481780
Subject: Computer technology
Abstract/Summary:
We are now in the age of big data: large amounts of data are produced every day, and a variety of valuable information is hidden in them. How to mine useful information from large volumes of data efficiently has become a hot issue in data mining research. Among the many approaches studied, decision tree algorithms have been widely adopted for classification in data mining because they are simple and efficient. However, redundancy and inconsistency in the data have a negative influence on classification efficiency and accuracy. Moreover, the commonly used single-variable decision tree algorithms usually generate very large trees. Therefore, this thesis proposes an improved algorithm that combines attribute reduction from rough set theory with decision tree algorithms, which has both theoretical research significance and practical application value. The main work of the thesis consists of the following three parts:

(1) Existing attribute reduction algorithms usually operate on the whole data set, and some attribute reduction algorithms that use a simplified decision table simply delete incompatible (inconsistent) data. To overcome these two weaknesses, the thesis proposes an improved simplified decision table algorithm that removes redundant data while retaining incompatible data. Comparative experiments on UCI data sets show that the amount of original data can be reduced effectively, which makes more efficient attribute reduction and decision tree algorithms possible.

(2) The thesis proposes a core attribute algorithm based on information entropy to overcome the weaknesses of core attribute algorithms based on the discernibility matrix and on the algebraic definition. From the core attributes the algorithm obtains, it can be shown that for consistent decision tables the reduction under the algebraic definition and the reduction based on information entropy are the same. For incompatible decision tables, however, reduction under the algebraic definition only guarantees that U/IND(P) of the compatible part does not change, whereas reduction based on information entropy guarantees that U/IND(P) of the whole data set stays the same. That is, the core attributes under the algebraic definition are a subset of the core attributes under information entropy. On the basis of the core attributes, the thesis then proposes a complete attribute reduction algorithm based on attribute importance.

(3) To address the very large trees produced by single-variable decision tree algorithms, the thesis proposes a multivariable decision tree algorithm and, in addition, improves the simplification of the decision tree by introducing a degree of determinacy. Experiments on UCI data sets show better accuracy and tree size than four other algorithms. Finally, the improved algorithms are integrated into an attribute reduction and decision tree system, whose modular design supports the reduction and classification of databases.
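
As an illustration of the entropy-based notion of a core attribute discussed in part (2), the following Python sketch may help. It is not the algorithm of the thesis. It assumes the commonly used information-view definition, in which a condition attribute a belongs to the core when H(D | C - {a}) differs from H(D | C). The toy decision table, the function names and the tolerance `tol` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, cond_attrs, decision):
    """Return H(decision | cond_attrs) for a decision table given as a list of dicts."""
    # Partition the universe by the values of the condition attributes (U/IND(P)).
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        blocks[key].append(row[decision])

    n = len(rows)
    h = 0.0
    for decisions in blocks.values():
        p_block = len(decisions) / n
        for count in Counter(decisions).values():
            p = count / len(decisions)
            h -= p_block * p * log2(p)
    return h


def entropy_core(rows, cond_attrs, decision, tol=1e-12):
    """Return the attributes whose removal changes H(decision | cond_attrs).

    Assumed definition (not taken from the thesis): attribute a is in the core
    when H(D | C - {a}) != H(D | C).
    """
    base = conditional_entropy(rows, cond_attrs, decision)
    core = []
    for a in cond_attrs:
        rest = [b for b in cond_attrs if b != a]
        if abs(conditional_entropy(rows, rest, decision) - base) > tol:
            core.append(a)
    return core


# Hypothetical toy table: the last two rows are incompatible
# (same condition values, different decisions), and c duplicates a.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 0},
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]

print(entropy_core(rows, ["a", "b", "c"], "d"))
```

In this toy table, dropping a or c leaves the partition unchanged, so the conditional entropy stays the same, while dropping b merges blocks with different decisions. Only b is therefore reported as a core attribute. The incompatible rows are kept in the table and contribute to the entropy instead of being deleted.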
Keywords/Search Tags: Rough Set, Simplified Decision Table, Incompatible Data, Attribute Reduction, Multivariable Decision Tree