Research On Classification Algorithm Of Decision Tree Based On Variable Precision Rough Set

Posted on: 2012-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: W J Yan
GTID: 2178330335455526
Subject: Computer Science and Technology
Abstract/Summary:
Since it first appeared in the late 1980s, data mining has developed over some twenty years into a key research topic in machine learning and artificial intelligence. Current work focuses on feature extraction, attribute reduction, improving algorithm efficiency and classification accuracy, and applying these methods in specific domains. Among data mining methods, decision tree classification has the advantages of light computation and intuitive, easily understood results, and has therefore received considerable attention from researchers. Based on an in-depth study of existing algorithms, this thesis addresses deficiencies in attribute reduction and decision tree classification algorithms. Comparative experiments on UCI datasets show that the improved algorithms achieve good results. The main contents cover the following three aspects:

(1) During attribute reduction, the attribute significance criteria in existing algorithms tend to favor attributes with many values, without considering whether those values are valid. This thesis therefore proposes a new significance criterion that combines the number of valid values with the β-approximation accuracy as the measure of importance, using support to count the valid values.

(2) Existing decision tree algorithms are sensitive to noisy data and have difficulty selecting the splitting attribute. This thesis proposes a new attribute selection criterion that exploits the noise tolerance of variable precision rough sets and considers both the variable precision explicit region and information theory. As a result, the improved algorithm has high resistance to noisy data and high classification accuracy. In addition, by introducing confidence and support, the algorithm performs pre-pruning while the decision tree is being built, which reduces the size of the tree.

(3) Coronary heart disease data from traditional Chinese medicine treatment, after attribute extraction and data preprocessing, are used as experimental data. First, attribute reduction is used to identify the factors that affect coronary heart disease; then the reduced data are used to construct a decision tree and extract decision rules.
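The noise tolerance and pre-pruning ideas in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes the common textbook formulation of variable precision rough sets, in which an equivalence class belongs to the β-positive region when one decision value covers at least a fraction β of it, and this may differ from the thesis's own "explicit region" definition. The stopping rule is also an assumed, simple form of support/confidence pre-pruning. All names (`equivalence_classes`, `beta_positive_region`, `should_stop`, the toy `rows` table) are hypothetical.

```python
from collections import Counter, defaultdict

def equivalence_classes(rows, attrs):
    """Group row indices that share the same values on the given attributes."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def beta_positive_region(rows, attrs, decision, beta):
    """Union of equivalence classes whose majority decision covers >= beta.

    Assumed textbook definition with beta in (0.5, 1]; beta = 1.0 gives the
    classical (noise-intolerant) positive region.
    """
    positive = set()
    for block in equivalence_classes(rows, attrs):
        counts = Counter(rows[i][decision] for i in block)
        if max(counts.values()) / len(block) >= beta:
            positive |= block
    return positive

def should_stop(labels, total, min_support, min_confidence):
    """Assumed pre-pruning rule: stop splitting a node when it holds too few
    samples (low support) or its majority class is already dominant
    (high confidence)."""
    support = len(labels) / total
    confidence = Counter(labels).most_common(1)[0][1] / len(labels)
    return support < min_support or confidence >= min_confidence

# Toy decision table; row 3 plays the role of a noisy record.
rows = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 1, "d": "no"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "no"},
]

if __name__ == "__main__":
    print(sorted(beta_positive_region(rows, ["a"], "d", 0.7)))
    print(sorted(beta_positive_region(rows, ["a"], "d", 1.0)))
    print(should_stop(["yes", "yes", "yes", "no"], len(rows), 0.1, 0.7))
```

On this toy table, attribute `a` places all six rows in the β-positive region at β = 0.7 despite the noisy record, but only the two `a = 1` rows at β = 1.0. This is the kind of tolerance a variable precision criterion is meant to provide.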
Keywords/Search Tags: Variable Precision Rough Set, Attribute Reduction, Decision Tree, Coronary Heart Disease