
Research On Classification Algorithm Of Decision Tree For Missing Data Based On Variable Precision Rough Set

Posted on: 2014-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: L J Gao
Full Text: PDF
GTID: 2248330398952040
Subject: Computer technology
Abstract/Summary:
Data mining is widely used in many fields and speeds up the discovery of information hidden in large amounts of data. Many researchers hope to use it to study coronary heart disease (CHD) effectively. Among data mining methods, decision tree classification offers high classification accuracy, intuitive decision results, and good generalization, which makes it well suited to the analysis of CHD data. However, because of missing values and noisy data, the analysis results obtained cannot be used in the actual diagnosis and treatment of CHD. Therefore, based on an in-depth study of the existing algorithms, this paper proposes improvements that address deficiencies in missing-data processing and in decision tree algorithms. The main contents of this paper are as follows:

(1) CHD data are mostly discrete, whereas the existing KNN filling method is suited only to continuous attributes and does not fully consider the relationship between the case with missing values and the other cases. This paper proposes a new method that handles both discrete and continuous attributes and makes full use of the degree of influence that other examples have on the example with missing values. The method uses grey relational analysis from grey system theory to find the K examples most similar to the example with missing attribute values, and then fills the missing values with a weighted average based on the information content of those K examples. Comparative experiments on UCI datasets show that the proposed algorithm performs better than the other algorithms compared.

(2) Almost all datasets contain some noisy data, and noise in the CHD dataset has a great influence on decision classification. This paper proposes a variable precision rough set attribute selection criterion based on a scaling function. The criterion considers both the weighted approximation accuracy of an attribute and its number of attribute values. It improves resistance to noisy data and weakens the bias in attribute selection, which in turn improves the accuracy of the classification algorithm. At the same time, this paper introduces an inhibitory factor threshold, support, and confidence into the pre-pruning of the tree, which simplifies the structure of the decision tree. Comparative experiments on standard UCI datasets show that the improved algorithm outperforms the other decision tree algorithms compared.

(3) The proposed filling algorithm and the improved decision tree algorithm are used in the modules of a clinical heart disease auxiliary system, and this paper has realized the classification of TCM (traditional Chinese medicine) diagnostic syndromes on CHD datasets.
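The filling step described in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the distinguishing coefficient rho = 0.5, the 0/1 distance for discrete attributes, and the requirement that continuous attributes be pre-scaled to [0, 1] are all assumptions made for illustration. The abstract does not specify how the information-based weights are computed, so the sketch simply uses the grey relational grade of each neighbour as its weight.

```python
from collections import defaultdict


def grey_grades(target, candidates, is_discrete, rho=0.5):
    """Grey relational grade of each complete candidate row w.r.t. the target.

    Only attributes observed in the target (not None) are compared.
    Assumes at least one observed attribute and at least one candidate,
    and that continuous attributes are already scaled to [0, 1].
    """
    observed = [k for k, v in enumerate(target) if v is not None]
    deltas = []
    for row in candidates:
        d = []
        for k in observed:
            if is_discrete[k]:
                # Assumed distance for discrete values: 0 if equal, else 1.
                d.append(0.0 if row[k] == target[k] else 1.0)
            else:
                d.append(abs(row[k] - target[k]))
        deltas.append(d)
    flat = [x for d in deltas for x in d]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:  # every candidate matches the target exactly
        return [1.0] * len(candidates)
    grades = []
    for d in deltas:
        # Grey relational coefficient per attribute, averaged into a grade.
        coeffs = [(d_min + rho * d_max) / (x + rho * d_max) for x in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def impute(target, candidates, is_discrete, k=3, rho=0.5):
    """Fill None entries of target from its k most grey-related candidates."""
    grades = grey_grades(target, candidates, is_discrete, rho)
    ranked = sorted(zip(grades, candidates), key=lambda p: p[0], reverse=True)[:k]
    filled = list(target)
    for j, value in enumerate(target):
        if value is not None:
            continue
        if is_discrete[j]:
            # Grade-weighted vote for discrete attributes.
            votes = defaultdict(float)
            for g, row in ranked:
                votes[row[j]] += g
            filled[j] = max(votes, key=votes.get)
        else:
            # Grade-weighted average for continuous attributes.
            total = sum(g for g, _ in ranked)
            filled[j] = sum(g * row[j] for g, row in ranked) / total
    return filled
```

Calling impute(target, complete_rows, is_discrete, k=2) returns a copy of the target row with each None replaced. Using a weighted vote for discrete attributes and a weighted mean for continuous ones is one simple way to let a single similarity measure cover both attribute types.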
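The noise tolerance that item (2) relies on can be illustrated with the standard variable precision rough set dependency measure. The sketch below is not the scaling-function criterion proposed in the thesis, which the abstract does not define. It only shows the underlying idea: an equivalence class counts as positive when at least a fraction beta of its rows share one decision value. The function names, the dictionary row format, and the convention that beta is an inclusion threshold with 0.5 < beta <= 1 are assumptions.

```python
from collections import Counter, defaultdict


def beta_dependency(rows, attr, decision, beta=0.8):
    """Share of rows in the beta-positive region of `decision` under `attr`.

    An equivalence class (rows sharing one value of `attr`) is counted when
    its majority decision value covers at least `beta` of the class.
    With beta > 0.5 at most one decision value can meet the threshold.
    """
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[attr]].append(row[decision])
    positive = 0
    for labels in blocks.values():
        majority = Counter(labels).most_common(1)[0][1]
        if majority / len(labels) >= beta:
            positive += len(labels)
    return positive / len(rows)


def best_attribute(rows, attrs, decision, beta=0.8):
    """Pick the attribute with the highest beta-dependency (plain criterion)."""
    return max(attrs, key=lambda a: beta_dependency(rows, a, decision, beta))
```

With beta = 1 this reduces to the classical rough set dependency, where a single noisy row removes its whole equivalence class from the positive region. Lowering beta lets such a class still count, which is the kind of tolerance to noisy data the abstract refers to.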
Keywords/Search Tags: Missing Attribute Values, Grey Relational Analysis, Scaling Function, Rough Set, Decision Tree