
Study and Application of Prediction Methods Based on Association Rules and Decision Trees

Posted on: 2013-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W G Yi
Full Text: PDF
GTID: 1228330395454860
Subject: Computer application technology
Abstract/Summary:
Association rule mining and decision tree induction are active research topics in fields such as machine learning, artificial intelligence, and data mining, and have been applied to business decision-making, diagnosis, drug-use analysis, and other domains. They still face challenges, however, including a shortage of research extended to specialized datasets and accuracy that is difficult to improve. This dissertation analyzes association rule mining and decision tree algorithms, with particular attention to extensions of both: the number of generated rules, data validity, long itemsets with low support, attribute selection criteria, and the construction of decision trees for multi-valued and multi-labeled data are studied in depth. The following improved approaches are proposed.

(1) Under the support-confidence-interest model, we analyze the meaning of the parameters and use regression to design a variety of equations relating the number of rules to the parameters. The multiple correlation coefficient is used to test how well each equation fits, and significance tests verify whether the parameter coefficients are significantly different from zero. The regression equation with the largest multiple correlation coefficient is taken as the best-fitting equation. With the selected equation, the number of rules can be predicted for given parameter values, while the choice of the three parameters is optimized and their ranges determined.

(2) A new association rule mining framework, fuzzy decreasing support-confidence, is proposed; it finds all itemsets that satisfy a length-decreasing support constraint. On this basis, by analyzing the correlation between the antecedent and the consequent of the generated rules, three refined frameworks are further proposed: 1) fuzzy decreasing support, confidence, interestingness; 2) fuzzy decreasing support, bidirectional confidence, interestingness; 3) fuzzy decreasing support, coincidence, interestingness. Data on the factors relevant to syndrome differentiation and on patients' medication are extracted from coronary heart disease records collected from hospitals. The experimental results show that the proposed frameworks not only confirm known syndrome differentiation and medication patterns, but also discover syndrome differentiation driven by combinations of factors and compatibilities among multiple drugs.
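To make the length-decreasing support idea in (2) concrete, the Python sketch below checks candidate itemsets against a support threshold that falls as itemset length grows, so long itemsets with low support can still be retained. The linear decay, the threshold values, and the example item names are illustrative assumptions for this sketch only; the dissertation's fuzzy membership functions are not reproduced here.

```python
# Minimal sketch of a length-decreasing support constraint, assuming a simple
# linear decay between a maximum and a minimum threshold. This is NOT the
# dissertation's fuzzy formulation, only an illustration of the general idea.

def length_decreasing_support(length, max_sup=0.10, min_sup=0.02, max_len=8):
    """Return the support threshold required of an itemset of the given length.

    Short itemsets must meet the higher threshold; longer itemsets are
    accepted at progressively lower support, down to min_sup.
    """
    if length >= max_len:
        return min_sup
    # Linear interpolation from max_sup (length 1) down to min_sup (max_len).
    frac = (length - 1) / (max_len - 1)
    return max_sup - frac * (max_sup - min_sup)


def satisfies_constraint(itemset, support):
    """Keep an itemset only if its support meets the length-dependent threshold."""
    return support >= length_decreasing_support(len(itemset))


# Example (hypothetical item names): a 2-itemset needs about 8.9% support,
# while a 7-itemset needs only about 3.1%.
print(satisfies_constraint({"chest_pain", "dyspnea"}, 0.05))   # False
print(satisfies_constraint(frozenset(range(7)), 0.05))         # True
```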
(3) The existing decision tree classification algorithms are analyzed, and two new attribute selection methods are proposed. The first, MVPRSDT, considers not only the number of attribute values at the current node but also the size of the variable precision explicit region at the nodes one level below; that is, the variable precision explicit regions of attributes across two levels of the decision tree are used. This new attribute selection approach overcomes a shortcoming of the ID3 algorithm while retaining the advantages of variable precision rough sets. The second, IVPRSDT, uses a new attribute selection criterion that jointly considers classification accuracy and the number of attribute values, namely weighted roughness and complexity. In addition, support and confidence are introduced into the stopping conditions at each node, which improves the algorithm's generalization ability. To reduce the impact of noisy data and missing values, IVPRSDT predicts labels with a matching-based method. Comparative experiments on several datasets from the UCI Machine Learning Repository demonstrate the effectiveness of both methods.

(4) Three new decision tree algorithms for multi-valued and multi-labeled data are presented. Three new formulas for measuring the similarity between two label-sets in the child nodes are first proposed. They consider both the case in which elements appear, or do not appear, in the two label-sets simultaneously and the boundary conditions, making similarity calculation over label-sets more comprehensive and accurate. New stopping conditions for node splitting and a corresponding prediction method are also proposed. Experiments comparing the new algorithms with existing ones show that they achieve higher accuracy and are better suited to multi-valued and multi-labeled data.
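To illustrate the kind of label-set similarity that the algorithms in (4) rely on, the Python sketch below scores the agreement of records' label-sets in a candidate child node, using plain Jaccard similarity as a stand-in. The dissertation's three measures additionally weigh labels absent from both sets and boundary conditions and are not reproduced here; the function names and example labels are illustrative assumptions only.

```python
# Minimal sketch of label-set similarity for multi-labeled records, using
# Jaccard similarity as a stand-in for the dissertation's three measures.
from itertools import combinations


def jaccard_label_similarity(labels_a, labels_b):
    """Similarity of two label-sets: shared labels over all labels seen."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:          # boundary case: two empty label-sets agree fully
        return 1.0
    return len(a & b) / len(a | b)


def average_pairwise_similarity(label_sets):
    """Average similarity over all record pairs in a candidate child node.

    A splitting criterion could prefer the attribute whose children score highest.
    """
    pairs = list(combinations(range(len(label_sets)), 2))
    if not pairs:
        return 1.0
    return sum(jaccard_label_similarity(label_sets[i], label_sets[j])
               for i, j in pairs) / len(pairs)


# Example with hypothetical diagnosis labels for three records in one node.
node = [{"angina", "hypertension"}, {"angina"}, {"hypertension", "diabetes"}]
print(round(average_pairwise_similarity(node), 3))   # 0.278
```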
Keywords/Search Tags: Association rule, Decision tree, Fuzzy decreasing support, Variable precision rough set, Multi-labeled data