
Research And Application Of Decision Tree Algorithm Based On Attribute Correlation

Posted on: 2018-04-15   Degree: Master   Type: Thesis
Country: China   Candidate: T H Dong   Full Text: PDF
GTID: 2348330515986932   Subject: Computer Science and Technology
Abstract/Summary:
The new century brings challenges alongside opportunities, and the ability to use and master massive data bears on the future development of many fields. Deeper exploration of big data makes it possible to analyze data models at a macroscopic level, uncover latent rules, and forecast future trends, yielding more profound insight from effective, comprehensive information. Research on data mining algorithms therefore has both scientific and practical value.

Building on the classical C4.5 decision tree algorithm, this thesis uses the Apriori association rule algorithm to incorporate the degree of correlation among the attributes of the data source into the decision tree. When choosing a splitting attribute, traditional C4.5 considers only the correlation between the attribute being measured and the class attribute. It ignores the correlation among non-class attributes, and this correlation determines the degree of redundancy between attributes. To reduce the impact of redundancy, the thesis extends the original algorithm by applying the idea of information gain to measure each attribute against the other non-class attributes, so as to obtain more reliable attributes. In addition, to address the limited information carried by the attributes while the decision tree model is being constructed, the thesis uses Apriori to generate a series of strong rules at the same time. To expand the available information and improve the accuracy of C4.5, new attributes are filtered out of these strong rules according to selection criteria for new attributes and added to the original attribute set.

The information contained in an instance is often varied and rich. The traditional decision tree algorithm reveals the degree of relevance between each attribute and the classification. The degree of correlation between attributes, by contrast, is a transverse analysis of the data set: analyzing the relationship between pairs of attributes makes the analytical framework more multidimensional and the results more usable. Finally, the method is applied to an example that uses historical data to identify the main factors affecting whether a gymnasium customer becomes a member, and then uses the relevant attributes to build a model and make forecasts, in order to discover interested, high-value customer groups. This further demonstrates the practical value of the method in a real setting.
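
The redundancy idea described above can be illustrated with a minimal sketch. The abstract does not state the thesis's exact formula, so the penalty used here (dividing the C4.5 gain ratio by one plus the mean information gain between an attribute and the other non-class attributes) is an assumption made purely for illustration. The function names, the column names, and the four-row gymnasium data set are likewise hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(xs, ys):
    """Gain(Y; X) = H(Y) - H(Y | X) for two equally long categorical columns."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(ys) - conditional


def gain_ratio(xs, ys):
    """C4.5 gain ratio: information gain divided by the split information H(X)."""
    split_info = entropy(xs)
    return information_gain(xs, ys) / split_info if split_info > 0 else 0.0


def mean_redundancy(name, data, class_name):
    """Mean information gain between one attribute and every other non-class attribute."""
    others = [a for a in data if a not in (name, class_name)]
    if not others:
        return 0.0
    return sum(information_gain(data[name], data[o]) for o in others) / len(others)


def adjusted_score(name, data, class_name):
    """Hypothetical redundancy-penalised score (an assumption, not the thesis's formula)."""
    ratio = gain_ratio(data[name], data[class_name])
    return ratio / (1.0 + mean_redundancy(name, data, class_name))


# Hypothetical toy data: "trial_flag" duplicates "trial_used", so the two are redundant.
toy = {
    "trial_used": ["yes", "yes", "no", "no"],
    "trial_flag": ["1", "1", "0", "0"],
    "weekday": ["mon", "tue", "mon", "tue"],
    "member": ["yes", "yes", "no", "no"],
}

for attribute in ("trial_used", "trial_flag", "weekday"):
    print(attribute, gain_ratio(toy[attribute], toy["member"]),
          adjusted_score(attribute, toy, "member"))
```

In this toy data the two duplicated attributes predict the class equally well, and the penalty lowers both of their scores relative to their plain gain ratio, which is the kind of redundancy effect the abstract describes. How the thesis actually combines the two measures would have to be taken from the full text.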
Keywords/Search Tags:Data Mining, Association Rule of Apriori, Algorithm of C4.5, Correlation Degree, Redundancy between Attributes