Research On Discretization Methods For Quantitative Attributes

Posted on: 2015-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhao
GTID: 2298330452494191
Subject: Computer technology
Abstract/Summary:
With the wide application of data mining, machine learning algorithms, and pattern recognition technology, discretization techniques for continuous attributes have made breakthrough progress.

Discretization algorithms fall into many categories, and decision table discretization based on rough set theory is among the methods that give better discretization results. Through in-depth analysis of decision table discretization, and aiming at the defects and shortcomings of the traditional algorithm, an improved decision table discretization algorithm is proposed. The new algorithm has two parts: a candidate cut selection strategy and a final cut selection strategy. In contrast to the traditional UACC algorithm, a new method called SACC, based on decision attributes and attribute importance, is proposed. In addition, the concept of breakpoint choice probability is put forward to replace the computation of breakpoint importance. Theoretical analysis and experimental results show that the new strategy produces good discretization results and effectively improves the prediction precision of the classifier.

At present, most existing discretization algorithms focus on single continuous attributes. However, this approach often treats the discretization classification error as the only evaluation standard. To address this, a new MDLP-based single-attribute discretization strategy is put forward, one that contains full adjacent interval information.
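For orientation, the classic MDLP acceptance test of Fayyad and Irani, on which MDLP-based strategies build, can be sketched as below. This is the standard criterion only, not the modified strategy of the thesis, whose details the abstract does not give. The function names `entropy` and `mdlp_accepts` are illustrative, and both sides of the candidate cut are assumed to be non-empty lists of class labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_accepts(left, right):
    """Classic Fayyad-Irani MDLP test for one candidate cut.

    `left` and `right` are the class labels on each side of the cut.
    Returns True when the information gain of the cut exceeds the
    MDLP threshold, i.e. when the cut is worth keeping.
    """
    s = left + right
    n = len(s)
    k, k1, k2 = len(set(s)), len(set(left)), len(set(right))
    ent_s, ent_l, ent_r = entropy(s), entropy(left), entropy(right)
    # Information gain of splitting s into left and right.
    gain = ent_s - (len(left) / n) * ent_l - (len(right) / n) * ent_r
    # Correction term from the minimum description length argument.
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)
    return gain > (math.log2(n - 1) + delta) / n
```

A cut that separates two classes cleanly, such as `mdlp_accepts(['a'] * 10, ['b'] * 10)`, passes the test, while a cut that leaves both sides equally mixed does not.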
In addition, adjacent interval importance is adopted to effectively capture the mutual relationships between attributes. Through research on single-attribute and multiple-attribute discretization standards, the thesis summarizes a new evaluation method for combining adjacent intervals, and based on this method a bottom-up discretization algorithm is proposed. The algorithm first evaluates all adjacent intervals over the whole data set; then, to ensure the most reasonable merge range, the intervals on each single attribute are evaluated. The experimental results show that the new algorithm gives better discretization results and significantly improves the prediction accuracy of naive Bayesian and SVM classifiers.
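The general shape of a bottom-up (merging) discretizer can be sketched as follows. The abstract does not specify the thesis's adjacent-interval evaluation measure, so this sketch substitutes the chi-square statistic used by ChiMerge as a stand-in, and it handles a single attribute only, without the whole-data-set evaluation step described above. The names `chi2` and `chimerge` and the `max_intervals` stopping rule are assumptions made for illustration.

```python
from collections import Counter


def chi2(a, b):
    """Chi-square statistic for two adjacent intervals given as class Counters."""
    n_a, n_b = sum(a.values()), sum(b.values())
    total = n_a + n_b
    stat = 0.0
    for c in set(a) | set(b):
        col = a[c] + b[c]  # occurrences of class c across both intervals
        for row_total, observed in ((n_a, a[c]), (n_b, b[c])):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def chimerge(values, labels, max_intervals):
    """ChiMerge-style bottom-up merging (stand-in criterion, single attribute).

    Starts with one interval per distinct value and repeatedly merges the
    adjacent pair with the lowest chi-square until at most `max_intervals`
    remain. Returns the lower bound of each remaining interval.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, Counter())[y] += 1
    lows = sorted(groups)
    counts = [groups[v] for v in lows]
    while len(lows) > max_intervals:
        scores = [chi2(counts[i], counts[i + 1]) for i in range(len(counts) - 1)]
        i = scores.index(min(scores))  # most similar adjacent pair
        counts[i] = counts[i] + counts[i + 1]
        del counts[i + 1]
        del lows[i + 1]
    return lows
```

For example, `chimerge([1, 2, 3, 4, 5, 6], ['a', 'a', 'a', 'b', 'b', 'b'], 2)` keeps the boundary between the two classes and returns the lower bounds `[1, 4]`. Replacing `chi2` with another adjacent-interval score changes the merging criterion without altering the loop.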
Keywords/Search Tags: continuous data discretization, rough set theory, attribute importance, breakpoint choice probability, minimum description length principle