Study On Comparison Of Discretization Algorithms Of Continuous Attributes

Posted on: 2008-09-22  Degree: Master  Type: Thesis
Country: China  Candidate: N Jiao  Full Text: PDF
GTID: 2178360215951389  Subject: Management Science and Engineering
Abstract/Summary:
Discretization of continuous attributes is an important issue in data preprocessing and plays an important role in data mining, machine learning and other domains. Researchers have proposed many discretization methods, which have developed along different lines to meet different needs. For example, discretization methods can be classified as supervised or unsupervised depending on whether the data being processed carries class information. Different methods give different results when the structure of the data differs. Unfortunately, none of these methods is universal: each may perform better on some data sets and worse on others. A comparative study of discretization algorithms can therefore give advice on selecting an effective algorithm.

Firstly, this paper introduces the task and aim of discretization, describes the problem and its essence, and classifies discretization methods from different points of view. Secondly, a new hierarchical framework for discretization methods is proposed. In this framework, methods are first classified as single-variable or multivariable, then as splitting or merging according to the discretization measure, and finally as supervised or unsupervised. Thirdly, the processes of single-variable splitting and merging methods and of multivariable splitting and merging methods are presented; some discretization algorithms are then analysed and their cut-points on standard data are given. Fourthly, the paper selects several single-variable and multivariable discretization methods for comparison experiments. The experimental comparison and analysis is divided into three parts: comparison of single-variable methods, comparison of multivariable methods, and an integrated comparison of both. The paper also gives an improved discretization algorithm. Finally, the paper introduces the underlying data mining platform we designed, which is based on rough set theory.
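
The distinction the abstract draws between unsupervised and supervised single-variable splitting can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the toy data, and the choice of equal-width binning and a single entropy-minimising cut are assumptions made purely for illustration.

```python
import math
from collections import Counter


def equal_width_cuts(values, k):
    """Unsupervised splitting: k equal-width intervals give k - 1 cut points.

    Class labels are ignored entirely.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised splitting: the single cut point that minimises the
    size-weighted class entropy of the two resulting intervals.

    Candidate cuts are midpoints between adjacent distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


# Hypothetical toy attribute with an outlier and two classes.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
labels = ["a", "a", "b", "b", "b", "b"]

print(equal_width_cuts(values, 2))       # cut placed by value range only
print(best_entropy_cut(values, labels))  # cut placed where classes separate
```

On this toy data the equal-width cut is pulled toward the outlier, while the entropy-based cut falls on the class boundary, which is one way the same attribute can be discretized differently depending on the method chosen. Entropy-based methods such as MDLP, listed in the keywords, apply a cut of this kind recursively with a stopping criterion, which this sketch omits.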
Keywords/Search Tags: discretization, greedy, algorithm, significance of attributes, entropy of information, clustering, MDLP, dependency, binning