Study On The Discretization Of Continuous Attributes

Posted on: 2007-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Que
Full Text: PDF
GTID: 2178360182986606
Subject: Computer software and theory
Abstract/Summary:
In real-world databases, records often contain many continuous-valued attributes. Because most existing data mining methods can handle only discrete attributes, continuous attributes must first be discretized. The study of discretization methods for continuous attributes is therefore important foundational work in data mining, and the choice of method can strongly influence the results of the mining process. Many discretization methods have been proposed, each with its own properties and strengths. This thesis combines concepts from rough sets and information entropy to study the discretization of continuous attributes and proposes a new, efficient discretization method. The main contents of this thesis are as follows:
(1) The basic theory of data mining, information systems, and information entropy is reviewed in detail. The decision table, an important concept in describing rough sets, is introduced in full. The history and development of information theory and information entropy are also briefly described.
(2) Related results from research on continuous attribute discretization are systematically analyzed and compared with one another.
(3) A useful concept, interval class information entropy, is proposed. Combined with the related theory of rough sets, it yields an efficient method for continuous attribute discretization based on interval class information entropy (DICE). A theoretical analysis of the DICE method is then performed.
(4) The DICE method is successfully applied to discretize real databases. Based on these applications, the DICE method is compared with the built-in discretization method of the C4.5 algorithm, and the experimental results are analyzed and explained.
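The abstract does not define interval class information entropy or the steps of DICE, so the following is only a minimal sketch of the general idea behind entropy-based discretization, not the thesis's method. It assumes the common approach of scoring a candidate cut point by the weighted Shannon entropy of the class labels in the two resulting intervals; the function names `class_entropy` and `best_cut_point` are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels falling in one interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_cut_point(values, labels):
    """Pick the binary cut with the lowest weighted class entropy.

    Illustrative assumption, not the DICE procedure: candidate cuts are
    midpoints between consecutive distinct sorted values.
    Returns (cut, weighted_entropy), or None if all values are equal.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical values
        cut = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) / n) * class_entropy(left) \
            + (len(right) / n) * class_entropy(right)
        if best is None or score < best[1]:
            best = (cut, score)
    return best


# Toy data (made up for illustration): two well-separated classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
result = best_cut_point(values, labels)
```

On this toy data the chosen cut falls between 3.0 and 10.0, where both intervals contain a single class. A full discretizer would apply such a split recursively with a stopping criterion; how DICE chooses and stops cuts is not stated in the abstract.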
Keywords/Search Tags: Data Preprocessing, Information Entropy, Rough Sets, Discretization