Research On Method Of Attribute Discretization In Data Mining

Posted on: 2011-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: C L Zhao
Full Text: PDF
GTID: 2178360305455939
Subject: Computer system architecture
Abstract/Summary:
With the development of computer networks and digital technology, information is growing and being transmitted faster and faster, and the amount of data is increasing exponentially. Data mining, which extracts useful and meaningful knowledge from vast amounts of complex data, has become a focus for researchers. Data collected directly from the real world usually contain noise and continuous attribute values, whereas most data mining tools can handle only discrete attribute values. It is therefore necessary to preprocess datasets with discretization and data reduction algorithms before data mining.

Rough set theory is a mathematical tool for describing incompleteness and uncertainty. Its inherent features make it suitable for data analysis, information systems, and knowledge extraction. In this paper, we analyze many existing discretization methods and apply rough set theory to optimize the method of data reduction. The main work covers two aspects:

(1) We present a novel data reduction algorithm based on rough set theory and the discretization of continuous attributes, namely RS-D (Rough Sets-Discretization). The algorithm first discretizes continuous attributes with the Rectified Chi2 algorithm. It then applies rough set theory to perform attribute reduction and rule reduction on the discretization results. Experiments are run on the discretization results using C4.5 and SVM, respectively. The experimental results show that the presented algorithm is effective.

(2) By analyzing the degrees of freedom in Chi2-related algorithms, we propose a novel discretization algorithm based on an improved degree of freedom.
Keywords/Search Tags: Discretization, Rough set, Chi2, C4.5, SVM
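
For readers unfamiliar with the Chi2 family of discretizers mentioned in the abstract, the following is a minimal sketch of the chi-square test on two adjacent intervals and of one bottom-up merge step, in the general style of ChiMerge. It is not the thesis's RS-D or Rectified Chi2 algorithm, whose details the abstract does not give. The function names `chi2_adjacent` and `merge_lowest_pair`, the input layout (one list of per-class counts per interval), and the choice to skip cells whose expected count is zero are all assumptions made for illustration.

```python
from typing import List, Sequence


def chi2_adjacent(a: Sequence[int], b: Sequence[int]) -> float:
    """Chi-square statistic for two adjacent intervals.

    a[j] and b[j] are the numbers of samples of class j that fall
    into the first and second interval, respectively.
    """
    if len(a) != len(b):
        raise ValueError("both intervals need one count per class")
    n = sum(a) + sum(b)
    if n == 0:
        return 0.0
    stat = 0.0
    for row in (a, b):
        row_total = sum(row)
        for j, observed in enumerate(row):
            col_total = a[j] + b[j]
            expected = row_total * col_total / n
            if expected == 0:
                # Assumption: skip empty cells instead of substituting
                # a small constant for the expected count.
                continue
            stat += (observed - expected) ** 2 / expected
    return stat


def merge_lowest_pair(intervals: Sequence[Sequence[int]]) -> List[List[int]]:
    """Merge the adjacent pair with the smallest chi-square value."""
    if len(intervals) < 2:
        return [list(row) for row in intervals]
    scores = [
        chi2_adjacent(intervals[i], intervals[i + 1])
        for i in range(len(intervals) - 1)
    ]
    i = scores.index(min(scores))
    merged = [x + y for x, y in zip(intervals[i], intervals[i + 1])]
    before = [list(row) for row in intervals[:i]]
    after = [list(row) for row in intervals[i + 2:]]
    return before + [merged] + after
```

In a full discretizer of this kind, the merge step would repeat until the smallest chi-square value exceeds a threshold taken from the chi-square distribution. In the basic formulation that threshold uses a number of degrees of freedom equal to the number of classes minus one, which is the quantity the second contribution of the thesis revisits.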