VPRS Based Approaches For Discretization Of Continuous Attributes And Data Preprocessing

Posted on: 2007-08-19  Degree: Master  Type: Thesis
Country: China  Candidate: X M Kong  Full Text: PDF
GTID: 2178360182498935  Subject: Circuits and Systems
Abstract/Summary:
Data mining is one of the most active fields in artificial intelligence and databases today. It is regarded as the core of knowledge discovery in databases and aims to discover hidden, previously unknown, and potentially useful knowledge in data. In essence, data mining finds rules and common patterns in large datasets.

Discretization of continuous attributes, which maps continuous values to discrete values, is a very important step in the data preprocessing phase of data mining. It has been studied extensively because it reduces the time and space complexity of subsequent steps and improves the robustness of systems.

Rough set theory, introduced by Pawlak in 1982, is a powerful tool for reasoning about data. It has been used successfully in machine learning, knowledge acquisition, decision analysis, knowledge discovery, pattern recognition, expert systems, decision support systems, and other fields. Its characteristics enable researchers to handle imprecise data effectively without any prior information beyond the data itself.

This paper mainly studies the following problems:

1) Discretization of continuous attributes based on the variable precision rough set. The variable precision rough set model was proposed by W. Ziarko, who introduced an error factor β that extends the precise binary equivalence relations of rough set theory to more general binary relations. This paper presents a method for discretizing continuous attributes based on the variable precision rough set. The method can assign cases that do not belong to the positive region to the positive region, which improves generalization ability. Its calculation process is simple and easy to implement.

2) A data preprocessing method based on the discretization of multiple continuous attributes. At present, many discretization algorithms, including C4.5, are supervised, robust, and operate on a single attribute at a time. As a result, singular data can easily be discarded as noisy data, and incorrect data can affect the classification result because it is not eliminated in time. This paper proposes a preprocessing method based on the discretization of multiple continuous attributes. By amending the result of discretization, the method can deal with noisy data, singular data, and incorrect data separately. The paper compares the new method with C4.5 through examples to explain the evaluation criteria and the process for dealing with the three kinds of data under consideration...
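The role of the error factor β in item 1) can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis: the function names `discretize` and `beta_positive_region`, the cut point 2.0, and the sample values are all assumptions chosen for illustration. The sketch assumes Ziarko's usual formulation, in which an equivalence class is admitted to the β-positive region when the share of its objects falling outside the majority decision class is at most β, with 0 ≤ β < 0.5.

```python
from collections import defaultdict


def discretize(value, cuts):
    """Map a continuous value to the index of its interval.

    `cuts` is a sorted list of cut points (hypothetical, chosen by hand here).
    """
    return sum(value >= c for c in cuts)


def beta_positive_region(rows, labels, beta):
    """Return the indices of objects in the beta-positive region.

    Objects with identical discretized condition values form one equivalence
    class. A class is admitted when its misclassification error with respect
    to its majority decision is at most `beta`.
    """
    if not 0 <= beta < 0.5:
        raise ValueError("beta must satisfy 0 <= beta < 0.5")

    # Group object indices by their discretized condition attributes.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row)].append(i)

    positive = set()
    for members in classes.values():
        counts = defaultdict(int)
        for i in members:
            counts[labels[i]] += 1
        error = 1 - max(counts.values()) / len(members)
        if error <= beta:
            positive.update(members)
    return positive


# Illustrative data: one continuous attribute and one decision attribute.
values = [1.0, 1.2, 1.4, 1.6, 3.0, 3.5]
labels = ["a", "a", "a", "b", "b", "b"]
rows = [(discretize(v, [2.0]),) for v in values]

strict = beta_positive_region(rows, labels, 0.0)   # classical positive region
relaxed = beta_positive_region(rows, labels, 0.3)  # tolerates up to 30% error
print(sorted(strict))   # [4, 5]
print(sorted(relaxed))  # [0, 1, 2, 3, 4, 5]
```

With β = 0 the interval below the cut is excluded because one of its four objects carries a different decision (error 0.25). With β = 0.3 the same interval is admitted, which is the sense in which cases outside the classical positive region can be brought into it.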
Keywords/Search Tags: data mining, discretization, variable precision rough set, multi-attribute, preprocessing, robustness