Application Of Rough Set Theory In Data Reduction

Posted on: 2008-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: J J Gu
GTID: 2178360215471646
Subject: Computer software and theory
Abstract/Summary:
Data Mining (DM) deals with large or very large databases and data warehouses. As databases keep growing, DM faces new challenges. DM techniques have made great progress and have matured, but they have a diminishing effect on the overall efficiency of DM.

Data pre-processing is an important stage of DM and plays a key part in it. It comprises data cleaning, data integration and transformation, data reduction and so on. After this stage the data are in the form that is needed. Some mature techniques already exist, but much work remains to be done in order to handle growing data volumes and complex data structures.

Rough Set theory is a mathematical tool for dealing with vague and uncertain knowledge, and an efficient soft-computing technique. Its main idea is to derive decision rules through reduction while preserving the ability to classify. A data reduction can be obtained easily with the Discernibility Matrix. Several researchers have proposed improved methods that build on the Discernibility Matrix and combine it with other disciplines to handle more complex data.

Attribute reduction is an important part of data processing. Finding all reducts of the attributes has been proved to be NP-hard, so this work starts from improving the efficiency of reduction. In this thesis I present Rough Set theory and data pre-processing theory. The basic Rough Set algorithms and an improved method are then discussed. After that, several other algorithms are described: Jelonek's algorithm, based on attribute importance; Hu's algorithm, based on a frequency function; algorithms combined with the discernibility matrix; and a greedy strategy. When used properly, all of these algorithms can improve the efficiency of reduction, and they perform better than the basic algorithms based on the discernibility matrix.

Finally, a parallel Rough Set reduction algorithm based on attribute importance is presented. Borrowing from the greedy strategy, it distributes the task to different machines and gathers their results to obtain the final reduction. It is shown to be efficient both theoretically and in simulation experiments.
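
To make the discernibility-matrix idea in the abstract concrete, here is a minimal Python sketch. It is not the thesis's algorithm. It assumes a small, consistent decision table and uses a simple frequency heuristic: repeatedly pick the attribute that occurs in the most remaining matrix entries. The function names `discernibility_entries` and `greedy_reduct` and the toy table are hypothetical, chosen only for illustration. The result distinguishes all objects with different decisions, but it is not guaranteed to be a minimal reduct.

```python
from collections import Counter
from itertools import combinations


def discernibility_entries(rows, decisions):
    """For each pair of objects with different decisions, return the set of
    condition-attribute indices on which the two objects differ."""
    entries = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:  # an empty set would mean the table is inconsistent here
                entries.append(diff)
    return entries


def greedy_reduct(rows, decisions):
    """Frequency-based greedy heuristic (an assumption of this sketch):
    choose attributes until every discernibility entry is covered."""
    remaining = discernibility_entries(rows, decisions)
    chosen = []
    while remaining:
        counts = Counter(k for entry in remaining for k in entry)
        # Most frequent attribute first; ties go to the smallest index.
        best = min(counts, key=lambda k: (-counts[k], k))
        chosen.append(best)
        # Entries containing the chosen attribute are now discerned.
        remaining = [e for e in remaining if best not in e]
    return sorted(chosen)


# Hypothetical toy decision table: three condition attributes per object.
rows = [
    (1, 0, 0),
    (1, 1, 0),
    (0, 0, 1),
    (0, 1, 1),
]
decisions = [0, 1, 0, 1]

reduct = greedy_reduct(rows, decisions)
print(reduct)  # [1]
```

In this toy table attribute 1 alone separates the two decision classes, so the heuristic stops after a single pick. On larger tables the number of pairwise entries grows quadratically with the number of objects, which is why the efficiency of reduction matters.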
Keywords/Search Tags:Data Mining, Rough Sets, Reduction, Data Pre-processing, Discernibility Matrix