
The Applied Research On Attribute Reduction Algorithm Of Rough Set Based On Discernibility Matrix In Data Mining

Posted on: 2008-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Li
GTID: 2178360242958971
Subject: Computer software and theory
Abstract/Summary:
With the advent of the information age, people face more and more data in every field, and that data keeps growing at a remarkable speed. To improve the efficiency of work and the quality of life, valuable knowledge embedded in databases has to be extracted, which is the motivation for research on knowledge discovery in databases. Databases, however, usually contain redundant, missing, uncertain and inconsistent data, and these are a major barrier to extracting knowledge. Data preprocessing therefore has to be done before knowledge discovery. This thesis concentrates on data preprocessing, and on attribute reduction in particular.
Rough set theory, introduced by Professor Z. Pawlak in the early 1980s, has proved to be an excellent mathematical tool for dealing with uncertain and vague descriptions of objects. Its basic idea is to derive the classification rules of a concept through knowledge reduction while keeping the classification ability unchanged. It can find hidden and potential rules, that is, knowledge, in the data without any preliminary or additional information. In recent years, as an important part of soft computing, rough set theory and its applications have played an important role, especially in pattern recognition, machine learning, decision analysis, knowledge discovery and knowledge acquisition.
This thesis studies the rough set attribute reduction algorithm based on the discernibility matrix in data mining. First, data mining and rough set theory are described. After an analysis and summary of data mining algorithms based on rough set theory, the HORAFA algorithm is analyzed in detail. HORAFA is based on a heuristic attribute reduction algorithm, and the aim here is to improve its completeness and attribute reduction efficiency and to reduce its running time.
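The discernibility matrix and the core mentioned above can be illustrated with a small sketch. This is not the thesis's implementation: the function names `discernibility_entries` and `core`, the toy decision table, and the choice to skip inconsistent object pairs are all assumptions made for illustration.

```python
from itertools import combinations

def discernibility_entries(rows, decisions):
    """Return the non-empty entries of a discernibility matrix.

    rows: list of tuples of condition-attribute values
    decisions: list of decision values, one per row
    Each entry is the frozenset of condition-attribute indices on which
    two objects with different decisions differ.
    """
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue  # only pairs with different decisions need discerning
        diff = frozenset(
            k for k, (x, y) in enumerate(zip(rows[i], rows[j])) if x != y
        )
        if diff:  # an empty set means an inconsistent pair; skipped here
            entries.append(diff)
    return entries

def core(entries):
    """The core is the set of attributes that occur as singleton entries."""
    return {next(iter(e)) for e in entries if len(e) == 1}

# Toy decision table (made-up data) with condition attributes 0, 1, 2.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 1, 0, 1]

entries = discernibility_entries(rows, decisions)
core_attrs = core(entries)
print(sorted(core_attrs))
```

In this toy table only attribute 1 separates some pairs of objects on its own, so it is the only core attribute; any reduct has to contain it.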
This thesis therefore proposes HORAFA-AFVDM (HORAFA Based on the Attribute Frequency Value of the Discernibility Matrix), which improves HORAFA on the basis of the discernibility matrix. It takes the core as its foundation, adds the most significant attributes to it, and then calculates an attribute frequency function, which equals the frequency with which an attribute occurs in the discernibility matrix after the current attributes have been deleted. The function is updated as follows: f(a) = f(a) + |A| / |c'| for every a ∈ c, where |A| is the number of all condition attributes and |c'| is the number of attributes left after the attributes added to the core have been deleted. To find the optimal reduction of the information system, a reverse deletion procedure is added that removes as many deletable attributes as possible from the core-based attribute set, which ensures the completeness of the algorithm. The whole process of the algorithm is illustrated with an example in the thesis. The algorithm is then implemented. Because the experiments run in MATLAB, the data sets to be reduced are preprocessed before attribute reduction. The thesis describes the preprocessing method, which loads the UCI data sets into a SQL Server 2000 database and then uses SQL statements to restrict the data values to a specified range. Finally, the experiments are carried out in MATLAB, and a comparison of the attribute reduction results and running times of the two algorithms demonstrates the validity of the improved HORAFA-AFVDM.
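One possible reading of the procedure described above (start from the core, add attributes by frequency score, then delete in reverse) is sketched below. It is a simplified illustration rather than the thesis's HORAFA-AFVDM code, which was written in MATLAB. The function name `frequency_reduct`, the tie-breaking rule, and the decision to score only entries not yet covered by the selected attributes (so that c' coincides with the entry itself) are assumptions.

```python
def frequency_reduct(entries, n_attrs):
    """Greedy reduct from discernibility-matrix entries (hypothetical helper).

    entries: iterable of sets of condition-attribute indices
    n_attrs: |A|, the number of condition attributes
    """
    entries = [frozenset(e) for e in entries if e]
    # Start from the core: attributes that occur as singleton entries.
    reduct = {next(iter(e)) for e in entries if len(e) == 1}

    # Forward phase: add the best-scoring attribute until all entries are covered.
    while True:
        uncovered = [e for e in entries if not (e & reduct)]
        if not uncovered:
            break
        score = {}
        for c in uncovered:
            # f(a) = f(a) + |A| / |c'|: short entries contribute more weight.
            for a in c:
                score[a] = score.get(a, 0.0) + n_attrs / len(c)
        # Ties go to the smaller attribute index (an arbitrary assumption).
        best = max(sorted(score), key=score.get)
        reduct.add(best)

    # Reverse deletion: drop any attribute whose removal keeps every entry covered.
    for a in sorted(reduct, reverse=True):
        trial = reduct - {a}
        if all(e & trial for e in entries):
            reduct = trial
    return reduct

# Made-up entries over four condition attributes 0..3.
example_entries = [{0, 1}, {1, 2}, {0, 2}, {3}]
result = frequency_reduct(example_entries, n_attrs=4)
print(sorted(result))
```

The reverse pass matters because a greedy forward phase can select an attribute that later additions make redundant; checking each selected attribute once more keeps the returned set minimal with respect to single deletions.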
Keywords/Search Tags: data mining, rough set, attribute reduction, discernibility matrix