Many data mining methods are in common use today. This thesis studies the rough sets method in data mining and focuses on the application of rough-set-based algorithms in the rule extraction stage. In data mining, rough sets are usually used for knowledge reduction in order to extract rules. Attribute reduction is one of the core topics of rough set theory. This thesis studies rough-set-based attribute reduction algorithms in depth and puts forward an improved method. At the same time, a new attribute reduction algorithm is proposed.

Rough set theory is a new mathematical tool for dealing with fuzzy and imprecise problems, and it is a new data mining technology. Traditional attribute reduction algorithms either have high space complexity or are not precise enough. The new attribute reduction algorithm proposed in this thesis addresses the problem of space complexity. It is suitable for attribute reduction on large tables or large files to obtain specific rules, which traditional algorithms cannot do.

The main research contents are as follows:
(1) Analyse the current state of research on data mining based on rough sets. Study the theory of rough sets and related data mining technologies in depth; combine rough sets with data mining, focus on the data mining model based on rough sets, and make a systematic analysis of the application of rough sets in data mining.
(2) Study several traditional rough-set-based attribute reduction algorithms in depth and analyse their respective strengths and weaknesses. On this basis, an improved attribute reduction algorithm based on the discernibility matrix is put forward, and its superiority is verified experimentally.
(3) To solve the problems that traditional attribute reduction algorithms exhibit in application, the thesis uses the tree structure to establish the theory of the multi-tree and puts forward a new reduction algorithm based on it.
Compared with traditional algorithms, it has a great advantage: it has lower space complexity and is suitable for data mining on large tables or files to obtain specific rules.
(4) Select three data sets of different sizes from the UCI repository as training sets, and carry out detailed simulation experiments on the two algorithms under comparison, to verify the feasibility and effectiveness of the new attribute reduction algorithm.
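
To make the idea of attribute reduction concrete, the following is a minimal sketch of a textbook-style reduction based on the discernibility matrix. It is not the improved algorithm or the multi-tree algorithm of this thesis, whose details are not given here. The function names, the greedy selection rule, and the toy decision table are all assumptions made for illustration.

```python
from itertools import combinations


def discernibility_matrix(rows, decisions):
    """Collect the non-empty discernibility entries of a decision table.

    rows: list of tuples holding the condition-attribute values of each object.
    decisions: list with one decision value per object.
    Each entry is the set of attribute indices on which two objects with
    different decisions take different values.
    """
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue  # only pairs with different decisions must be discerned
        diff = frozenset(
            k for k, (a, b) in enumerate(zip(rows[i], rows[j])) if a != b
        )
        if diff:
            entries.append(diff)
    return entries


def greedy_reduct(rows, decisions):
    """Greedy heuristic (an assumption, not the thesis's method): repeatedly
    pick the attribute that appears in the most uncovered entries."""
    entries = discernibility_matrix(rows, decisions)
    reduct = set()
    while entries:
        counts = {}
        for entry in entries:
            for attr in entry:
                counts[attr] = counts.get(attr, 0) + 1
        # Ties are broken in favour of the smallest attribute index.
        best = max(sorted(counts), key=counts.get)
        reduct.add(best)
        entries = [entry for entry in entries if best not in entry]
    return reduct


# Hypothetical toy table: the decision depends only on attribute 1.
toy_rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
toy_decisions = [0, 1, 0, 1]
toy_reduct = greedy_reduct(toy_rows, toy_decisions)
```

In this sketch the matrix holds up to one entry per pair of objects, so its size grows quadratically with the number of objects. This illustrates the kind of space cost that can make matrix-based reduction difficult on large tables.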