Research On Generalized Rough Set Model And Attribute Reduction Algorithm

Posted on: 2006-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Li
Full Text: PDF
GTID: 2168360155974268
Subject: Computer applications
Abstract/Summary:
With the rapid development of database technology and computer networks, large amounts of data are being stored. The demand for extracting, understanding, and assimilating useful knowledge from this growing mountain of data has outpaced traditional methods of data analysis, which has led to the emergence of knowledge discovery in databases (KDD) and data mining. Rough set theory is a relatively new mathematical tool that requires no prior information beyond the data itself. This lets it overcome shortcomings of other methods and keeps subjective factors from influencing the results of data mining, and it has become one of the primary methods of KDD.

First, we introduce rough set theory and its classical model. The classical model is limited in dealing with inconsistent information: its classification must be entirely correct or certain. It considers only total inclusion and total non-inclusion, with no notion of inclusion or membership to a certain degree, although noisy data are unavoidable in practical applications. A second limitation is that the model assumes the objects it handles are all already known, so the conclusions it yields hold only for those objects, whereas in practice conclusions drawn from a small set of objects often need to be applied to a much larger one. These limitations restrict the application of the classical model. Drawing on the variable precision rough set (VPRS) model and replacing its definition of precision with the real classification accuracy, we propose a generalized model based on real classification accuracy.

This thesis also studies attribute reduction algorithms in depth and summarizes the main existing ones. Although some progress has been made on attribute reduction, there is still no generally accepted, highly efficient algorithm.
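The contrast drawn above between total inclusion in the classical model and inclusion to a certain degree can be illustrated with a small sketch. It is not taken from the thesis: the toy decision table, the function names, and the use of an inclusion threshold `beta` (one common way of stating VPRS) are all assumptions made for illustration, and the thesis's own model based on real classification accuracy is not reproduced here.

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """Group object indices by their values on the given attributes."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def lower_approximation(blocks, target, beta=1.0):
    """Union of the blocks whose share of objects inside `target` is >= beta.

    beta = 1.0 demands total inclusion (classical lower approximation);
    beta < 1.0 tolerates some misclassified objects (VPRS-style, assumed form).
    """
    result = set()
    for block in blocks:
        if len(block & target) / len(block) >= beta:
            result |= block
    return result

# Hypothetical decision table: condition attributes a, b and decision d.
rows = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "no"},   # noisy object: same conditions, other decision
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},
]

blocks = equivalence_classes(rows, ["a", "b"])
target = {i for i, row in enumerate(rows) if row["d"] == "yes"}

classical = lower_approximation(blocks, target)            # total inclusion only
tolerant = lower_approximation(blocks, target, beta=0.75)  # partial inclusion allowed

print(sorted(classical))  # [5]
print(sorted(tolerant))   # [0, 1, 2, 3, 5]
```

With total inclusion, a single noisy object removes its whole equivalence class from the lower approximation. With the threshold lowered to 0.75, the class is kept because three of its four objects belong to the target set.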
The heuristic attribute reduction algorithm based on attribute frequency and the heuristic algorithm based on attribute reliability are the two main attribute reduction algorithms driven by attribute importance. The algorithm based on attribute frequency is incomplete, so it is not guaranteed to end with a reduct, whereas the algorithm based on attribute reliability does give that guarantee. On the other hand, computing attribute importance from attribute frequency should cost less than computing it from attribute reliability. Weighing the strengths and weaknesses of the two, this thesis proposes an improved version of the algorithm based on attribute reliability. The improved algorithm is guaranteed to find a reduct of the decision table and saves time compared with the original algorithm. Because real decision tables unavoidably contain noisy data, the improved algorithm uses the rough set model based on classification reliability. This lets it handle decision tables with a certain amount of noise and gives it good coverage and generalization ability.

We compare the algorithms experimentally in terms of the reducts obtained and the time taken. In terms of results, the improved algorithm finds a reduct in the end, which is an improvement over the algorithm based on attribute frequency. In terms of time, it takes less time than the algorithm based on attribute reliability. Finally, we apply it to data containing a certain amount of noise and still find reducts; the experiments show that the improved algorithm has a certain degree of fault tolerance.
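The general shape of an importance-driven heuristic reduction can be sketched as follows. This is a generic greedy search guided by the dependency degree, not the attribute-frequency, attribute-reliability, or improved algorithm of the thesis, whose details the abstract does not give. The decision table, the function names, and the majority-share threshold `beta`, used here as a stand-in for a noise-tolerant model, are all assumptions.

```python
from collections import Counter, defaultdict

def dependency(rows, attrs, decision, beta=1.0):
    """Fraction of objects lying in blocks whose majority decision share is >= beta."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    covered = 0
    for decisions in blocks.values():
        majority = Counter(decisions).most_common(1)[0][1]
        if majority / len(decisions) >= beta:
            covered += len(decisions)
    return covered / len(rows)

def greedy_reduct(rows, conditions, decision, beta=1.0):
    """Add the attribute that raises dependency most, then prune redundant ones."""
    full = dependency(rows, conditions, decision, beta)
    chosen, remaining = [], list(conditions)
    # Forward phase: stops at the latest when every attribute has been chosen.
    while dependency(rows, chosen, decision, beta) < full:
        best = max(remaining,
                   key=lambda a: dependency(rows, chosen + [a], decision, beta))
        chosen.append(best)
        remaining.remove(best)
    # Backward phase: drop any attribute whose removal keeps the dependency.
    for attr in list(chosen):
        trial = [x for x in chosen if x != attr]
        if dependency(rows, trial, decision, beta) >= full:
            chosen = trial
    return chosen

# Hypothetical decision table: d follows a, b is irrelevant, last row is noise.
rows = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "no"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
]

strict = greedy_reduct(rows, ["a", "b"], "d")              # classical, beta = 1.0
tolerant = greedy_reduct(rows, ["a", "b"], "d", beta=0.75)

print(strict)    # ['a', 'b']
print(tolerant)  # ['a']
```

In this toy table the single noisy row forces the strict search to keep both attributes, while the tolerant search keeps only `a`. Note that with `beta` below 1 the dependency degree is not monotonic in the attribute set, so the sketch only preserves a dependency at least as high as that of the full attribute set.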
Keywords/Search Tags: data mining, rough set, generalized rough set model, attribute reduction