Study On Risk Minimizing Based Threshold Automatic Setting And Email Classification With Three-way Decision-Theoretic Rough Sets

Posted on: 2017-02-25  Degree: Master  Type: Thesis
Country: China  Candidate: P Hu  Full Text: PDF
GTID: 2348330485999342  Subject: Software engineering
Abstract/Summary:
Rough set theory is an effective mathematical tool for analyzing and handling fuzzy and uncertain data, and it has been widely used in data mining and machine learning. However, the classical rough set model imposes a strict requirement when deciding whether an object belongs to a class: every object in an equivalence class must belong to that class. As a result, the model is very sensitive to noisy data, generalizes poorly, and has insufficient fault tolerance. To address this problem, many extended models have been proposed, and the decision-theoretic rough set model is one of them. This model introduces Bayesian decision theory and divides the universe into positive, negative, and boundary regions using a pair of thresholds (alpha, beta). A three-way decision means accepting, rejecting, or deferring judgment when handling a problem. Three-way decisions and the decision-theoretic rough set model are naturally linked, so choosing thresholds that minimize the overall decision cost has become an important issue. In practice, however, the thresholds are often supplied by domain experts, which hinders the practical use of the three-way decision-theoretic rough set model. To remove this dependence, this thesis proposes an algorithm that automatically searches for the optimal thresholds using the artificial fish swarm algorithm. The algorithm does not require domain experts to set the thresholds and takes minimization of decision risk as its objective function. Within the three-way decision-theoretic rough set model, artificial fish are used to learn appropriate thresholds from the given data. Several data sets from the UCI machine learning repository are used to test the proposed algorithm.
Experimental results show that the algorithm outperforms the existing adaptive algorithm and the simulated annealing algorithm in both running time and the quality of the learned thresholds. To validate the algorithm in a practical application, it is applied to e-mail classification: thresholds are learned automatically on the Spambase data set from the UCI machine learning repository. Using the three-way decision-theoretic rough set model, e-mails are classified as normal e-mails, junk e-mails, or suspicious e-mails that need further confirmation. Experimental results show that the algorithm effectively improves e-mail classification accuracy and reduces the error rate.
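
To make the role of the threshold pair (alpha, beta) concrete, the following is a minimal sketch, not the thesis's implementation. It assumes the usual decision-theoretic rough set convention (an object with conditional probability p goes to the positive region if p >= alpha, to the negative region if p <= beta, and to the boundary region otherwise) and computes an expected decision risk from a loss table. The loss values, the function names, and the exact form of the risk are illustrative assumptions; the thesis's objective function may differ. A plain grid search stands in for the artificial fish swarm search, which is not reproduced here.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical loss table (not taken from the thesis). LOSS["XY"] is the cost
# of taking action X (P = accept, B = defer, N = reject) when the true state
# is Y (P = the object belongs to the class, N = it does not).
LOSS: Dict[str, float] = {
    "PP": 0.0, "PN": 4.0,
    "BP": 1.0, "BN": 1.0,
    "NP": 6.0, "NN": 0.0,
}


def three_way_region(p: float, alpha: float, beta: float) -> str:
    """Assign an object with conditional probability p to one of three regions."""
    if p >= alpha:
        return "positive"
    if p <= beta:
        return "negative"
    return "boundary"


def decision_risk(probs: Sequence[float], alpha: float, beta: float,
                  loss: Dict[str, float] = LOSS) -> float:
    """Sum the expected loss of the action chosen for each object."""
    action = {"positive": "P", "boundary": "B", "negative": "N"}
    total = 0.0
    for p in probs:
        a = action[three_way_region(p, alpha, beta)]
        total += loss[a + "P"] * p + loss[a + "N"] * (1.0 - p)
    return total


def grid_search_thresholds(probs: Sequence[float],
                           loss: Dict[str, float] = LOSS,
                           steps: int = 20) -> Tuple[float, float, float]:
    """Exhaustively try grid points with 0 <= beta < alpha <= 1.

    This is a simple stand-in for a metaheuristic search such as the
    artificial fish swarm algorithm; it only illustrates the objective.
    """
    best: Optional[Tuple[float, float, float]] = None
    for i in range(1, steps + 1):
        alpha = i / steps
        for j in range(i):
            beta = j / steps
            risk = decision_risk(probs, alpha, beta, loss)
            if best is None or risk < best[2]:
                best = (alpha, beta, risk)
    assert best is not None  # steps >= 1 guarantees at least one candidate
    return best


if __name__ == "__main__":
    sample_probs = [0.9, 0.5, 0.1]  # made-up probabilities for illustration
    print(grid_search_thresholds(sample_probs))
```

In the e-mail setting described above, the three regions would correspond to the normal, junk, and suspicious categories; which category plays the positive class is not stated in the abstract, so the sketch leaves that mapping open.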
Keywords/Search Tags: Three-way decision rough set, cost function, artificial fish swarm algorithm, threshold, mail classification