Cost-Sensitive Learning Method Research Based On Three-Way Decision

Posted on: 2017-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Liu
GTID: 2348330488467362
Subject: Engineering
Abstract/Summary:
Classification is a central problem in data mining, and traditional data mining methods aim to obtain a classifier with the highest possible classification accuracy. However, uncertainty in the external environment means that decisions do not always achieve their expected goals. An inconsistency between the decision result and the expected result incurs a decision risk cost, and the costs of different misclassifications in a classification model are usually unequal; that is, the classification model is cost-sensitive. Wrong decisions are difficult to avoid entirely, and people generally expect to minimize decision risk cost rather than to maximize benefit. Pursuing only consistency between decision knowledge and empirical data can provide misleading information, ignores people's preference for reducing expected risk, and therefore cannot solve practical problems.

Three-way decision treats a decision problem as a classification problem, a treatment that is consistent with the problem model of data mining. Three-way decision therefore establishes a bridge between decision theory and data mining methods. Its error tolerance and cost sensitivity allow traditional data mining methods to distinguish between, and respond differently to, different kinds of misclassification, so that the decision result has minimum risk cost. To address the problem that traditional data mining methods pursue only classifier accuracy and cannot solve practical problems, this thesis introduces three-way decision into traditional data mining methods and constructs cost-sensitive learning algorithms based on three-way decision, making those methods more suitable for practical cost-sensitive problems. The main contents of this thesis are as follows:

(1) The typical incremental learning algorithm for support vector machines (SVM) discards a large amount of useful data that are not support vectors, and existing incremental learning algorithms for SVM aim only to improve classification accuracy as much as possible. To address these problems, the cost sensitivity and boundary region of three-way decision were introduced into incremental learning for SVM, and a new incremental learning method for SVM was proposed. First, the conditional probability used in three-way decision was measured on the basis of the SVM learning method. Second, the objects in the boundary region of the three-way decision were partitioned out and trained together with the original support vectors and the newly added samples. Finally, simulation experiments show that the proposed method not only selects useful information to improve classification accuracy but also makes SVM more suitable for practical cost-sensitive problems, and makes the computation of the conditional probability in three-way decision better suited to the specific learning environment.

(2) Most existing top-n outlier detection methods based on k nearest neighbors (kNN) depend on the parameters k and n, and it is difficult for users to specify proper values for them. To address this problem, a new outlier detection method based on the three-way semantics of three-way decision was proposed. First, the conditional probability in three-way decision was measured on the basis of the k nearest neighbors. Second, an optimization problem that minimizes the decision cost was constructed to search adaptively for the optimal conditional probabilities. Third, an algorithm was proposed to partition outliers recursively. Finally, experiments on several data sets show that the proposed method not only provides a measure of the conditional probability in three-way decision that is suitable for outlier detection but also detects outliers automatically, without user participation.

(3) Traditional data mining methods ignore inconsistent data, and general decision tree learning algorithms lack theoretical support for the classification of inconsistent nodes. To address these problems, the cost sensitivity and boundary region of three-way decision were introduced into decision tree learning, and a decision tree learning method based on three-way decision was proposed. First, the proportion of positive objects in a node was used to compute the conditional probability for the three-way decision on that node. Second, the nodes in the decision tree were partitioned to generate a three-way decision tree. Merging and pruning rules for the three-way decision tree were then derived, which convert it into a two-way decision tree by considering the information around each node. Finally, a worked example was carried out. The results show that the proposed method preserves inconsistent information, generates a cost-sensitive decision tree, and makes the partition of inconsistent nodes easier to explain. In addition, the method provides a measure of the conditional probability in three-way decision that is suitable for decision tree learning.
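
The abstract does not state how the three regions are obtained, so the following is only a minimal sketch under one common formulation (decision-theoretic rough sets), which may differ from the thesis's own construction. It assumes six hypothetical loss values, derives the thresholds alpha and beta by comparing the expected costs of accepting, deferring, and rejecting, and assigns an object to the positive, boundary, or negative region from its conditional probability. The `node_probability` helper mirrors item (3), where the proportion of positive objects in a node serves as the conditional probability. All names and numbers here are illustrative assumptions, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Losses:
    """Hypothetical losses for each (action, true class) pair."""
    accept_pos: float  # accept an object that is positive (correct)
    defer_pos: float   # defer an object that is positive
    reject_pos: float  # reject an object that is positive (error)
    accept_neg: float  # accept an object that is negative (error)
    defer_neg: float   # defer an object that is negative
    reject_neg: float  # reject an object that is negative (correct)


def thresholds(l: Losses) -> tuple[float, float]:
    """Return (alpha, beta) where accept/defer and defer/reject costs tie."""
    alpha = (l.accept_neg - l.defer_neg) / (
        (l.accept_neg - l.defer_neg) + (l.defer_pos - l.accept_pos))
    beta = (l.defer_neg - l.reject_neg) / (
        (l.defer_neg - l.reject_neg) + (l.reject_pos - l.defer_pos))
    return alpha, beta


def three_way(p: float, alpha: float, beta: float) -> str:
    """Map a conditional probability p to POS, BND, or NEG."""
    if p >= alpha:
        return "POS"
    if p <= beta:
        return "NEG"
    return "BND"


def expected_costs(p: float, l: Losses) -> dict[str, float]:
    """Expected cost of each action when P(positive | description) = p."""
    return {
        "POS": l.accept_pos * p + l.accept_neg * (1 - p),
        "BND": l.defer_pos * p + l.defer_neg * (1 - p),
        "NEG": l.reject_pos * p + l.reject_neg * (1 - p),
    }


def node_probability(n_positive: int, n_total: int) -> float:
    """Proportion of positive objects in a node, as in item (3)."""
    return n_positive / n_total


# Illustrative losses only: a wrong accept costs 8, a wrong reject costs 10,
# deferring costs 2 either way, and correct decisions cost nothing.
losses = Losses(accept_pos=0, defer_pos=2, reject_pos=10,
                accept_neg=8, defer_neg=2, reject_neg=0)
alpha, beta = thresholds(losses)  # (0.75, 0.2)

for pos, total in [(9, 10), (5, 10), (1, 10)]:
    p = node_probability(pos, total)
    print(pos, total, three_way(p, alpha, beta))
```

With these assumed losses, a node that is half positive falls into the boundary region rather than being forced into either class, which is the error-tolerant behaviour the abstract attributes to three-way decision. The region chosen by the thresholds coincides with the action of minimum expected cost, as `expected_costs` can be used to confirm.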
Keywords/Search Tags: Three-way decision, Data mining, Cost-sensitive learning, Minimizing decision cost