Research On Test-Cost Sensitive Bayesian Classifiers

Posted on: 2018-06-07 | Degree: Master | Type: Thesis
Country: China | Candidate: G G Kong | Full Text: PDF
GTID: 2348330533970056 | Subject: Computer Science and Technology
Abstract/Summary:
Classification, a popular topic in data mining and machine learning, has attracted the attention of many researchers and has been widely applied in fields such as customer churn prediction, intrusion detection, medical diagnosis, and text categorization. Most traditional classification research assumes that data are already stored in databases or are available free of charge, and the objective is to build a classifier with maximal classification accuracy. Many classification methods are in common use, such as Bayesian networks, decision trees, artificial neural networks, and support vector machines. However, this assumption is often unrealistic: each attribute value must be paid for, at a cost (e.g. money, time, risk) that differs from attribute to attribute, called the test cost. To carry traditional algorithms over to real systems, researchers should consider not only how to maximize a classifier's accuracy but also how to minimize its test cost. Test-cost sensitive learning therefore has two objectives, classification accuracy and test cost, which makes it a typical multi-objective optimization problem.

To solve this problem, one can either apply multi-objective optimization algorithms directly to obtain a set of feasible solutions, or reduce the problem to a single-objective optimization problem. The latter strategy can be divided into two methods: 1) transform the multi-objective problem into a single-objective constrained optimization problem, with classification accuracy as the constraint and test cost as the objective function; 2) combine classification accuracy and test cost into a new objective function that an optimization search strategy can use.

With the explosive growth of data in the information society, the dimensionality of data has also grown exponentially in recent decades. Too many features not only increase the storage and time complexity of an algorithm, but can also reduce the final classification accuracy because of irrelevant or redundant features. Feature selection methods choose an optimal feature subset from the original feature set and have become one of the main approaches to improving naïve Bayes; abundant existing research shows that feature selection can significantly improve the classification accuracy of naïve Bayes. However, existing research rarely combines feature selection for naïve Bayes with test-cost sensitive learning to study test-cost sensitive naïve Bayes.

This thesis takes naïve Bayes classifiers as its basic object and presents two test-cost sensitive naïve Bayes methods: Constrained Optimization-based Test-Cost Sensitive Naïve Bayes and Optimized Objective-based Test-Cost Sensitive Naïve Bayes. Experimental results on the WEKA platform demonstrate that the two new algorithms reduce test cost while maintaining high classification accuracy. Finally, the thesis explores the application value of the new algorithms in several medical diagnosis problems.

The main innovations and contributions of this thesis include:

1) This thesis proposes the constrained optimization-based test-cost sensitive naïve Bayes (COTCSNB) algorithm. Traditional greedy-search feature selection always selects the feature that most improves classifier accuracy, aiming at a final classifier with maximal classification accuracy. In test-cost sensitive learning, by contrast, COTCSNB takes "removing the feature does not reduce classification accuracy" as its constraint: among the features that satisfy the constraint, it deletes the one with the highest test cost, and it repeats this step until no feature satisfies the constraint.

2) This thesis gives a framework for test-cost sensitive wrapper feature selection and proposes a new objective function for test-cost sensitive learning, defined as the difference between a classification-accuracy metric and a test-cost metric. On this basis, it proposes the optimized objective-based test-cost sensitive naïve Bayes (OOTCSNB) algorithm, which combines the new objective with a best-first search strategy.

3) This thesis analyzes the test-cost problem of obtaining pathological values in medical diagnosis and explores the application of the new algorithms (COTCSNB and OOTCSNB) to medical diagnosis problems such as heart disease, hepatitis, diabetes, and thyroid disease. The experimental results show that the new algorithms maintain high classification accuracy while significantly reducing test cost in medical diagnosis.
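As an illustration only, the elimination loop described for COTCSNB might be sketched in Python as follows. This is not the thesis's implementation (which was evaluated on WEKA). Several details are assumptions not given in the abstract: the `evaluate` callable, which stands in for any accuracy estimate of a naïve Bayes classifier on a feature subset; the tie-breaking rule; the choice to keep at least one feature; and comparing against the accuracy of the current subset. The `ootcsnb_objective` helper shows one assumed form of an "accuracy minus test cost" objective; the `weight` parameter and the normalization by total cost are hypothetical. The feature names and costs in the demo are made-up toy values.

```python
from typing import Callable, Dict, FrozenSet, Set

# Hypothetical interface: maps a feature subset to an estimated accuracy
# (for example, cross-validated naive Bayes accuracy). Not from the thesis.
Evaluator = Callable[[FrozenSet[str]], float]


def cotcsnb_select(costs: Dict[str, float], evaluate: Evaluator) -> Set[str]:
    """Sketch of cost-driven backward elimination under an accuracy constraint."""
    selected = set(costs)
    current_acc = evaluate(frozenset(selected))
    # Assumption: always keep at least one feature.
    while len(selected) > 1:
        # A feature satisfies the constraint if removing it does not
        # reduce accuracy relative to the current subset (assumption).
        removable = []
        for feature in selected:
            acc = evaluate(frozenset(selected - {feature}))
            if acc >= current_acc:
                removable.append((costs[feature], acc, feature))
        if not removable:
            break  # the constraint can no longer be satisfied
        # Delete the most expensive removable feature; ties are broken by
        # higher accuracy, then by feature name (assumed tie-breaking rule).
        _, current_acc, victim = max(removable)
        selected.remove(victim)
    return selected


def ootcsnb_objective(accuracy: float, subset_cost: float,
                      total_cost: float, weight: float = 1.0) -> float:
    """Assumed objective: accuracy minus weighted, normalized test cost.

    Requires total_cost > 0.
    """
    return accuracy - weight * (subset_cost / total_cost)


if __name__ == "__main__":
    # Toy test costs (made up for illustration).
    toy_costs = {"age": 1.0, "blood_test": 20.0, "ecg": 50.0, "mri": 300.0}

    # Toy stand-in for a real accuracy estimate: only "age" and "ecg" help.
    def toy_accuracy(features: FrozenSet[str]) -> float:
        return 0.6 + 0.15 * ("age" in features) + 0.15 * ("ecg" in features)

    kept = cotcsnb_select(toy_costs, toy_accuracy)
    print(sorted(kept))  # ['age', 'ecg']
```

In the toy run, the two expensive features that do not affect the stand-in accuracy are dropped first, most expensive first, and the loop stops once every remaining removal would lower accuracy. A best-first search over subsets scored by `ootcsnb_objective` would be the analogous starting point for OOTCSNB, under the same assumptions.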
Keywords/Search Tags: test cost, classification accuracy, naïve Bayes, feature selection, medical diagnosis