
Studies On Uncertain Data And Cost Sensitive Learning

Posted on: 2018-10-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Zhang
GTID: 1318330515450486
Subject: Agricultural Electrification and Automation
Abstract/Summary:
Traditional classification algorithms assume that the data they process are certain and accurate, but uncertain data are ubiquitous in the real world because of privacy protection, imprecise measurement, repeated sampling, or missing values. How to extract useful information from uncertain data has attracted attention from industry. Cost-sensitive learning is also of great practical significance: different errors incur different costs, and cost-sensitive classification algorithms aim to minimize the total cost of these errors. In many fields a cost-sensitive model is the more reasonable choice. This dissertation focuses on uncertain data and cost-sensitive learning, mainly on how to build a classifier for uncertain data streams, how to learn cost-sensitive algorithms with better performance, and how to build cost-sensitive classifiers for uncertain data. Its contributions are as follows.

(1) To build a classifier for uncertain data streams, an Ensemble of Uncertain Decision Tree algorithm (EDTU) is proposed. First, the decision tree algorithm for uncertain data (DTU) is improved by changing how its information gain is calculated and by making the algorithm efficient enough to process high-speed data streams. Then, with the improved DTU as the base classifier, a dynamic classifier ensemble algorithm is applied, and the classifiers that classify effectively are selected to form the ensemble. Experimental results show that EDTU classifies data streams with uncertain attributes efficiently and that its performance is stable under different parameter settings.

(2) A novel cost-sensitive classification algorithm, CS-NBT (cost-sensitive Naïve-Bayes tree), is proposed. It is a hybrid that combines a cost-sensitive decision tree with Naïve Bayes. We define the expected cost information gain, which is the change in misclassification cost caused by splitting on an attribute, and design a test strategy that guides attribute selection so as to minimize misclassification cost. After the cost-sensitive tree has been induced, its leaves are replaced with cost-sensitive Naïve Bayes classifiers, which make use of the information in the attributes that the tree ignores. We empirically compared CS-NBT with CSTree, MetaCost, and NBTree on UCI datasets and found that CS-NBT significantly outperforms the others and that its performance is stable under different parameter settings.

(3) A cost-sensitive classification algorithm based on aggregating N-dependence estimators (ANDE), CS_ANDE, is put forward. First, a one-dependence estimator is used to approximate the misclassification cost. Second, multiple classifiers are constructed with the aim of minimizing the misclassification cost. These classifiers are then used to relabel the samples. Finally, a CS_AODE classifier is obtained by learning from the relabeled samples with MetaCost. Likewise, a CS_A2DE classifier is obtained using two-dependence estimators. We empirically compared CS_AODE and CS_A2DE with MetaCost and AODE on UCI datasets and found that CS_AODE and CS_A2DE significantly outperform the others and that their performance is stable under different parameter settings.

(4) A decision tree based cost-sensitive classification algorithm (CS-DTU) for uncertain datasets is proposed. First, based on the concept of probabilistic cardinality, we define the method for selecting the splitting attribute in the decision tree and compute the related cost of building the tree. Then the classification method of DTU is used to classify new instances. Experimental results on UCI datasets show that the proposed algorithm effectively reduces total cost and that its performance is stable under different parameter settings.

(5) A Naïve Bayes based cost-sensitive classification algorithm (CS-NBT) for uncertain datasets is proposed. First, we apply probability and statistics theory to the uncertain data model. Second, we define the utility of an uncertain attribute with respect to total cost and propose a new test strategy for attribute selection. Then, based on CS-NBT, we propose a single batch test algorithm on Cost-sensitive Uncertain Naïve Bayes for Uncertain Data (SBT-CSUNB). We define the influence of an uncertain attribute on the total cost in the cost-sensitive Naïve Bayes classifier, apply a greedy algorithm, and design a method for finding an optimal batch test strategy. Experimental results on UCI datasets show that the proposed algorithms effectively reduce total cost and that SBT-CSUNB is more robust.
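The central idea of cost-sensitive classification in the abstract, choosing the prediction that minimizes expected misclassification cost instead of the most probable class, can be illustrated with a minimal sketch. This is the standard minimum expected cost rule, not the dissertation's own implementation; the function names, the cost matrix, and the posterior values are hypothetical and chosen only for illustration.

```python
def expected_costs(posteriors, cost):
    """Expected cost of each possible prediction.

    posteriors[j] is the estimated probability that the true class is j.
    cost[i][j] is the (assumed) cost of predicting class i when the true
    class is j; the diagonal is normally zero.
    """
    return [sum(p * c for p, c in zip(posteriors, row)) for row in cost]


def min_cost_class(posteriors, cost):
    """Index of the prediction with the lowest expected cost."""
    risks = expected_costs(posteriors, cost)
    return min(range(len(risks)), key=risks.__getitem__)


# Hypothetical two-class cost matrix: missing class 1 is ten times
# as expensive as a false alarm.
cost = [
    [0.0, 10.0],  # predict 0: free if truth is 0, costs 10 if truth is 1
    [1.0, 0.0],   # predict 1: costs 1 if truth is 0, free if truth is 1
]
posteriors = [0.8, 0.2]  # class 0 is the most probable class

print(expected_costs(posteriors, cost))
print(min_cost_class(posteriors, cost))
```

With these illustrative numbers the rule predicts class 1 even though class 0 is more probable, because the expensive error dominates the expected cost. Relabeling training samples with this rule is also the basic step behind MetaCost-style methods.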
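Probabilistic cardinality, which the abstract names as the basis for choosing splitting attributes on uncertain data, can be sketched as follows. The sketch assumes the common definition for uncertain categorical attributes: the probabilistic cardinality of a value is the sum, over instances, of the probability that the attribute takes that value. The data layout, function name, and sample values are hypothetical, and the dissertation's cost-based split criterion itself is not reproduced here.

```python
from collections import defaultdict


def probabilistic_cardinality(instances):
    """Per-value, per-class probabilistic cardinality of one attribute.

    instances is a list of (distribution, label) pairs, where
    distribution maps each possible attribute value to the probability
    that the instance takes that value (an assumed representation of an
    uncertain categorical attribute).
    Returns pc such that pc[value][label] is the sum of P(attribute = value)
    over all instances carrying that label.
    """
    pc = defaultdict(lambda: defaultdict(float))
    for distribution, label in instances:
        for value, prob in distribution.items():
            pc[value][label] += prob
    return pc


# Hypothetical uncertain attribute "weather" for three instances.
data = [
    ({"sunny": 0.7, "rainy": 0.3}, "yes"),
    ({"sunny": 0.2, "rainy": 0.8}, "no"),
    ({"sunny": 1.0}, "yes"),  # a certain value is a degenerate distribution
]

pc = probabilistic_cardinality(data)
for value, by_class in pc.items():
    print(value, dict(by_class))
```

These fractional counts take the place of ordinary instance counts when class distributions are estimated for each branch of a candidate split, so any count-based split measure can be evaluated on uncertain data.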
Keywords/Search Tags: uncertain data, cost sensitive, data stream classification, naïve Bayes, decision tree, single batch