Research On Classification Algorithms For Uncertain Data

Posted on: 2016-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: W J Li
Full Text: PDF
GTID: 2308330464962577
Subject: Computer Science and Technology
Abstract/Summary:
Data uncertainty is widespread in fields such as the Internet, telecommunications, finance, and information security. Traditional mining methods work well on precise data but do not take uncertain information into account, so they cannot be applied directly to uncertain data. Uncertainty is an objective attribute of data that cannot be ignored, and it directly affects the quality of data mining results. Mining uncertain data has therefore become a hot research topic and poses new challenges for data mining.

This thesis focuses on the classification of interval uncertain data. Because each method has its own advantages and disadvantages on different problems, three common classifiers, SVM, Naive Bayes, and decision tree, are studied, with emphasis on how to establish a suitable data model and how to improve the classifier models. The main contributions of this thesis are summarized as follows.

(1) An SVM-based classification method for interval uncertain data is proposed. First, the uncertainty information of interval data is described by an ellipsoidal convex model. Then, two novel classification methods for uncertain data, IUSVM and IUHSVM, are proposed by introducing the uncertain data model into SVM and HSVM. Next, methods for solving the resulting uncertainty-constrained programming problems are established. The optimal hypersphere for the classification problem is obtained by alternating iterative optimization over two sub-problems, an upper one and a lower one, and an approximate optimal solution of the lower sub-problem can be derived directly using a Taylor expansion. Experimental results show that the proposed method achieves competitive classification accuracy and is strongly robust to noise.

(2) A Naive Bayes-based classification approach for interval uncertain data is proposed. First, the uncertainty information of interval data is described by an applied stochastic model.
Then, two algorithms based on Naive Bayes, IU-HNBC and IU-PNBC, are proposed.

a) IU-HNBC: Based on the idea of histogram estimation, a novel probability density function (PDF) estimation model is established for uncertain data and then used to estimate the class-conditional probability density functions (CCPDF) of the uncertain Naive Bayes classifier. Experimental results on UCI datasets show that IU-HNBC achieves good classification accuracy, while its runtime and memory requirements are lower than those of existing methods.

b) IU-PNBC: First, the CCPDF is estimated using Parzen windows. Second, an approximating function for the CCPDF is obtained by algebraic interpolation. Finally, the posterior probability is computed from the interpolating function and used for classification. Experimental results show that the proposed method avoids dependence on the training samples, improves computational efficiency, and has lower computational complexity and storage requirements than existing methods.

(3) A decision tree-based classification method for interval uncertain data is proposed. To address the weak ability of decision trees to express information on continuous attributes, the interval uncertainty fuzzy decision tree (IU-FDT), a novel classification method for interval uncertain data, is proposed. First, a fuzzy data model is established for interval uncertain data. Drawing on probability theory, a distance measure between interval numbers is defined by assuming that the uncertain data is uniformly distributed over its interval range. Second, the interval uncertainty of the data is transformed into fuzzy uncertainty by applying fuzzy clustering to each dimension of the samples. Finally, a fuzzy decision tree classifier is built on the fuzzy data.
Experimental results on UCI datasets show that IU-FDT is strongly robust to noise, runs faster than SVM, and achieves more stable classification accuracy than Naive Bayes.
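The abstract does not give the exact form of the distance measure between interval numbers used in contribution (3). As a minimal sketch of one way such a measure could be built from the uniform-distribution assumption, the Python below computes the root of the expected squared difference between two independent uniform random variables, E[(X - Y)^2] = (m_X - m_Y)^2 + (w_X^2 + w_Y^2) / 12, where m is an interval midpoint and w its width. The `Interval` alias and both function names are hypothetical, and this formula is an illustrative choice rather than the thesis's own definition.

```python
import math
from typing import Tuple

# Hypothetical representation of an interval number: (lower, upper).
Interval = Tuple[float, float]


def expected_sq_distance(a: Interval, b: Interval) -> float:
    """E[(X - Y)^2] for independent X ~ U[a] and Y ~ U[b].

    Uses E[(X - Y)^2] = (E[X] - E[Y])^2 + Var(X) + Var(Y),
    with Var(U[lo, hi]) = (hi - lo)^2 / 12.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a_lo > a_hi or b_lo > b_hi:
        raise ValueError("interval lower bound exceeds upper bound")
    # Squared gap between the two interval midpoints (the means).
    mid_gap = (a_lo + a_hi) / 2.0 - (b_lo + b_hi) / 2.0
    # Sum of the two uniform variances.
    var_sum = ((a_hi - a_lo) ** 2 + (b_hi - b_lo) ** 2) / 12.0
    return mid_gap ** 2 + var_sum


def interval_distance(a: Interval, b: Interval) -> float:
    """Root-mean-square distance between two interval numbers."""
    return math.sqrt(expected_sq_distance(a, b))
```

For degenerate intervals the measure reduces to the ordinary absolute difference, so `interval_distance((1.0, 1.0), (4.0, 4.0))` returns `3.0`. Under this particular choice, the distance from a non-degenerate interval to itself is not zero, because two independent draws from the same interval generally differ. Whether that property is desirable depends on how the measure is used in the subsequent fuzzy clustering step.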
Keywords/Search Tags: Interval uncertain data, Data classification, Support vector machine, Naive Bayes, Fuzzy decision tree