
Comparing Classifiers In Data Mining

Posted on: 2017-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Lu
Full Text: PDF
GTID: 2348330515481387
Subject: Applied Statistics
Abstract/Summary:
With the development of computer technology and database management systems, efficiently extracting potentially valuable information from voluminous data has become a focus of attention. Classification, in particular, is a data mining technique that can be applied to many problems in business and science. There are many classification algorithms; among the most commonly used are Naive Bayes, decision trees, Support Vector Machines (SVM), and ensemble learning. However, no single algorithm suits every practical problem, because each has its own characteristics. Moreover, people are no longer satisfied with merely using classification models to analyze data sets and give policymakers a better basis for decisions; they also hope to solve classification problems more efficiently and create more value. It has therefore become essential to identify the circumstances in which each classification algorithm is applicable, along with its advantages and disadvantages for different kinds of application problems, so that the best algorithm can be selected quickly and directly from the characteristics of the application, the selection can be automated, and classification results can be improved. However, few domestic researchers have compared the original algorithms. Abroad, Michie et al. compared three families of techniques (neural networks, statistical classification, and machine learning) and applied them to practical industrial problems.

This thesis compares three classification algorithms: Naive Bayes, the C5.0 decision tree, and the Support Vector Machine. After introducing the principles of these algorithms and the criteria for comparing classifiers, it selects four representative experimental cases from the social, commercial, biological, and economic fields, which differ in the number of instances, the number of missing values, the number of attributes used for prediction, and the number of target categories. A classification model is built for each case with each of the three algorithms, and the models are compared and analyzed in terms of classification accuracy, algorithm stability, interpretability of the results, classifier running speed, and effectiveness in handling data sets with missing values. The advantages and disadvantages of the three algorithms on different data sets are then summarized.

The experimental results show that, compared with the other two algorithms, the support vector machine has clear advantages in terms of its dependence on historical data, the accuracy of its classification results, and its stability. The decision tree algorithm has clear advantages in running speed and in the interpretability of its results. Naive Bayes is better than the other two algorithms at handling data sets with missing values. Therefore, when only a relatively small sample can be obtained in a practical problem, the support vector machine works best; for massive data, the decision tree is among the most efficient; and when the collected data set contains a large number of missing values, Naive Bayes is the better choice.
Keywords/Search Tags: Data mining, Classification, Naive Bayes, Decision tree, Support vector machine
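
The comparison criteria named in the abstract (accuracy, stability, running speed, and handling of missing values) can be illustrated with a minimal Python sketch. This is not the thesis's experimental code. The data set is a made-up toy, the names `NaiveBayes`, `MajorityBaseline`, and `evaluate` are hypothetical, and stability is assumed here to mean the standard deviation of accuracy across cross-validation folds. A majority-class baseline stands in for the C5.0 and SVM models, which would plug into the same `fit`/`predict` interface.

```python
import math
import time
from collections import Counter, defaultdict
from statistics import mean, pstdev


class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing; None marks a missing value."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(feature index, class)] -> Counter of observed values
        self.value_counts = defaultdict(Counter)
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for j, v in enumerate(row):
                if v is not None:  # missing values are simply not counted
                    self.value_counts[(j, label)][v] += 1
                    self.values[j].add(v)
        return self

    def predict(self, rows):
        total = sum(self.class_counts.values())
        predictions = []
        for row in rows:
            scores = {}
            for c, n_c in self.class_counts.items():
                score = math.log(n_c / total)
                for j, v in enumerate(row):
                    if v is None:  # a missing attribute contributes no evidence
                        continue
                    counts = self.value_counts[(j, c)]
                    seen = sum(counts.values())
                    # "+ 1" in the denominator reserves one slot for unseen values
                    score += math.log((counts[v] + 1) / (seen + len(self.values[j]) + 1))
                scores[c] = score
            predictions.append(max(scores, key=scores.get))
        return predictions


class MajorityBaseline:
    """Always predicts the most frequent training label (placeholder classifier)."""

    def fit(self, rows, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.label] * len(rows)


def evaluate(make_model, rows, labels, k=5):
    """k-fold cross-validation: mean accuracy, its spread across folds, total time."""
    accuracies = []
    start = time.perf_counter()
    for fold in range(k):
        test_idx = set(range(fold, len(rows), k))  # every k-th row goes to the test fold
        train_x = [r for i, r in enumerate(rows) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        test_x = [rows[i] for i in sorted(test_idx)]
        test_y = [labels[i] for i in sorted(test_idx)]
        predicted = make_model().fit(train_x, train_y).predict(test_x)
        accuracies.append(sum(p == t for p, t in zip(predicted, test_y)) / len(test_y))
    elapsed = time.perf_counter() - start
    return {"accuracy": mean(accuracies), "stability": pstdev(accuracies), "seconds": elapsed}


# Toy data (invented for illustration): the label follows "outlook",
# and every third "wind" value is missing.
rows, labels = [], []
for i in range(20):
    outlook = "sunny" if i % 2 == 0 else "rainy"
    wind = ("weak", "strong", None)[i % 3]
    rows.append((outlook, wind))
    labels.append("play" if outlook == "sunny" else "stay")

nb_result = evaluate(NaiveBayes, rows, labels)
base_result = evaluate(MajorityBaseline, rows, labels)
print("Naive Bayes:", nb_result)
print("Baseline:   ", base_result)
```

Because every classifier is passed to `evaluate` as a factory, each model is scored on identical folds, so differences in the reported numbers reflect the algorithms and not the data split.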