Comparing Classifiers In Data Mining

Posted on:2017-07-06

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Lu

Full Text:PDF

GTID:2348330515481387

Subject:Applied Statistics

Abstract/Summary:

With the development of computer technology and database management systems, how to efficiently extract potentially valuable information from voluminous data has become a focus of attention. Classification techniques from data mining in particular can be used to solve many problems in business and science. There are many classification algorithms; among the most commonly used are Naive Bayes, decision trees, Support Vector Machines (SVM), and ensemble learning. However, no single algorithm can solve every practical problem, because each has its own characteristics. Practitioners are no longer satisfied with merely using classification models to analyze data sets and give policymakers a better basis for decisions; they also hope to solve classification problems more efficiently and create more value. It has therefore become essential to identify the circumstances in which each classification algorithm applies and its advantages and disadvantages for different applications, to select quickly and directly the algorithm that gives the best classification results for the characteristics of a given application, and ultimately to automate that selection.

However, few domestic researchers have compared the original algorithms. Abroad, Michie et al. compared three families of techniques (neural networks, statistical classification, and machine learning) and applied them to practical industrial problems. This thesis compares three classification algorithms: Naive Bayes, the C5.0 decision tree, and the Support Vector Machine. After introducing the principles of these algorithms and the criteria used to compare classifiers, it selects four representative experimental cases from the social, commercial, biological, and economic fields, which differ in the number of instances, the number of missing values, the number of attributes used for prediction, and the number of target categories. A classification model is built for each case with each of the three algorithms, and the models are compared on classification accuracy, algorithm stability, interpretability of the results, classifier running speed, and the effect of handling data sets with missing values. The advantages and disadvantages of the three algorithms on the different data sets are then summarized.

The experimental results show that, compared with the other two algorithms, the Support Vector Machine has clear advantages in its dependence on historical data, the accuracy of its classification results, and its stability. The decision tree has clear advantages in running speed and in the interpretability of its results. Naive Bayes is better than the other two algorithms at handling data sets with missing values. Therefore, when only a relatively small sample can be obtained in a practical problem, the Support Vector Machine performs best; for massive data, the decision tree is among the most efficient; and when the collected data set contains a large number of missing values, Naive Bayes is the better choice.
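The comparison criteria described above (accuracy, stability, running speed, and tolerance of missing values) can be illustrated with a small harness. The following is only a minimal sketch under stated assumptions, not the thesis's experimental setup: the function names (`train_naive_bayes`, `train_majority`, `cross_validate`), the toy data, and the use of the standard deviation of fold accuracies as a stability proxy are all hypothetical choices made for illustration. C5.0 and SVM are not implemented here; any classifier exposed as a `train(rows, labels) -> predict` function could be plugged in alongside the toy categorical Naive Bayes, which simply skips missing values.

```python
import math
import statistics
import time
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Toy categorical Naive Bayes; None marks a missing value and is skipped."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct observed values
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                value_counts[(j, label)][value] += 1
                values_seen[j].add(value)

    def predict(row):
        best_label, best_score = None, -math.inf
        for label, count in class_counts.items():
            score = math.log(count / len(labels))  # log prior
            for j, value in enumerate(row):
                if value is None:
                    continue  # a missing attribute contributes no evidence
                counts = value_counts[(j, label)]
                # Laplace smoothing, with one extra slot reserved for unseen values
                denominator = sum(counts.values()) + len(values_seen[j]) + 1
                score += math.log((counts[value] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def train_majority(rows, labels):
    """Baseline that always predicts the most frequent training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority


def cross_validate(train, rows, labels, k=5):
    """Return (mean accuracy, std dev of fold accuracies, elapsed seconds)."""
    start = time.perf_counter()
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, len(rows), k))  # interleaved folds
        train_rows = [r for i, r in enumerate(rows) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        predict = train(train_rows, train_labels)
        correct = sum(predict(rows[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    elapsed = time.perf_counter() - start
    return statistics.mean(accuracies), statistics.pstdev(accuracies), elapsed


# Hypothetical toy data: the first attribute determines the label, the second
# is weakly informative noise and is missing (None) in every fourth row.
rows = []
for i in range(20):
    f0 = "a" if i % 2 == 0 else "b"
    f1 = None if i % 4 == 3 else ("x" if i % 3 == 0 else "y")
    rows.append((f0, f1))
labels = ["yes" if r[0] == "a" else "no" for r in rows]

for name, train in [("naive_bayes", train_naive_bayes), ("majority", train_majority)]:
    accuracy, spread, seconds = cross_validate(train, rows, labels)
    print(f"{name}: accuracy={accuracy:.2f} stdev={spread:.2f} time={seconds:.4f}s")
```

Passing classifiers as plain training functions keeps the harness independent of any particular library, so the same three measurements can be collected for each algorithm under identical folds.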

Keywords/Search Tags:

Data mining, Classification, Naive Bayes, Decision tree, Support vector machine

Related items

1	Research On Hybrid Classification Based On Naive Bayes And Decision Tree
2	Research On Text Classification Algorithm Based On Naive Bayes Method
3	Research On Network Traffic Classification Based On Machine Learning
4	Research On Text Classification Algorithms Based On Machine Learning
5	Classification Based On Influence Functions
6	Research On Feature Selection And Classification Based On Intelligent Optimization Algorithms
7	Research On Classification Algorithms For Uncertain Data
8	Study On Uighur And Kazakh Illegal Web Page Recognition Method
9	Research On Personal Credit Evaluation Based On Decision Tree Integration Algorithm
10	Research On Hybrid Classification Algorithm Based On Decision Tree And Naïve-Bayes In Intrusion Detection