
A Comparative Analysis Of Classifying Algorithms In Data Mining Technology

Posted on: 2008-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: M C Zheng
Full Text: PDF
GTID: 2167360215467588
Subject: Statistics
Abstract/Summary:
Classification is a major research subject in data mining. It is the technique of building a model from the characteristics of a data set and then using that model to assign categories to samples of unknown type. Current classification algorithms include statistical classification, decision trees, and neural networks, among others. Different classification methods produce different classification models, and the quality of the model directly affects the efficiency and accuracy of data mining. It is therefore vital to choose the most effective algorithm when classifying large quantities of data.

Studies of classification algorithms in data mining so far fall into several types: surveys of classification algorithms, improvements to existing algorithms, combinations of particular algorithms, experimental studies of algorithms on small samples, and studies and applications of a single given algorithm. Most researchers tend to propose new algorithms but seldom conduct experimental analysis or comparison of algorithms. Contrastive studies of all existing algorithms applied to one particular data set are especially lacking. To fill this gap, this thesis studies the classification problem in data mining in depth through concrete examples, analyzing and comparing the characteristics of each algorithm. It concludes that the neural network algorithm has the better overall effect. We also find that different types of data set, data sets from different domains, different classification patterns, different comparison criteria, and different classification methods all produce different results. Classification methods must therefore be matched to each data set according to its characteristics and classification pattern. Only in this way can errors be minimized and high accuracy of the classification results be ensured.
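
The comparison described in the abstract, several classifiers trained on one data set and judged by a shared criterion, can be illustrated with a small sketch. It is not the thesis's experiment: the data are synthetic (two Gaussian clusters), the only criterion is holdout accuracy, and only two of the methods named in the keywords (Bayes and logistic regression) are included. All function names, the learning rate, and the number of epochs are assumptions chosen for illustration.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic two-class data: two Gaussian clusters in 2-D (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


def fit_gaussian_nb(train):
    """Estimate the prior, per-feature means and variances for each class."""
    model = {}
    for c in (0, 1):
        rows = [x for x, y in train if y == c]
        dims = list(zip(*rows))
        means = [sum(d) / len(d) for d in dims]
        variances = [sum((v - m) ** 2 for v in d) / len(d) + 1e-9
                     for d, m in zip(dims, means)]
        model[c] = (len(rows) / len(train), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior under the naive Bayes model."""
    def log_posterior(c):
        prior, means, variances = model[c]
        total = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=log_posterior)


def fit_logistic(train, lr=0.05, epochs=100):
    """Fit logistic regression by plain stochastic gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in train:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_logistic(params, x):
    """Predict class 1 when the linear score is positive."""
    w, b = params
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def accuracy(predict, test):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


data = make_data(400)
train, test = data[:300], data[300:]  # simple holdout split

nb_model = fit_gaussian_nb(train)
lr_model = fit_logistic(train)

results = {
    "naive_bayes": accuracy(lambda x: predict_gaussian_nb(nb_model, x), test),
    "logistic_regression": accuracy(lambda x: predict_logistic(lr_model, x), test),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Changing the data generator or replacing accuracy with another criterion can change the ranking of the methods, which matches the abstract's point that the result depends on the data set and on the criterion of comparison.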
Keywords: Data mining, Classification, Logistic regression, Bayes, Decision tree, Neural network