
Research on Naive Bayes Classifiers and Their Improved Algorithms

Posted on: 2010-08-24    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L X Jiang    Full Text: PDF
GTID: 1118360275476888    Subject: Earth Exploration and Information Technology
Abstract/Summary:
Classification is one of the most important tasks in data mining and is widely used in real-world applications; for example, it can be used to judge whether an e-mail message is junk according to its title and content. There exist many methods for building classifiers, such as Bayesian networks, decision trees, instance-based learning algorithms, artificial neural networks, support vector machines, genetic algorithms, rough sets, and fuzzy sets. Among these methods, Bayesian networks are the most popular thanks to their special form for representing uncertain knowledge, their rich ability to represent probability, and their incremental learning characteristic for incorporating prior knowledge.

Because learning an optimal Bayesian classifier amounts to learning a Bayesian network, which is an NP-hard problem, learning naive Bayes classifiers has attracted much attention from researchers. The naive Bayes classifier is based on a simple but unrealistic assumption: that the attribute values are conditionally independent given the class label. Recent work in supervised learning has shown that this surprisingly simple Bayesian classifier is competitive with state-of-the-art classifiers such as C4.5 and the k-nearest neighbor algorithm.

A natural question is whether relaxing the attribute conditional independence assumption of naive Bayes can improve its classification performance. To answer this question, researchers have presented many improved methods, which can be broadly divided into three main categories: 1) structure augmentation, which uses directed arcs to represent the dependencies among attributes; 2) attribute selection, which selects an attribute subset from the whole space of attributes; and 3) local learning, which builds a local naive Bayes classifier for each test instance.

This thesis takes naive Bayes classifiers as its basic object and studies all kinds of improved methods for them. Three new classifiers are presented: a hidden augmented naive Bayes classifier, an evolutionary selective naive Bayes classifier, and a dynamic local naive Bayes classifier. In many real-world data mining applications a ranking of test instances is also very important, so this thesis additionally surveys the ranking performance of naive Bayes classifiers and presents a locally cloned naive Bayes ranking algorithm. It further studies some other methods for improving naive Bayes classifiers (attribute weighting, instance weighting, and combined learning) and presents an instance-weighted naive Bayes classification algorithm based on similarity and a combined classification algorithm based on C4.5 and NB. Finally, it explores the application value of the new algorithms in several real-world problems.
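For context on the algorithms discussed below, here is a minimal sketch of the basic naive Bayes decision rule, assuming discrete attributes and Laplace smoothing; the class and helper names are illustrative, not taken from the thesis.

```python
from collections import Counter, defaultdict
import math

class NaiveBayes:
    """Naive Bayes over discrete attributes with Laplace smoothing."""

    def fit(self, X, y):
        self.n = len(y)
        self.class_counts = Counter(y)
        self.attr_counts = defaultdict(int)   # (class, attr index, value) -> count
        self.values = defaultdict(set)        # attr index -> distinct values seen
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                self.attr_counts[(c, j, v)] += 1
                self.values[j].add(v)
        return self

    def predict(self, xi):
        best, best_score = None, -math.inf
        for c, cc in self.class_counts.items():
            # log P(c) + sum_j log P(x_j | c), with Laplace smoothing
            score = math.log(cc / self.n)
            for j, v in enumerate(xi):
                num = self.attr_counts[(c, j, v)] + 1
                den = cc + len(self.values[j])
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best
```

The later sketches reuse this `NaiveBayes` class as their base learner.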
The main contributions of this thesis include:

1) It gives the algorithmic framework for learning augmented naive Bayes classifiers, reviews the various structure augmentation methods for improving naive Bayes, and presents a hidden augmented naive Bayes classifier (HANB for short). HANB creates a hidden parent for each attribute node; this hidden parent combines the influences of all the other attribute nodes through a weighted average of the conditional mutual information between each pair of attributes (see the first sketch after this list).

2) It gives the algorithmic framework for learning selective naive Bayes classifiers, reviews the various attribute selection methods for improving naive Bayes, and presents an evolutionary selective naive Bayes classifier (ESNB for short). ESNB first uses an evolutionary algorithm to select an attribute subset from the whole space of attributes and then builds a naive Bayes classifier on the selected attributes. In ESNB, the classification accuracy of the naive Bayes classifier is chosen as the fitness function and a binary encoding is used: each binary string has a length equal to the number of original attributes, a bit of "1" or "0" respectively indicates that the corresponding attribute is selected or not, and the stopping condition is a maximum number of generations (see the second sketch after this list).

3) It gives the algorithmic framework for learning local naive Bayes classifiers, reviews the various local learning methods for improving naive Bayes, and presents a dynamic local naive Bayes classifier (DLNB for short). At training time, DLNB uses a brute-force search based on leave-one-out cross-validation to dynamically select the k value that best fits the training data; once the best k is learned, it is used to classify all test instances (see the third sketch after this list).

4) It reviews the research status of ranking, surveys the ranking performance of naive Bayes classifiers, and presents a locally cloned naive Bayes ranking algorithm (LCNB for short). LCNB first applies the k-nearest neighbor algorithm to find the k nearest neighbors of a test instance, then clones each nearest neighbor according to its similarity to the test instance, and finally builds a naive Bayes classifier on the training instance set to which these clones have been added (see the fourth sketch after this list).

5) It gives the algorithmic framework for learning attribute-weighted and instance-weighted naive Bayes classifiers, reviews four kinds of methods for building combined classifiers, and presents an instance-weighted naive Bayes classification algorithm based on similarity and a combined classification algorithm based on C4.5 and NB.

6) It explores the application value of the new algorithms (HANB, ESNB, DLNB) in several real-world problems.
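The first sketch, for item 1, shows one plausible reading of the HANB weighting step: the influence of attribute Aj on attribute Ai is weighted by their conditional mutual information I(Ai; Aj | C), normalized over all the other attributes. How the hidden parent's conditional distribution is then assembled is simplified away here, so treat this as an assumption-laden illustration rather than the thesis's exact procedure.

```python
import math
from collections import Counter

def cond_mutual_info(X, y, i, j):
    """Empirical I(Ai; Aj | C) from a discrete training set."""
    n = len(y)
    nc = Counter(y)
    nij = Counter((y[t], X[t][i], X[t][j]) for t in range(n))
    ni = Counter((y[t], X[t][i]) for t in range(n))
    nj = Counter((y[t], X[t][j]) for t in range(n))
    total = 0.0
    for (c, a, b), nab in nij.items():
        p_joint = nab / n                      # p(c, a, b)
        p_ab_c = nab / nc[c]                   # p(a, b | c)
        p_a_c = ni[(c, a)] / nc[c]             # p(a | c)
        p_b_c = nj[(c, b)] / nc[c]             # p(b | c)
        total += p_joint * math.log(p_ab_c / (p_a_c * p_b_c))
    return total

def hidden_parent_weights(X, y, i):
    """Normalized CMI weights of every other attribute toward attribute i."""
    m = len(X[0])
    w = {j: cond_mutual_info(X, y, i, j) for j in range(m) if j != i}
    s = sum(w.values()) or 1.0
    return {j: v / s for j, v in w.items()}
```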
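The second sketch, for item 2, is a simple generational genetic algorithm over bit strings (1 = attribute kept) with the accuracy of the `NaiveBayes` sketch above as fitness. Population size, mutation rate, generation count, one-point crossover, and the use of training-set accuracy are illustrative assumptions; the thesis specifies only the encoding, the fitness criterion, and the maximum-generations stopping rule.

```python
import random

def mask_rows(X, mask):
    # keep only the attributes whose bit is 1
    return [[v for v, keep in zip(row, mask) if keep] for row in X]

def fitness(mask, X, y):
    if not any(mask):
        return 0.0
    Xm = mask_rows(X, mask)
    nb = NaiveBayes().fit(Xm, y)
    return sum(nb.predict(r) == c for r, c in zip(Xm, y)) / len(y)

def esnb_select(X, y, pop=20, gens=50, p_mut=0.05):
    n = len(X[0])  # string length = number of original attributes
    population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):  # stopping condition: maximum generations
        scored = sorted(population, key=lambda m: fitness(m, X, y), reverse=True)
        parents = scored[: pop // 2]
        children = []
        while len(parents) + len(children) < pop:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)               # one-point crossover
            child = [bit ^ (random.random() < p_mut)   # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, X, y))
```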
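The third sketch, for item 3, shows the DLNB-style training-time search: leave-one-out cross-validation scores each candidate k, and the winning k is then used for every test instance, each of which gets a local naive Bayes model built on its k nearest neighbors. The overlap distance and the candidate grid are assumptions; the `NaiveBayes` class is the earlier sketch.

```python
def distance(a, b):
    # simple overlap distance for discrete attributes (an assumption)
    return sum(1 for u, v in zip(a, b) if u != v)

def neighbors(X, y, query, k):
    order = sorted(range(len(X)), key=lambda i: distance(X[i], query))
    return [X[i] for i in order[:k]], [y[i] for i in order[:k]]

def select_k(X, y, candidates):
    """Brute-force search for the k with the best leave-one-out accuracy."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        correct = 0
        for i in range(len(X)):                  # leave instance i out
            Xr, yr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
            Xn, yn = neighbors(Xr, yr, X[i], k)
            correct += NaiveBayes().fit(Xn, yn).predict(X[i]) == y[i]
        if correct / len(X) > best_acc:
            best_k, best_acc = k, correct / len(X)
    return best_k

def dlnb_predict(X, y, query, k):
    Xn, yn = neighbors(X, y, query, k)
    return NaiveBayes().fit(Xn, yn).predict(query)
```

Usage follows the two-phase design: something like `k = select_k(X_train, y_train, list(range(1, 32, 2)))` once at training time, then `dlnb_predict(X_train, y_train, x, k)` per test instance.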
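The fourth sketch, for item 4, illustrates the LCNB cloning step: each of the k nearest neighbors of the test instance is copied a number of times proportional to its similarity before a naive Bayes model is trained on the enlarged set. The similarity measure (fraction of matching attribute values) and the `max_clones` scale are assumptions, not the thesis's exact choices.

```python
def similarity(a, b):
    # fraction of attribute values the two instances share (an assumption)
    return sum(1 for u, v in zip(a, b) if u == v) / len(a)

def lcnb_predict(X, y, query, k, max_clones=10):
    order = sorted(range(len(X)), key=lambda i: -similarity(X[i], query))
    Xc, yc = list(X), list(y)
    for i in order[:k]:
        copies = round(similarity(X[i], query) * max_clones)
        Xc.extend([X[i]] * copies)   # clone the neighbor
        yc.extend([y[i]] * copies)
    return NaiveBayes().fit(Xc, yc).predict(query)
```

For ranking, one would read off the local model's class probability estimates rather than only the arg-max label.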
Keywords: naive Bayes, classification, ranking, class probability estimation