
Studies On Classifiers Based On Decision Boundaries From The Perspective Of Dividing Data Space

Posted on: 2012-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Yan
Full Text: PDF
GTID: 1118330371958962
Subject: Computer Science and Technology
Abstract/Summary:
Classifiers are an important technique in machine learning. Classifier studies take two perspectives: the mapping perspective and the dividing perspective. From the mapping perspective, a classifier model is a mapping from the data space to the label set, and training a classifier amounts to searching the hypothesis space for an appropriate hypothesis. From the dividing perspective, a classifier model is a group of decision boundaries that divide the data space into decision regions, and training a classifier amounts to dividing the data space to obtain those boundaries. The mapping perspective is the mainstream, with many existing studies, while there has been no systematic study of classifiers from the dividing perspective. This dissertation uses decision boundaries to study classifiers from the dividing perspective: it constructs a theoretical framework based on decision boundaries and improves classifiers within that framework. The studies of this dissertation are as follows.

1) This dissertation presents formal definitions of decision boundary, decision region and probability gradient region. It proposes two methods for obtaining decision boundaries, a formal method and a sampling method. It proposes the Decision Boundary Point Set (DBPS) algorithm, the Decision Boundary Point Set using Grid for 2-D data (DBPSG-2D) algorithm and the Decision Boundary Neuron Set (DBNS) algorithm to obtain sampling points near decision boundaries (a generic boundary-sampling sketch follows this list), and the Self-Organizing Map based Decision Boundary Visualization (SOMDBV) algorithm and the Self-Organizing Map based Probability Gradient Region Visualization (SOMPGRV) algorithm to visualize decision boundaries and probability gradient regions.

2) This dissertation proposes a theoretical framework based on decision boundaries from the perspective of dividing the data space. In this framework, a classifier has three elements: the dividing objective, the decision boundary form and the dividing method. The dividing objective must consider three factors: the training accuracy, the characteristics of misclassified instances and the micro-location of decision boundaries. The decision boundary form must consider three factors: the dividing capability, the domain knowledge provided and the comprehensibility. The dividing method must consider three factors: the information utilized, the dividing pattern and the complexity of the classification model.

3) This dissertation proposes a new characteristic of misclassified instances based on the K Nearest Neighbors (KN) type. According to the label relationship between an instance and its k nearest neighbors, instances fall into three KN types: S-type, DS-type and D-type. On this characteristic, the misclassified instances of the K Nearest Neighbors (KNN) algorithm differ from those of three other classifiers: the C4.5 algorithm, the Naive Bayes classifier and the support vector machine (SVM). The dissertation therefore proposes the K Nearest Neighbors Combining (KNC) algorithm to combine the KNN algorithm with C4.5, Naive Bayes or SVM: KNC uses KNN to predict instances of S-type and DS-type, and uses the other classifier to predict D-type instances (a sketch follows this list).
4) This dissertation studies the impact of discretization algorithms on classifiers' decision boundaries. It proposes that discretization algorithms improve the generalization ability of the Naive Bayes classifier because they raise its Vapnik-Chervonenkis (VC) dimension. It then applies discretization algorithms to SVM and the KNN algorithm, and discusses the impact on the VC dimensions of these two classifiers (a pipeline sketch follows below).

5) This dissertation proposes the Second Division (SD) algorithm, which trains classifiers inside the decision regions of the Naive Bayes classifier, and studies existing algorithms for training local classifiers. The SD algorithm is a new hybrid of global learning and local learning, and can thereby improve the generalization ability of the Naive Bayes classifier (a sketch follows below). Existing algorithms for training local classifiers fall into three types: test selection, divide-and-conquer and training selection. The dissertation proposes that local classifier training algorithms improve the generalization ability of classifiers because they raise the classifiers' VC dimensions and utilize more information from the training data set.
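As an illustration of item 4, a discretizer can be placed in front of a classifier as a preprocessing stage and the two variants compared by cross-validation. The sketch below does this for Naive Bayes with equal-width bins; the dataset, bin count and binning strategy are illustrative choices, not those evaluated in the dissertation, and the same pipeline pattern applies to SVC or KNeighborsClassifier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_breast_cancer(return_X_y=True)

# Naive Bayes on the raw continuous features...
raw_score = cross_val_score(GaussianNB(), X, y, cv=5).mean()

# ...versus Naive Bayes on equal-width discretized, one-hot encoded features
# (BernoulliNB is the natural Naive Bayes variant for binary indicators).
binned = make_pipeline(
    KBinsDiscretizer(n_bins=10, encode="onehot-dense", strategy="uniform"),
    BernoulliNB(),
)
binned_score = cross_val_score(binned, X, y, cv=5).mean()
```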
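The abstract gives only the outline of the SD algorithm. A minimal sketch of the stated idea — divide the data space with a global Naive Bayes model, then train a local classifier inside each decision region — might look like the following; the class name, the decision tree as local learner and the rule of refining only label-mixed regions are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

class SecondDivision:
    """SD-style sketch: divide the space with a global Naive Bayes model,
    then train one local classifier per Naive Bayes decision region."""
    def __init__(self, make_local=lambda: DecisionTreeClassifier(max_depth=3)):
        self.nb = GaussianNB()
        self.make_local = make_local
        self.local = {}

    def fit(self, X, y):
        self.nb.fit(X, y)
        region = self.nb.predict(X)          # region = NB's predicted class
        for r in np.unique(region):
            mask = region == r
            if len(np.unique(y[mask])) > 1:  # refine only label-mixed regions
                self.local[r] = self.make_local().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        region = self.nb.predict(X)
        pred = region.copy()                 # pure regions keep NB's label
        for r, clf in self.local.items():
            mask = region == r
            if mask.any():
                pred[mask] = clf.predict(X[mask])
        return pred

X, y = make_classification(n_samples=800, n_informative=4, random_state=0)
sd = SecondDivision().fit(X, y)
```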
Keywords/Search Tags:Machine learning, classifier, decision boundary, visualization, elements of classifier, combining classifier, local classifier, Vapnik-Chervonenkis dimension, C4.5 algorithm, Naive Bayes classifier, support vector machine, k nearest neighbors algorithm