
Single And Multiple Instance Learning Based On Support Vector Data Description

Posted on: 2012-06-26    Degree: Doctor    Type: Dissertation
Country: China    Candidate: J L Fang    Full Text: PDF
GTID: 1228330374996470    Subject: Control theory and control engineering
Abstract/Summary:
Support Vector Data Description (SVDD), inspired by minimum enclosing ball and support vector machine theory, can achieve good classification results even when few samples are available. Compared with its peer methods, SVDD features high computational speed, strong robustness, and good classification ability. It has been applied successfully in fields such as speech recognition, image processing, and intrusion detection.

Objects in the real world usually do not carry a single, unique semantic; they may be ambiguous. Learning from ambiguous objects has therefore become an important research topic. To handle the problems caused by object ambiguity, the first step is to assign a suitable subset of labels, rather than a single label, to each object. In fact, the majority of real-world classification problems can be treated as multiple instance learning or multi-label learning problems; that is, it is more rational to model them as such.

Through a study of support vector data description, especially its variants proposed in recent years, this thesis presents two SVDD-based classification methods. Likewise, through a study of multiple instance learning, it presents ten SVDD-based multiple instance learning methods belonging to four different families. The main contributions are as follows.

An SVDD method based on maximizing the distance between two sphere centers is presented. This method uses a separate hypersphere to describe the data of each class and uses the hyperspheres to separate the training samples with different margins, while introducing the maximization of the distance between the two sphere centers as the objective function. Because each class is described by its own hypersphere, the method is well suited to imbalanced classification problems.
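For reference, the standard SVDD primal problem (Tax and Duin's formulation, on which the methods in this thesis build) finds the smallest hypersphere, with center $a$ and radius $R$, that encloses the training samples $x_i$, admitting outliers through slack variables $\xi_i$ weighted by a trade-off parameter $C$:

```latex
\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
\|x_i - a\|^2 \le R^2 + \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n.
```

The maximum-distance variant described above additionally rewards a large separation $\|a_+ - a_-\|$ between the two class centers; its exact objective is detailed in the thesis itself.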
Experimental results show that, on imbalanced problems, the maximum-distance-between-centers SVDD obtains better classification results than competing methods.

Second-order annular-margin SVDD methods are presented based on a second-order cost function. The method uses two concentric hyperspheres to separate the two classes of training samples and maximizes the margin between the classes. To address the boundary bias caused by imbalanced data distributions, two improved models are also discussed. In the first improved model, the objective function uses different cost parameters for the positive and negative classes, and the constraints adopt an asymmetric strategy: samples of the majority class should lie inside the hypersphere, while samples of the minority class should lie not only outside the hypersphere but also far away from it. The second improved model adopts a no-wrong-label strategy for minority-class samples. Experimental results imply that the second-order annular-margin SVDD achieves high precision on both balanced and imbalanced data distributions.

SVDD is then introduced into multiple instance learning, and ten multiple instance learning methods in four families are put forward: SVDD multiple instance learning based on labeled instances (mi-SVDD); SVDD multiple instance learning based on labeled bags (MI-SVDD); four instance-mapping algorithms, SVDD-MILD_I11, SVDD-MILD_I12, SVDD-MILD_I21, and SVDD-MILD_I22; and four bag-mapping algorithms, SVDD-MILD_B11, SVDD-MILD_B12, SVDD-MILD_B21, and SVDD-MILD_B22. mi-SVDD assumes that all instances in positive bags are positive and pools them with the instances from the negative bags; it then trains a classifier with NSVDD (SVDD with negative examples) and iterates to correct the labels of the instances in the positive bags.
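The two concentric hyperspheres of the annular-margin method induce a simple decision rule. A minimal sketch follows, assuming a trained center and inner/outer radii are given (all names here are hypothetical illustrations, not the thesis's notation); the threshold is placed in the middle of the annular margin:

```python
import math

def annular_classify(x, center, r_inner, r_outer):
    """Classify with two concentric hyperspheres (illustrative sketch,
    not the thesis's exact rule): the target class lies inside the
    inner sphere, the other class outside the outer one, and the
    decision threshold sits in the middle of the annular margin."""
    dist = math.dist(x, center)          # Euclidean distance to center
    threshold = 0.5 * (r_inner + r_outer)
    return +1 if dist <= threshold else -1

# toy check: center at the origin, inner radius 1, outer radius 3
print(annular_classify((0.5, 0.0), (0.0, 0.0), 1.0, 3.0))  # → 1
print(annular_classify((4.0, 0.0), (0.0, 0.0), 1.0, 3.0))  # → -1
```

In the trained model the radii come from solving the annular-margin optimization; here they are supplied directly only to show the geometry of the decision.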
MI-SVDD selects the most likely positive instance from each positive bag as a positive example and the most non-negative instance from each negative bag as a negative example. The selected examples are pooled, a classifier is trained with NSVDD, and the choice of the most likely positive instance is corrected iteratively. The main idea of the SVDD-MILD_I and SVDD-MILD_B families is as follows. First, a representative positive instance is predicted in each positive bag and a representative negative instance in each negative bag. Then the problem is mapped into a feature space using one of two kinds of feature mapping, instance-level mapping or bag-level mapping, so that the multiple instance learning problem is transformed into a traditional machine learning problem in that feature space. Finally, NSVDD is applied to solve the resulting problem.

Experimental results on the MUSK datasets indicate that the precisions of mi-SVDD and MI-SVDD are comparable to those of mi-SVM and MI-SVM. All four SVDD-MILD_I algorithms achieve higher precision on both MUSK1 and MUSK2, which, to the best of our knowledge, is the best performance obtained so far among peer methods. The precisions of the four SVDD-MILD_B algorithms are slightly lower than those of the SVDD-MILD_I algorithms and are equivalent to the average precision of peer algorithms in the literature.

Finally, the four SVDD-MILD_I and four SVDD-MILD_B algorithms are applied to content-based image retrieval on the Corel image collection. Experimental results demonstrate that the SVDD-MILD_I algorithms outperform all four SVDD-MILD_B algorithms, especially SVDD-MILD_I21 and SVDD-MILD_I22. Over ten categories of images, the average retrieval precisions of SVDD-MILD_I21 and SVDD-MILD_I22 exceed the best precisions achieved by peer algorithms known to us so far.
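The per-bag instance selection that MI-SVDD relies on can be sketched as follows. This is a simplified stand-in heuristic, not the thesis's exact criterion: here the "most likely positive" instance in a positive bag is taken to be the one farthest from the mean of the negative instances, and the negative-bag representative is the one closest to it; the function and variable names are hypothetical:

```python
import math

def bag_representative(bag, negative_center, positive=True):
    """Pick one representative instance from a bag (illustrative
    heuristic).  For a positive bag, choose the instance farthest
    from the center of the negative instances; for a negative bag,
    choose the instance closest to it."""
    key = lambda x: math.dist(x, negative_center)
    return max(bag, key=key) if positive else min(bag, key=key)

# toy example: negative center at the origin
neg_center = (0.0, 0.0)
pos_bag = [(0.1, 0.1), (2.0, 2.0), (0.5, 0.0)]
print(bag_representative(pos_bag, neg_center))         # → (2.0, 2.0)
print(bag_representative(pos_bag, neg_center, False))  # → (0.1, 0.1)
```

In the actual algorithms this selection is re-run after each NSVDD training round, so the representatives are corrected iteratively as the classifier improves.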
In addition, these two algorithms distinguish two easily confused categories, Beach and Mountains, quite well, which shows that they can be applied in the field of content-based image retrieval.
Keywords/Search Tags:Machine learning, pattern recognition, support vector data description, multiple instance learning, image retrieval