
A Research On The Multi-Instance Learning Algorithm Via Disambiguation Based On Support Vector Data Description

Posted on: 2013-11-29  Degree: Master  Type: Thesis
Country: China  Candidate: M Qi
GTID: 2248330395962368  Subject: Computer application technology
Abstract/Summary:
As machine learning has developed, it has been adopted in many traditional research areas. Its applications continue to grow; in particular, data analysis methods based on machine learning have become one of the key technologies for solving complex problems. The role of machine learning has therefore gradually changed. With a number of new methods and ideas (such as multi-instance learning) being proposed, the field has entered a new stage and made the transition from theoretical analysis to practical application. Multi-instance learning has become a new theory within machine learning. In multi-instance learning, the training samples are bags, each composed of a number of instances. Each bag carries a concept label, but the instances themselves do not. Because multi-instance learning has unique characteristics and broad application prospects, it is regarded as a new learning framework alongside supervised learning, unsupervised learning, and reinforcement learning, and it has attracted extensive attention from many researchers.

Building on research into support vector data description (SVDD) and multi-instance learning algorithms, this paper proposes two multi-instance learning algorithms via disambiguation based on SVDD: MIL-NSVDD_I and MIL-NSVDD_B. The main work of this paper can be divided into three parts.

Firstly, this paper studies support vector data description, which was proposed on the basis of the SVM and minimum bounding sphere theory, covering the hard-margin, soft-margin, and negative-class-sample formulations, and it examines the influence of the kernel parameter and the penalty parameter on SVDD.

Secondly, to identify the true positive instances in bags, this paper proposes a disambiguation method that converts the multi-instance data set into a single-instance data set. For the positive bags, the instances' prediction values are sorted in descending order, and the m+ instances with the largest prediction values are selected.
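The positive-bag ranking step just described can be sketched in Python as follows. This is a minimal illustration under one reading of m+ (the smallest number of top-ranked instances that includes at least one instance from every positive bag), not the thesis's implementation. The per-instance prediction values are taken as input; they would presumably come from an SVDD model, which is not reproduced here. The names `disambiguate_positive_bags` and `Instance` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: an instance is a feature vector stored as a tuple.
Instance = Tuple[float, ...]


def disambiguate_positive_bags(
    bags: Sequence[Sequence[Instance]],
    scores: Sequence[Sequence[float]],
) -> List[Instance]:
    """Return the top-m+ instances ranked by prediction value (descending).

    Assumption: m+ is the smallest count whose top-ranked instances include
    at least one instance from every positive bag. scores[b][i] is the
    prediction value of instance i in bag b (e.g. an SVDD decision value,
    computed elsewhere).
    """
    # Flatten every instance into a (score, bag index, instance index) triple.
    ranked = [
        (s, b, i)
        for b, bag_scores in enumerate(scores)
        for i, s in enumerate(bag_scores)
    ]
    # Sort by prediction value, largest first (stable for ties).
    ranked.sort(key=lambda t: t[0], reverse=True)

    covered = set()
    selected: List[Instance] = []
    for _, b, i in ranked:
        selected.append(bags[b][i])
        covered.add(b)
        if len(covered) == len(bags):
            break  # every bag now contributes an instance; m+ = len(selected)
    return selected


if __name__ == "__main__":
    bags = [[(0.0,), (1.0,)], [(2.0,), (3.0,)]]
    print(disambiguate_positive_bags(bags, [[0.9, 0.1], [0.2, 0.8]]))
    # → [(0.0,), (3.0,)]
```

If one bag dominates the top of the ranking, m+ grows until the weakest bag's best instance is reached, so every positive bag is represented in the disambiguated set.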
The parameter m+ is the minimum number needed to pick at least one instance from each positive bag. The selected instances form the disambiguated positive sample set. For each negative bag, we calculate the distances between its instances and those in the positive set, sort them in ascending order, and choose the m- instances with the smallest distances. The parameter m- is the minimum number needed to pick at least one instance from each negative bag. The selected instances serve as the representatives of the negative bags.

Finally, this paper adopts two feature representation schemes, one for instance-level classification and the other for bag-level classification, to convert the multi-instance learning problem into a standard machine learning problem that can be solved by the negative SVDD method. The experimental results verify the effectiveness of the algorithms, and their classification accuracy is analyzed.

To sum up, the MIL-NSVDD_I and MIL-NSVDD_B methods proposed in this paper can effectively solve multi-instance learning problems and have strong theoretical and practical significance.
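The negative-bag selection step can be sketched the same way. This is a minimal Python illustration, assuming that the distance from a negative instance to the positive set is the Euclidean distance to its nearest positive instance, and that m- is the smallest count whose closest-ranked instances include at least one instance from every negative bag. The thesis may define the set distance differently (for example, as the distance to an SVDD centre), and the function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: an instance is a feature vector stored as a tuple.
Instance = Tuple[float, ...]


def distance_to_positive_set(x: Instance, positives: Sequence[Instance]) -> float:
    """Distance from x to its nearest instance in the positive set.

    Assumption: nearest-neighbour Euclidean distance; other set distances
    are possible.
    """
    return min(math.dist(x, p) for p in positives)


def select_negative_instances(
    neg_bags: Sequence[Sequence[Instance]],
    positives: Sequence[Instance],
) -> List[Instance]:
    """Return the m- negative instances closest to the positive set.

    Assumption: m- is the smallest count whose closest-ranked instances
    include at least one instance from every negative bag.
    """
    # Pair each negative instance with its distance and its bag index.
    ranked = [
        (distance_to_positive_set(x, positives), b, x)
        for b, bag in enumerate(neg_bags)
        for x in bag
    ]
    ranked.sort(key=lambda t: t[0])  # ascending distance

    covered = set()
    selected: List[Instance] = []
    for _, b, x in ranked:
        selected.append(x)
        covered.add(b)
        if len(covered) == len(neg_bags):
            break  # every negative bag is represented; m- = len(selected)
    return selected


if __name__ == "__main__":
    positives = [(0.0, 0.0)]
    neg_bags = [[(1.0, 0.0), (5.0, 0.0)], [(2.0, 0.0), (3.0, 0.0)]]
    print(select_negative_instances(neg_bags, positives))
    # → [(1.0, 0.0), (2.0, 0.0)]
```

One plausible motivation for keeping the negatives nearest to the positive set is that these are the instances most likely to lie near the description boundary, which is where negative-class samples constrain the sphere in SVDD with negative examples.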
Keywords/Search Tags: Machine learning, disambiguation, multi-instance learning, support vector machines, support vector data description