
Research On The Bag-Level Covering Algorithm For Multi-Instance Learning And Its Applications

Posted on: 2015-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: C Rui
GTID: 2268330428464790
Subject: Computer application technology
Abstract/Summary:
Many practical learning problems involve examples whose labels are ambiguous. This thesis examines one framework for learning from ambiguous patterns in machine learning, known as multi-instance learning. In multi-instance learning, each example is a bag consisting of any number of instances. A bag is labeled negative if all of its instances are negative, and positive if at least one of its instances is positive. Because the instances themselves are not labeled, each positive bag is an ambiguous object. The large amount of noise in the positive bags, that is, their false positive instances, is the inherent difficulty of multi-instance learning.

To exclude this noise from the positive bags of multi-instance data and improve classification accuracy, this thesis proposes a novel bag-level multi-instance Covering kNN algorithm, MICkNN. The Covering algorithm learns a set of sphere neighbors, each of which contains only patterns that belong to the same class. This property helps us reorganize the structure of the bags in a multi-instance data set. Specifically, to exclude false positive instances from the positive bags, we first restructure the multi-instance data set by treating the sphere neighbors obtained with the Covering algorithm as the new bags, which improves the separability of the multi-instance samples in the new feature space. The bag-level kNN algorithm is then used to exclude the noise in the positive bags and to predict the labels of the test bags. Experiments on drug activity prediction data sets and content-based image retrieval data sets demonstrate the effectiveness of the proposed MICkNN algorithm.

The main work of the thesis is as follows:
1. Introduce multi-instance learning and point out the main differences between multi-instance learning and standard machine learning. List the main application areas of multi-instance learning and review its development and current research status.
2. Give a detailed definition of the multi-instance learning problem and describe the main ideas of several classical multi-instance learning algorithms. Divide the existing multi-instance algorithms into two categories, bag-level methods and instance-level methods. In addition, note the inherent difficulty of the multi-instance problem and the shortcomings of the existing multi-instance algorithms.
3. Put forward the MICkNN algorithm. Analyze the feasibility of using the Covering algorithm to reorganize the original structure of multi-instance data sets. Point out that the Covering algorithm can help the bag-level kNN algorithm exclude a large number of false positive instances from the positive bags.
4. Apply the proposed MICkNN algorithm to the drug activity prediction problem. Describe how the drug molecule bags are generated. Experiments on the real-world and artificial benchmark Musk data sets demonstrate the accuracy and efficiency of the MICkNN algorithm, and the proposed method is compared with existing state-of-the-art multi-instance algorithms.
5. Apply the proposed MICkNN algorithm to the content-based image retrieval problem. Describe how the image bags are generated. Experiments on the COREL data sets, i.e., Fox, Tiger and Elephant, demonstrate the accuracy and efficiency of the MICkNN algorithm, and the proposed method is compared with existing state-of-the-art multi-instance algorithms.
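The ideas summarized above can be illustrated with a small Python sketch. It is not the MICkNN algorithm from the thesis, whose details the abstract does not give. It only shows, under stated assumptions, the three ingredients the abstract names: the bag labeling rule, a covering step that produces single-class spheres, and a bag-level kNN vote. The function names (`bag_label`, `cover`, `min_hausdorff`, `knn_predict`) are hypothetical, the covering step is a simplified Euclidean variant, and the minimal Hausdorff distance between bags is an assumption borrowed from classical bag-level kNN methods.

```python
import math
from collections import Counter


def bag_label(instance_labels):
    """Multi-instance rule: a bag is positive (1) iff at least one instance is positive."""
    return 1 if any(label == 1 for label in instance_labels) else 0


def cover(points, labels):
    """Simplified covering step (assumption, not the thesis's exact procedure).

    Greedily builds spheres so that each sphere holds points of one class only:
    the radius is the distance from the centre to the nearest point of another class.
    Returns a list of (centre_index, radius, member_indices).
    """
    uncovered = set(range(len(points)))
    spheres = []
    while uncovered:
        c = min(uncovered)  # deterministic choice of the next centre
        other = [math.dist(points[c], points[j])
                 for j in range(len(points)) if labels[j] != labels[c]]
        radius = min(other) if other else math.inf
        # The centre is always a member, so the loop is guaranteed to terminate.
        members = [j for j in sorted(uncovered)
                   if j == c or math.dist(points[c], points[j]) < radius]
        spheres.append((c, radius, members))
        uncovered -= set(members)
    return spheres


def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: the smallest distance between any two instances."""
    return min(math.dist(a, b) for a in bag_a for b in bag_b)


def knn_predict(train_bags, train_labels, test_bag, k=3):
    """Bag-level kNN: majority vote among the k nearest training bags."""
    ranked = sorted(range(len(train_bags)),
                    key=lambda i: min_hausdorff(train_bags[i], test_bag))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    bags = [[(0, 0), (5, 5)], [(0.2, 0.1)], [(9, 9)], [(10, 10), (9.5, 9)]]
    bag_labels = [1, 1, 0, 0]
    print(knn_predict(bags, bag_labels, [(0.1, 0.1), (7, 7)], k=3))
    print(cover([(0, 0), (1, 0), (5, 0), (6, 0)], [1, 1, 0, 0]))
```

In this sketch the spheres returned by `cover` could serve as the restructured bags passed to `knn_predict`. How the thesis assigns labels to the new bags and handles noisy spheres is not stated in the abstract, so that step is left out here.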
Keywords/Search Tags: Multi-Instance Learning, Constructive Covering Algorithm, Drug Activity Prediction, Classification, Content-Based Image Retrieval (CBIR)