
Research On Multi-Instance Learning Based On Covering Algorithm

Posted on: 2016-01-15    Degree: Master    Type: Thesis
Country: China    Candidate: H Zhang    Full Text: PDF
GTID: 2308330461492500    Subject: Computer application technology
Abstract/Summary:
Traditional machine learning mainly comprises supervised learning, unsupervised learning, and reinforcement learning; in these frameworks, each sample corresponds to a single label. In many practical problems, however, the labels of the objects to be classified are ambiguous. A new learning framework, Multi-Instance Learning (MIL), emerged to address this situation and has gradually become a hot research topic in machine learning. The framework originated from the prediction of drug molecule activity. In MIL, each training sample is a bag composed of multiple instances; the bag's label is known, but the labels of the instances inside it are not. A bag is labeled positive if at least one of its instances is positive, and negative if all of its instances are negative. The instances cannot be processed directly, because a positive bag may contain many false positive instances; in other words, a positive bag is an ambiguous object. This ambiguity is what makes multi-instance learning difficult.

Since a positive bag's label is determined by its true positive instances, classification accuracy can be improved if the true positive instances can be selected from the positive bags and the false positive instances excluded. Several instance-selection algorithms of this kind exist, but the instances they select are not sufficient to represent the training bags, and these methods do not take the importance of the selected instances into account. If an instance a_i has many similar instances with the same label around it, that instance should be considered more representative than others: a random test sample has a higher probability of falling near a_i and is therefore more likely to share its label, so a_i should be selected first.

This dissertation studies how the clustering property of the Constructive Covering Algorithm (CCA) can be used to find representative instances that effectively represent the bags. Two instance-selection algorithms based on CCA are proposed, named T-MilCa and M-MilCa. T-MilCa does not consider the importance of the selected instances, whereas M-MilCa uses the number of instances covered by a cover to measure that importance. Both methods first use the maximal Hausdorff distance to select some initial positive instances from the positive bags, and then use CCA to restructure the original instances of the negative bags. An inverse testing process is then employed to exclude false positive instances from the positive bags, updating the initial positive instances, and to select representative instances from the negative bags via CCA. Finally, a similarity measure function converts each bag into a single instance, and CCA is used once more to learn from and classify the converted samples.
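For concreteness, the sketch below shows how the maximal (standard) Hausdorff distance between two bags can be computed and how it might be used to pick one initial candidate positive instance per positive bag. It is a minimal Python illustration assuming Euclidean distance between instances; the function names and the exact selection heuristic are illustrative and are not taken from the thesis.

```python
import numpy as np

def max_hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags of shape (n_instances, n_features)."""
    # Pairwise Euclidean distances between instances of the two bags.
    d = np.linalg.norm(bag_a[:, None, :] - bag_b[None, :, :], axis=-1)
    # max over A of min over B, and max over B of min over A.
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def initial_positive_instances(positive_bags, negative_bags):
    """Pick one candidate 'true positive' instance from each positive bag.

    Heuristic (an assumption, not necessarily the thesis' exact rule):
    the instance of a positive bag that lies farthest from all negative
    instances is the least likely to be a false positive.
    """
    neg_instances = np.vstack(negative_bags)
    selected = []
    for bag in positive_bags:
        # Distance from each instance in the bag to its nearest negative instance.
        d = np.linalg.norm(bag[:, None, :] - neg_instances[None, :, :], axis=-1)
        nearest_neg = d.min(axis=1)
        selected.append(bag[nearest_neg.argmax()])
    return np.asarray(selected)
```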
The main work of this dissertation is as follows:

1. Introduce the research background and current state of multi-instance learning, together with its application areas and the obstacles that remain in this field.

2. Present the relevant concepts of multi-instance learning in detail and its differences from traditional learning methods, describe the main ideas of several classical multi-instance learning algorithms, and point out the difficulties and shortcomings of existing algorithms. In addition, briefly introduce the main ideas of CCA, its training and testing process, and analyze how CCA can be applied to the study of MIL.

3. Apply the maximal Hausdorff distance and CCA to multi-instance learning and propose a multi-instance learning algorithm based on instance selection, named T-MilCa. This method uses the Euclidean distance to exclude false positive instances and does not consider the importance of the selected instances. Experimental results on standard benchmark data sets and the COREL image data sets demonstrate the effectiveness of T-MilCa.

4. Propose a second method based on instance importance, named M-MilCa. It improves on T-MilCa by using the number of instances covered in CCA to measure the importance of the selected instances. Experimental results on standard benchmark data sets and the COREL image data sets show that M-MilCa reduces the number of selected instances and is competitive with state-of-the-art MIL algorithms.
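To make the final conversion step concrete, the sketch below illustrates one way a similarity measure function can map each bag to a single feature vector over the selected representative instances, after which any single-instance learner (CCA in the thesis) can be trained on the converted samples. The Gaussian similarity, the parameter sigma, and the function names are assumptions for illustration and are not the thesis' exact definitions.

```python
import numpy as np

def embed_bag(bag, prototypes, sigma=1.0):
    """Convert a bag into a single feature vector via a similarity measure.

    Assumed form: the k-th feature is the maximal Gaussian similarity between
    any instance of the bag and the k-th selected representative instance.
    """
    d = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=-1)
    return np.exp(-(d ** 2) / (sigma ** 2)).max(axis=0)

def embed_dataset(bags, prototypes, sigma=1.0):
    """Stack the embeddings of all bags into an ordinary single-instance data matrix."""
    return np.vstack([embed_bag(b, prototypes, sigma) for b in bags])

# After embedding, every bag is an ordinary single-instance sample, so a
# standard classifier can be trained on the converted data; the thesis uses
# CCA itself for this step, which is not reproduced here.
# X_train = embed_dataset(train_bags, prototypes)
# classifier.fit(X_train, train_labels)
```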
Keywords/Search Tags: Machine Learning, Multi-Instance Learning, CCA, Instance Selection, Similarity Measure Function