Font Size: a A A

Research On Algorithms And Application Of Multi-instance Learning

Posted on:2014-08-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:T ChenFull Text:PDF
GTID:1268330422481473Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With both the continuous increase in human being’s ability in collecting and storing dataand the rapid development of computer’s computing ability, the requirement of analyzing databy computers is more popular and more urgent than before, which makes the machinelearning techniques be more and more significant. In recent years multi-instance learning(MIL), a kind of new machine learning method, has become one of research hotspots inmachine learning field. MIL is different from the traditional supervised learning, unsupervisedlearning and the recent proposed semi-supervised learning method, so it is considered to be anew learning framework. In the framework of MIL, the training set consists of a number ofbags with labels, and each bag contains a number of unlabeled instances. A bag is labeledpositive if at least one instance in it is positive, or labeled negative if and only if all of itsinstances are negative. The objective of MIL is to construct a learning system which is learnedbased on the training bags can correctly predict the label of new bags.Because of the hierarchical representation structure of training samples, the MIL canrepresent the logical structure of some real problems more accurately than the traditionalsingle-instance-label representation can, which makes it has the unique superiority indistinguishing the so-called "ambiguity objects". Consequently, it has been widely applied tovarious applications, such as: drug activity prediction, image retrieval, image categorization,image annotation,text categorization, protein function prediction, web index page and link ofrecommendation, computer security, computer-aided diagnostics, and so on.Based on the analysis of research status and disadvantages of these existing MILalgorithms, this paper focuses on the research of the following issues of MIL, i.e., the relyingon a single instance, construction of bag features, reduction of bag features, parallelization ofMIL algorithms and their applications to image retrieval, image classification. Some goodexperimental results have been obtained and they can be summarized as follows:1. Aiming at the two disadvantages of existing MIL algorithms for image retrieval, i.e.the dependence on the presence of a single instance and high time-consuming, this chapterpresents an MIL and Bayesian classification based image retrieval method (MIL-Bayesian). In MIL-Bayesian, firstly, each image is divided into several regions, and the image is viewed asone bag in MIL, each region is regarded as an instance in corresponding bag. Secondly,calculate the diversity density (DD) function values of each region, and extract the possiblepositive region to compose a set, then estimate the class conditional probability densityfunction of positive regions using Gaussian mixture model. Thirdly, a Bayesian classifier isused to calculate the posterior probability of images with positive class label, and then theretrieved results are returned to user according to the posterior probability values indescending order. Finally, after several rounds of user relevance feedback, the user gets asatisfactory image. The experimental results on the Corel image set show that the proposedmethod has good retrieval precision and high retrieval efficiency.2. In order to narrow the semantic gap between low-level visual features and high-levelsemantic concepts in image categorization, this chapter exploits the clustering informationfrom a density clustering algorithm and the characteristics of multi-instance learningframework in distinguishing ambiguous object, proposes an image categorization methodusing density clustering on region feature and multi-instance learning, termed as DCRF-MILwhich treats image classification as a multi-instance learning problem. Firstly, it divides eachimage into a number of regions, re-lines up all regions into a collection, and then uses adensity clustering algorithm to learn the potential distribution information of region featuresin the collection. Secondly, it treats image as bag and regions as instances. Based on thecluster distribution information of region features, the bag is mapped into a vector in thecluster distribution space. Finally, a support vector machine classifier is constructed to predictthe class label of the unlabeled image. The experiments on the Corel image data set andMUSK molecular activity prediction data set show DCRF-MIL algorithm has highclassification accuracy and its parameters are easy to select.3. Aiming at the high-dimentional problem of bag features derived from transformationof instance space in MIL, this chapter proposes one multi-sub-space based ensemble MILalgorithms (MSEMIL) and its parallel version (P_MSEMIL). Firstly, this method determinedthe bag feature by mapping the bag in MIL into the instance space which is consist of allinstances; Secondly, by incorporating the bagging-based training samples selection methodand random feature subset selection method, the training set and test set is divided into multiple sub-spaces, and the semi-supervised learning is conducted in each sub-space toobtain one corresponding classifier. Consequently one ensemble classification system can beachieved by integrating the classification results of these multiple base classifiers. Finally, incluster computing systems, the P_MSEMIL is realized by using ProActive, which isJava-based distributed parallel computing middleware. The experimental results on MUSKand Corel data sets show that, compared with other similar algorithms, MSEMIL has higherclassification accuracy and better robustness to label noise. The experimental results alsoshow that the P_MSEMIL has a lower computation time, higher speed up ratio and othercharacteristics.
Keywords/Search Tags:Pattern classification, Multi-instance learning, Machine learning, Image retrieval, Image classification, Parallel algorithm
PDF Full Text Request
Related items