Research On Multi-instance Prediction Algorithm Based On The Combination Of Clustering And Classification

Posted on: 2017-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: S R Gu
GTID: 2308330482490590
Subject: Computer application technology
Abstract/Summary:
The main task of multi-instance learning is to learn a concept from the training set that correctly labels unseen bags. Over the past years, researchers have developed many theories and applications of multi-instance learning and achieved rich results. However, unsupervised multi-instance learning, in which the training bags carry no labels, has not been thoroughly studied. In most cases it is hard or costly to obtain bag labels, and unsupervised learning can help uncover the inherent structure of a data set, so it is worthwhile to study unsupervised multi-instance learning algorithms.
In traditional supervised and unsupervised learning, each data object is represented by a single instance, whereas in multi-instance learning each object is represented by a bag containing multiple instances. Previous work on multi-instance learning has dealt only with prediction tasks in which each bag is associated with a binary or real-valued label, which correspond to classification and regression problems in the traditional sense. To address unsupervised multi-instance learning, this thesis develops a multi-instance prediction algorithm based on the combination of clustering and classification.
First, the unlabeled training bags are clustered with a multi-instance clustering algorithm: a bag-level distance metric measures the distance between two bags, and the popular k-means algorithm is adapted to partition the unlabeled training bags into k disjoint groups. Second, based on the clustering results, each bag is re-represented as a k-dimensional feature vector whose i-th feature is the average distance between the bag and all bags in the i-th group, and each bag's cluster index is used as its label. Because the bags are now transformed into labeled feature vectors, common supervised learning algorithms can be trained on them.
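The clustering and re-representation steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Since the keywords list the Hausdorff distance, the maximal Hausdorff distance with Euclidean instance distance is assumed. Because bags cannot be averaged into a centroid, the "adapted k-means" is approximated by a medoid-based loop (effectively k-medoids) with farthest-first initialization. The names `hausdorff`, `cluster_bags` and `bag_to_vector` are hypothetical.

```python
import math
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def hausdorff(bag_a: Bag, bag_b: Bag) -> float:
    """Maximal Hausdorff distance between two bags (assumed variant)."""
    d_ab = max(min(math.dist(a, b) for b in bag_b) for a in bag_a)
    d_ba = max(min(math.dist(b, a) for a in bag_a) for b in bag_b)
    return max(d_ab, d_ba)


def cluster_bags(bags: List[Bag], k: int, max_iter: int = 20) -> List[int]:
    """Partition bags into k groups with a k-means-style loop.

    Each group is represented by a medoid bag instead of a mean, which makes
    this k-medoids (an assumption standing in for the thesis's adaptation).
    """
    n = len(bags)
    dist = [[hausdorff(bags[i], bags[j]) for j in range(n)] for i in range(n)]

    # Farthest-first initialization (assumed): start from bag 0, then keep
    # adding the bag farthest from all medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds: List[int]) -> List[int]:
        # Each bag joins the group whose medoid is nearest.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    labels = assign(medoids)
    for _ in range(max_iter):
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty group: keep its old medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its group.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
        labels = assign(medoids)
    return labels


def bag_to_vector(bag: Bag, bags: List[Bag], labels: List[int], k: int) -> List[float]:
    """Feature i = average Hausdorff distance from `bag` to the bags in group i."""
    vec = []
    for c in range(k):
        members = [b for b, lab in zip(bags, labels) if lab == c]
        vec.append(sum(hausdorff(bag, m) for m in members) / len(members) if members else 0.0)
    return vec


# Toy, made-up bags: three near the origin and three near (5, 5).
bags = [
    [[0.0, 0.0], [0.1, 0.2]],
    [[0.2, 0.1]],
    [[0.1, 0.0], [0.0, 0.1], [0.2, 0.2]],
    [[5.0, 5.0], [5.1, 4.9]],
    [[4.9, 5.2]],
    [[5.2, 5.0], [5.0, 5.1]],
]
labels = cluster_bags(bags, k=2)
vectors = [bag_to_vector(b, bags, labels, k=2) for b in bags]
```

Each row of `vectors`, paired with the matching entry of `labels`, is one training example for the supervised step. Computing the full pairwise distance matrix up front keeps the loop simple but costs O(n²) Hausdorff evaluations, which is the usual trade-off for medoid-based clustering.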
Three different classification algorithms were used in this thesis. Extensive experiments on a wide range of multi-instance prediction problems, covering both the standard and the generalized multi-instance models, show that the algorithm is effective and achieves higher accuracy than other multi-instance prediction algorithms.
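Once bags are vectors with cluster-index labels, any ordinary classifier applies. The abstract does not name the three classifiers (the keywords mention SVM), so the stdlib-only sketch below uses a k-nearest-neighbour classifier purely as a stand-in; the function names and the toy vectors are hypothetical and not taken from the thesis's experiments.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[int],
                x: Sequence[float], n_neighbors: int = 3) -> int:
    """Label x by majority vote among its n_neighbors nearest training vectors."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, n_neighbors: int = 3) -> float:
    """Fraction of test vectors whose predicted label matches the given label."""
    hits = sum(knn_predict(train_x, train_y, x, n_neighbors) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Made-up 2-dimensional bag vectors (k = 2 groups); labels are cluster indices.
train_x = [[0.1, 7.0], [0.2, 6.8], [0.15, 7.1], [7.0, 0.1], [6.9, 0.2], [7.2, 0.15]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[0.3, 6.5], [6.5, 0.3]]
test_y = [0, 1]
acc = accuracy(train_x, train_y, test_x, test_y)
```

The neighbour count is called `n_neighbors` to avoid confusion with the k clusters of the previous step. Swapping in an SVM (for example scikit-learn's `sklearn.svm.SVC`, fitted on the same vectors and labels) changes only the classifier, not the bag-to-vector transformation.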
Keywords/Search Tags: multi-instance learning, training bag, k-means algorithm, SVM, Hausdorff distance