Research On Hashing Methods Based On Multi-Instance Data Retrieval

Posted on: 2017-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
Full Text: PDF
GTID: 2308330485982200
Subject: Software engineering
Abstract/Summary:
In recent years, with the rapid development of the Internet and the mobile Internet, the amount of data has grown exponentially, and the variety of data types has also increased greatly. How to retrieve similar examples quickly has attracted the interest of many scholars. With the continuous development of machine learning techniques, more and more people use them to solve complex practical problems. We therefore also use machine learning methods to address the problem of retrieval over massive data.

Multi-instance learning (MIL) has been widely applied in scene classification. A multi-instance representation is more natural and informative than a single-instance representation, but it also makes the dataset much larger. In many scenarios, given a sample, we need to perform a similarity search over a multi-instance dataset. Traditional kernel methods, which compute the similarity between bags in the original feature space, are difficult to apply to large-scale datasets because of their high computation and storage costs.

Recently, hashing methods have attracted much attention because of their advantages in computation and storage. A hashing method maps each example from the original space to a low-dimensional Hamming space, producing a compact binary code such that the similarity between examples in the original space is preserved in the Hamming space. The resulting binary codes enable fast similarity search based on the Hamming distance. Moreover, compact binary codes are extremely efficient for large-scale data storage. Hashing is therefore well suited to both challenges, thanks to its time and space efficiency.

This thesis focuses on how to design hashing methods for the multi-instance data search problem. Specifically, two multi-instance hashing methods are proposed, one at the bag level and one at the instance level:

1) Bag-level Multi-Instance Hashing (BMIH). BMIH first generates a set of cluster centers by clustering in the instance feature space; then a feature fusion method is proposed to transform all bags into a new feature representation. Finally, a supervised hashing method is proposed to hash these features into binary codes.

2) Instance-level Multi-Instance Hashing (IMIH). To use more of the instance information in each bag, IMIH treats all instances in all bags as training data and uses two types of hash learning methods (unsupervised and supervised) to convert all instances into binary codes. For a query bag, a metric is proposed to measure the similarity between bags from the hash codes of their instances.

It has been shown that hashing quality can be improved by incorporating supervised information into hash learning. In multi-instance data, a negative bag is known to contain no positive instances, so all instances in negative bags can be regarded as labeled negative instances. A positive bag, on the other hand, may contain both positive and negative instances, so its instances can be regarded as unlabeled. On this basis, the IMIH problem can be viewed as a semi-supervised learning problem. The semi-supervised method, however, uses only the negative labels and ignores the label information in positive bags. To address this, an instance selection method is used to exploit more label information from positive instances and apply it to hash function learning.

The two proposed methods are tested on published datasets. The experimental results show that instance-level hashing with supervised information achieves better results. We also compare our methods with the traditional multi-instance kernel method. The results show that the multi-instance hashing methods are inferior to the kernel method in the precision of the top retrieved samples, but their search speed is much faster.
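
As a rough illustration of the instance-level retrieval idea described above, the following Python sketch hashes instances into binary codes and ranks bags by Hamming distance. It rests on several assumptions that are not taken from the thesis: random hyperplane projections stand in for the learned (unsupervised or supervised) hash functions, the bag-to-bag measure is taken to be the minimum Hamming distance over instance pairs, and all function names (`random_hyperplanes`, `hash_instance`, `bag_distance`, `rank_bags`) are hypothetical.

```python
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random projection vectors (a stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_instance(x, planes):
    """Map one instance to a binary code, packed into an int (one bit per hyperplane)."""
    code = 0
    for w in planes:
        projection = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if projection >= 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance between two packed binary codes."""
    return bin(a ^ b).count("1")


def bag_distance(codes_a, codes_b):
    """Assumed bag metric: smallest Hamming distance over all instance pairs."""
    return min(hamming(a, b) for a in codes_a for b in codes_b)


def rank_bags(query_codes, database):
    """Return bag names sorted from most to least similar to the query bag."""
    return sorted(database, key=lambda name: bag_distance(query_codes, database[name]))


if __name__ == "__main__":
    planes = random_hyperplanes(dim=3, n_bits=16)
    bags = {
        "bag_a": [[1.0, 0.2, 0.1], [0.9, 0.1, 0.3]],
        "bag_b": [[-1.0, -0.3, 0.2], [-0.8, 0.1, -0.5]],
    }
    # Hash every instance of every bag once; only the codes need to be stored.
    database = {name: [hash_instance(x, planes) for x in xs] for name, xs in bags.items()}
    query = [hash_instance(x, planes) for x in [[1.0, 0.25, 0.1]]]
    print(rank_bags(query, database))
```

Because each instance is reduced to a single integer code, comparing two instances costs one XOR and a bit count, which is where the speed and storage advantage over kernel computation in the original feature space comes from. A learned hash function would replace `random_hyperplanes` without changing the search step.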
Keywords/Search Tags: Multi-instance learning, Learning to hash, Image retrieval