
Research And Application Of Semi-supervised Hashing Algorithm Based On HAMA

Posted on: 2015-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2268330428499824
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In content-based image retrieval (CBIR) applications, hashing-based approximate nearest neighbor (ANN) search is becoming increasingly popular because of its computational and memory efficiency for online search. The semi-supervised hashing (SSH) framework minimizes the empirical error over the labeled set and applies an information-theoretic regularizer over both the labeled and unlabeled sets. However, training the hash functions in this framework is slow because the training process is complex and operates on large-scale data. A practical technique is needed to accelerate it.

With the spread of parallel computing applications, the MapReduce-based Hadoop framework has come into wide use, but the range of computing models it supports is limited. Some researchers have proposed ways to build other parallel computing models on top of the MapReduce model. HAMA is a top-level parallel framework on Hadoop based on the Bulk Synchronous Parallel (BSP) model.

Specifically, the main contributions are summarized as follows:

1. We discuss how to use the HAMA framework to develop BSP-based parallel programs, and we build and configure a HAMA cluster to verify its functionality and effectiveness. The experimental results show that the number of tasks per node should be set to the number of CPU cores minus one. A 2-node cluster is 1.77 times as efficient as a single node, and a 3-node cluster is 2.43 times as efficient.

2. We first analyze the calculation of the adjusted covariance matrix in the SSH training process, then split it into two parts, an unsupervised data variance part and a supervised pairwise labeled data part, and finally explore how to parallelize it. Experiments show that our HAMA-based distributed parallel algorithm remains valid for large-scale data collections and can be further applied to massive data processing.

3. We discuss hashing-based CBIR, including the feature selection, training, and retrieval modules, and design a CBIR system based on SSH. The experimental results show that the accuracy of the system is 69% and that the compression ratio reaches one ten-thousandth.
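
The split described in contribution 2 can be illustrated with a small sketch. It assumes the commonly used SSH form of the adjusted covariance matrix, M = X_l S X_l^T + eta * X X^T, which the abstract itself does not spell out. All function names, the `eta` value, and the partitioning are hypothetical, and the code is plain Python, not the HAMA API. Each partition stands in for the local work of one BSP peer, and the final sums stand in for the aggregation after a synchronization barrier.

```python
# Sketch only. Assumed formula: M = X_l S X_l^T + eta * X X^T (standard SSH form).
# Function names, eta, and the partitioning are hypothetical illustrations.

def zeros(dim):
    return [[0.0] * dim for _ in range(dim)]

def add(m1, m2):
    """Element-wise sum of two square matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def scale(m, c):
    """Multiply every entry of a matrix by the scalar c."""
    return [[c * a for a in row] for row in m]

def outer_sum(vectors, dim):
    """Unsupervised part for one partition: sum of x x^T over its points."""
    acc = zeros(dim)
    for x in vectors:
        for i in range(dim):
            for j in range(dim):
                acc[i][j] += x[i] * x[j]
    return acc

def pairwise_sum(pairs, dim):
    """Supervised part for one partition: sum of s * (a b^T + b a^T)
    over labeled pairs (a, b, s), with s = +1 (similar) or -1 (dissimilar)."""
    acc = zeros(dim)
    for a, b, s in pairs:
        for i in range(dim):
            for j in range(dim):
                acc[i][j] += s * (a[i] * b[j] + b[i] * a[j])
    return acc

def adjusted_covariance(point_partitions, pair_partitions, eta, dim):
    """Combine per-partition partial sums into the full matrix."""
    unsup = zeros(dim)
    for part in point_partitions:      # each loop body is one peer's local step
        unsup = add(unsup, outer_sum(part, dim))
    sup = zeros(dim)
    for part in pair_partitions:
        sup = add(sup, pairwise_sum(part, dim))
    return add(sup, scale(unsup, eta))

points = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pairs = [(points[0], points[1], 1.0), (points[2], points[3], -1.0)]

single = adjusted_covariance([points], [pairs], 0.5, 2)
split = adjusted_covariance([points[:2], points[2:]], [pairs[:1], pairs[1:]], 0.5, 2)
print(single == split)
```

Because both parts are plain sums over data points or labeled pairs, partial results from any partitioning add up to the same matrix. That property is what makes this kind of computation a natural fit for a superstep-and-aggregate model such as BSP.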
Keywords/Search Tags: Semi-Supervised Hashing, Bulk Synchronous Parallel mode, Distributed Computing, HAMA Framework, matrix computation