Research And Application Of Semi-supervised Hashing Algorithm Based On HAMA

Posted on:2015-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y Liu

Full Text:PDF

GTID:2268330428499824

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

In content-based image retrieval (CBIR) applications, hashing-based approximate nearest neighbor (ANN) search is becoming increasingly popular because of its computational and memory efficiency for online search. The semi-supervised hashing (SSH) framework minimizes the empirical error over the labeled set and applies an information-theoretic regularizer over both the labeled and unlabeled sets. However, training the hash functions in this framework is slow because of the large-scale, complex training process, so a practical technique is needed to accelerate it.

With the spread of parallel computing applications, the MapReduce-based Hadoop framework has been widely adopted, but the computing models it supports are limited. Some researchers have proposed ways to build other parallel computing models on top of the MapReduce model. HAMA is a top-level parallel framework in the Hadoop ecosystem based on the Bulk Synchronous Parallel (BSP) model.

Specifically, the main contributions are summarized as follows:

1. We discuss how to use the HAMA framework to develop BSP-based parallel programs, and we build and configure a HAMA cluster to verify its functionality and effectiveness. The experimental results show that the number of tasks per node should be the number of CPU cores minus one. A 2-node cluster is 1.77 times as efficient as a single node, and a 3-node cluster is 2.43 times as efficient.

2. We first analyze the calculation of the adjusted covariance matrix in the SSH training process, then split it into two parts, an unsupervised data variance part and a supervised pairwise labeled data part, and finally explore how to parallelize it. Experiments show that our HAMA-based distributed parallel algorithm remains valid for large-scale data collections and can be further applied to massive data processing.

3. We discuss hashing-based CBIR, including the feature selection, training, and retrieval modules, and finally design a CBIR system based on SSH. The experimental results show that the accuracy of the system is 69% and that the compression ratio reaches one ten-thousandth.
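
The two-part split of the adjusted covariance matrix described in contribution 2 can be sketched as follows. This is a minimal illustration, not the thesis's HAMA code: it assumes the form M = X_l S X_l^T + eta * X X^T commonly used in the SSH literature, with zero-centered data points and S_ij = +1 or -1 for similar or dissimilar labeled pairs. All function names, the (i, j, s) pair format, and the partitioning scheme are hypothetical. Plain Python lists stand in for what each BSP peer would compute locally before a barrier and a global sum.

```python
# Illustrative sketch: function names, the pair format, and the partitioning
# scheme are assumptions, not the thesis's HAMA implementation.

def outer(x, y):
    """Outer product x y^T of two vectors, as a nested list."""
    return [[xa * yb for yb in y] for xa in x]


def add(m1, m2):
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def scale(m, c):
    """Multiply every entry of matrix m by the scalar c."""
    return [[c * v for v in row] for row in m]


def zeros(dim):
    """dim x dim matrix of zeros."""
    return [[0.0] * dim for _ in range(dim)]


def local_variance_sum(partition, dim):
    """Local step for one peer: sum of x x^T over its own data points."""
    acc = zeros(dim)
    for x in partition:
        acc = add(acc, outer(x, x))
    return acc


def unsupervised_part(points, dim, num_partitions):
    """X X^T as per-partition partial sums, combined after a simulated barrier."""
    partitions = [points[p::num_partitions] for p in range(num_partitions)]
    partials = [local_variance_sum(part, dim) for part in partitions]
    total = zeros(dim)
    for partial in partials:  # global combine, e.g. on one designated peer
        total = add(total, partial)
    return total


def supervised_part(points, pairs, dim):
    """X_l S X_l^T as a sum over labeled pairs (i, j, s) with i != j.

    S is symmetric, so each listed pair contributes x_i x_j^T and x_j x_i^T.
    """
    acc = zeros(dim)
    for i, j, s in pairs:
        both = add(outer(points[i], points[j]), outer(points[j], points[i]))
        acc = add(acc, scale(both, s))
    return acc


def adjusted_covariance(points, pairs, eta, num_partitions=1):
    """Supervised pairwise part plus eta times the unsupervised variance part."""
    dim = len(points[0])
    return add(supervised_part(points, pairs, dim),
               scale(unsupervised_part(points, dim, num_partitions), eta))
```

Because X X^T is a sum of per-point outer products, the result does not depend on how the points are partitioned, which is what makes this part easy to distribute. For example, `adjusted_covariance(points, pairs, 1.0, num_partitions=3)` returns the same matrix as the single-partition call. In the SSH literature the projection directions are then typically taken from the leading eigenvectors of M; that step is omitted here.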

Keywords/Search Tags:

Semi-Supervised Hashing, Bulk Synchronous Parallel mode, Distributed Computing, HAMA Framework, matrix computation

Related items

1	Distributed Graph-parallel Framework Scheduling Analysis And Optimization
2	The Research Of Distributed Parallel Support Vector Regression Machine Algorithm And Framework
3	Research And Implementation Of Multi-layer Job Scheduling Algorithm Based On Hama Parallel Computing Framework
4	Research On Semi-supervised Cross-modal Hashing Retrieval Algorithm
5	Bootstrap Dual Complementary Hashing With Semi-supervised Re-ranking On Large Scale Image Retrieval Problem
6	Parallel Algorithms Research Based On Hadoop And Hama
7	Performance Optimization Of Distributed Graph Computation Framework Based On BSP Model
8	Supervised Hashing Methods For Information Retrieval
9	Study And Implementation On Distributed Large Scale Matrix Computation Algorithms With Spark
10	Study On Performance Of Hama Computing Platform