Research And Implementation Of LSH-based Large Scale Similar Image Retrieval Technology

Posted on: 2017-07-05 | Degree: Master | Type: Thesis
Country: China | Candidate: W X Zhang | Full Text: PDF
GTID: 2428330569998718 | Subject: Computer Science and Technology
Abstract/Summary:
In recent years, as the number of pictures on the Internet has exploded, image-related technologies and web applications have boomed. Similar image retrieval (SIR), which retrieves from an image database the images similar to those provided by users, is one of the most popular of these applications. SIR is widely used in fields such as search engines, e-commerce, social networking and biomedicine, and has attracted wide interest in multimedia information retrieval. In general, the core issues of similar image retrieval are image feature retrieval, image feature matching and massive image storage. This thesis focuses on optimization techniques for similar image retrieval over massive image collections, and obtains the following research results:

1) One of the biggest challenges of similar image retrieval is to quickly find the images that meet the similarity requirement among a massive number of images. In essence, image feature matching is a similarity search problem, which can be solved with approximate similarity search methods. Among these methods, locality-sensitive hashing (LSH) is the most widely adopted. LSH greatly improves query efficiency at the expense of a negligible loss of precision. However, most LSH methods run only in single-node environments and are slow on large-scale data. To solve this problem, this work designs and implements Spark-LSH, a distributed LSH algorithm based on Spark that can index and query massive data. It further proposes Efficient Spark-LSH, which improves on Spark-LSH through efficient indexing, location-aware querying and a series of other optimizations. Experiments show that, compared with Spark-LSH, Efficient Spark-LSH reduces data shuffle by 30% and is over 100 times faster in query performance.

2) Storing a massive number of images efficiently is challenging. The storage module of an image retrieval system greatly affects the system's stability and retrieval performance. This work proposes an HBase-based solution for mass image storage, and compares the feasibility of HBase and HDFS as storage schemes through theoretical analysis and experiments. Moreover, to address skewed image access, it presents an HBase load balancing optimization scheme consisting of a new RegionServer selection algorithm and a data migration algorithm. Experimental results show that, compared with the default load-balancing algorithm used by HBase, the optimized load balancing algorithm reduces the maximum difference in data access frequency by 91.4%.

3) Based on the proposed distributed LSH algorithm and the proposed HBase-based image storage, this work designs and implements a highly modular large-scale similar image retrieval system. The system comprises a web module, an image feature extraction module, an image feature matching module, a message middleware module and a storage module. Each module uses the most mature solution or cutting-edge technology available in the industry. Tests in a real application environment show that the system meets the performance requirements of all scenarios.
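
As background for the locality-sensitive hashing mentioned in point 1), the following is a minimal single-node sketch of one common LSH family (random hyperplanes for cosine similarity). It is not the thesis's Spark-LSH or Efficient Spark-LSH, whose details the abstract does not give; the names `LSHIndex`, `make_hyperplanes`, `signature` and `cosine`, and the parameters `num_tables` and `num_bits`, are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def cosine(a, b):
    """Exact cosine similarity, used only to rank the candidate set."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LSHIndex:
    """Hypothetical in-memory index with several independent hash tables."""

    def __init__(self, dim, num_tables=4, num_bits=8, seed=0):
        # Each table has its own hyperplanes and its own signature -> ids buckets.
        self.tables = [
            (make_hyperplanes(num_bits, dim, seed + t), defaultdict(list))
            for t in range(num_tables)
        ]
        self.vectors = {}

    def add(self, item_id, vector):
        self.vectors[item_id] = vector
        for planes, buckets in self.tables:
            buckets[signature(vector, planes)].append(item_id)

    def query(self, vector, top_k=5):
        # Candidates are the items sharing a bucket with the query in any table;
        # only these are compared exactly, which is where the speed-up comes from.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(vector, planes), []))
        ranked = sorted(candidates, key=lambda i: -cosine(vector, self.vectors[i]))
        return ranked[:top_k]
```

In this sketch, more bits per signature make buckets smaller (fewer candidates, more misses), while more tables raise the chance that a true neighbour shares at least one bucket with the query. A distributed variant would additionally have to partition the buckets across nodes, which is outside what this sketch shows.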
Keywords/Search Tags:Image Retrieval, Image Storage, Similarity Search, Locality Sensitive Hashing, Load Balancing