Research And Implementation Of Image Retrieval Based On BoVW Model In The Hadoop Platform

Posted on: 2018-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: B P Zhu
Full Text: PDF
GTID: 2428330596452985
Subject: Information and Communication Engineering
Abstract/Summary:
The rapid development of computer technology has increased both the number of images and the richness of their semantic content, making the efficiency of image retrieval an increasingly prominent problem. In the context of big data, image retrieval on a traditional single-node architecture is inefficient. Because the BoVW model is simple in principle and performs well, and because the Hadoop platform offers strong data processing capability, scalability, and reliability, this thesis uses Hadoop to implement distributed image retrieval based on the BoVW model. The main research work is as follows.

(1) Hadoop is not well suited to handling large numbers of small image files, so this thesis merges many small image files into a sequence file to improve Hadoop's image processing performance. Because local feature extraction is complex and time-consuming, SIFT feature extraction is parallelized.

(2) Construction of the visual dictionary is the key step of the BoVW model. To address the low efficiency of the traditional construction method, this thesis improves it in three respects. First, locality-sensitive hashing, which preserves the similarity of high-dimensional data well in large-scale data mining, is used to partition the massive set of high-dimensional feature vectors, and the data volume is reduced by selecting sample points from each partition. Second, to improve the quality of the initial center points, a parallel maximum-minimum distance algorithm is used to optimize their selection. Finally, during the K-means iterations, a Combine function merges intermediate results to reduce data transfer and computation between Map nodes and Reduce nodes. The experimental results show that, compared with the traditional visual dictionary construction method, the improved parallel method achieves the same retrieval performance while doubling construction efficiency.

(3) Because visual words differ in their importance to an image during feature quantization, a parallel TF-IDF algorithm is implemented on Hadoop. Weighting each image's word frequency vector improves the descriptive power of the model. The experimental results show that, compared with the unweighted BoVW model, the weighted BoVW model improves the accuracy of image retrieval.

(4) To address the cost of computing similarity between massive numbers of high-dimensional sparse vectors, this thesis designs a parallel image retrieval method based on an inverted index. The similarity between two images is obtained by using the inverted index file to sum, in parallel, the weights of the visual words the two images have in common. The proposed method reduces the size of the candidate image set through the inverted index and greatly improves retrieval efficiency through parallel search. Retrieval efficiency improves further as the number of cluster nodes increases.
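As a rough illustration of items (3) and (4), the following single-machine Python sketch weights visual-word histograms with TF-IDF and scores candidates through an inverted index. It is not the thesis's Hadoop implementation. The function names, the L2 normalization, and the choice to score each shared word by the product of the query and image weights are assumptions made for this example; the abstract only says that the weights of shared visual words are summed.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(images):
    """Weight visual-word histograms with TF-IDF.

    images: dict mapping image id -> list of visual-word ids
            (hypothetical input: quantized local descriptors).
    Returns dict mapping image id -> {word id: L2-normalized weight}.
    """
    n_images = len(images)
    # Document frequency: number of images containing each visual word.
    df = Counter()
    for words in images.values():
        df.update(set(words))

    vectors = {}
    for image_id, words in images.items():
        counts = Counter(words)
        total = len(words)
        weights = {
            w: (c / total) * math.log(n_images / df[w])
            for w, c in counts.items()
        }
        norm = math.sqrt(sum(v * v for v in weights.values()))
        if norm > 0:
            weights = {w: v / norm for w, v in weights.items()}
        else:
            weights = {}  # every word occurs in every image: no signal
        vectors[image_id] = weights
    return vectors


def build_inverted_index(vectors):
    """Map each visual word to the (image id, weight) pairs that contain it."""
    index = defaultdict(list)
    for image_id, weights in vectors.items():
        for word, weight in weights.items():
            index[word].append((image_id, weight))
    return index


def search(query_weights, index, top_k=5):
    """Score only images that share at least one visual word with the query.

    Assumption: each shared word contributes query_weight * image_weight.
    """
    scores = defaultdict(float)
    for word, q_weight in query_weights.items():
        for image_id, weight in index.get(word, []):
            scores[image_id] += q_weight * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy data: three images described by visual-word ids.
images = {"a": [0, 0, 1], "b": [1, 2], "c": [3, 3, 3]}
vectors = tfidf_vectors(images)
index = build_inverted_index(vectors)
results = search(vectors["a"], index)
```

In this toy run, image "c" shares no visual word with the query, so it is never touched during scoring. This is the candidate-set reduction the abstract attributes to the inverted index. In a MapReduce setting, the per-word posting lists could be scored by separate tasks and the partial sums merged per image.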
Keywords/Search Tags: Hadoop, image retrieval, BoVW model, TF-IDF, inverted index