
Design And Implementation Of Image Retrieval System Based On Hadoop Platform

Posted on: 2020-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: X T Li
GTID: 2428330590454820
Subject: Control Engineering
Abstract/Summary:
With the development of information technology, the amount of image data on the network is growing rapidly. Accurate retrieval over large-scale image collections has become an active research topic, and it poses great challenges for researchers. Traditional image retrieval methods can no longer keep up with the almost exponential growth of image data, so distributed processing has become an important direction for retrieval over massive image collections. This paper proposes an image retrieval system based on the Hadoop platform that performs image retrieval in parallel. The following aspects are studied:

Image storage is the basis of the system. Image data is stored in HDFS in the form of sequence files, which make up the system's image database.

Image feature extraction is central to image retrieval. The SIFT algorithm is an important feature extraction method in image retrieval, but it ignores color information. This paper therefore combines the CSIFT algorithm with the MapReduce distributed programming model to extract features in parallel and to generate feature vectors that carry color information.

The BoVW (bag of visual words) model is optimized: a Canopy-Kmeans clustering algorithm combined with the MaxMin criterion generates the visual dictionary, so that the whole clustering process is free of human intervention. After the CSIFT local features are quantized into word-frequency vectors, each visual word is assigned a weight, which increases the descriptive power of the image representation, and an inverted index is built on this basis. The optimized BoVW model is then implemented on the MapReduce distributed programming model to carry out retrieval tasks in parallel.

Experiments show that, on the same dataset, the average accuracy of the proposed algorithm is 24.5%, 21.7% and 14.4% higher, respectively, than that of three other algorithms: a parallel image retrieval method based on comprehensive image features, a parallel image retrieval method combining SIFT with the BoVW model, and a parallel image retrieval method combining SIFT with an improved BoVW model. Compared with a single-machine image retrieval system, the multi-node system shortens retrieval time significantly, and as the amount of experimental data grows, the advantage of cluster parallel processing becomes more pronounced. Because the number of Hadoop cluster nodes can be increased or decreased according to users' needs, the size of the system can follow demand, so the system designed in this paper has good scalability.
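The abstract stores images in HDFS as sequence files. Hadoop's SequenceFile holds binary key/value records, and a common motivation for packing images this way is that storing many small files individually burdens the NameNode, which keeps metadata for every file in memory. The sketch below only illustrates the key/value packing idea with a toy length-prefixed byte stream; it is an assumption for illustration and is not the real SequenceFile binary format, which also has a header, sync markers and optional compression. The functions `pack_images` and `unpack_images` are hypothetical names.

```python
import io
import struct


def pack_images(images):
    """Pack (file name, image bytes) pairs into one length-prefixed byte stream.

    Toy container for illustration only; NOT the Hadoop SequenceFile format.
    """
    buf = io.BytesIO()
    for name, data in images:
        key = name.encode("utf-8")
        buf.write(struct.pack(">I", len(key)))   # 4-byte big-endian key length
        buf.write(key)                           # key: the image file name
        buf.write(struct.pack(">I", len(data)))  # 4-byte big-endian value length
        buf.write(data)                          # value: raw image bytes
    return buf.getvalue()


def unpack_images(blob):
    """Yield (file name, image bytes) records back out of the packed stream."""
    view = io.BytesIO(blob)
    while True:
        header = view.read(4)
        if not header:  # end of stream
            break
        (key_len,) = struct.unpack(">I", header)
        key = view.read(key_len).decode("utf-8")
        (value_len,) = struct.unpack(">I", view.read(4))
        yield key, view.read(value_len)
```

Keying each record by file name keeps the original identity of every image, so a later map task can emit features tagged with the image they came from.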
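The abstract says the visual dictionary is produced by Canopy-Kmeans combined with the MaxMin criterion, but it does not give the exact formulation. The sketch below is one plausible reading, stated as an assumption: the classical max-min distance rule picks the coarse (Canopy-stage) centres, which fixes the number of visual words without choosing k by hand, and standard K-means then refines those centres. The stopping ratio `theta` and all function names are hypothetical; this is a single-machine illustration, not the thesis's MapReduce implementation.

```python
import numpy as np


def maxmin_centers(X, theta=0.5):
    """Choose coarse centres with the classical max-min distance rule.

    theta (hypothetical) stops the search once the farthest remaining point
    is closer than theta * (distance between the first two centres).
    """
    centers = [0]  # arbitrary first centre: the first descriptor
    dist_first = np.linalg.norm(X - X[0], axis=1)
    second = int(np.argmax(dist_first))  # farthest descriptor from the first
    centers.append(second)
    base = dist_first[second]
    # distance from every descriptor to its nearest chosen centre
    nearest = np.minimum(dist_first, np.linalg.norm(X - X[second], axis=1))
    while True:
        cand = int(np.argmax(nearest))
        if nearest[cand] <= theta * base:
            break
        centers.append(cand)
        nearest = np.minimum(nearest, np.linalg.norm(X - X[cand], axis=1))
    return X[centers].astype(float)


def kmeans(X, centers, max_iter=100):
    """Standard K-means refinement starting from the given centres."""
    for _ in range(max_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


def build_vocabulary(descriptors, theta=0.5):
    """Visual words = refined cluster centres of the local descriptors."""
    return kmeans(descriptors, maxmin_centers(descriptors, theta))
```

On three well-separated groups of 2-D points, the max-min step selects one centre per group, so K-means starts with k = 3 without k being supplied.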
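The abstract quantizes CSIFT local features into word-frequency vectors, weights each visual word, and builds an inverted index, but it does not name the weighting scheme. The sketch below assumes TF-IDF weighting, a common choice in BoVW retrieval, and uses small 2-D vectors in place of real CSIFT descriptors. `quantize`, `build_index` and `query` are hypothetical names. The inverted index maps each visual word to the images that contain it, so a query only visits the posting lists of words it actually contains.

```python
import math
from collections import defaultdict

import numpy as np


def quantize(descriptors, vocab):
    """Assign each local descriptor to its nearest visual word; return word counts."""
    dist = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    words = dist.argmin(axis=1)
    return np.bincount(words, minlength=len(vocab))


def build_index(counts_per_image):
    """Compute IDF weights and an inverted index: word -> [(image_id, tf-idf weight)]."""
    n_images = len(counts_per_image)
    doc_freq = defaultdict(int)
    for counts in counts_per_image.values():
        for w in np.nonzero(counts)[0]:
            doc_freq[int(w)] += 1
    idf = {w: math.log(n_images / df) for w, df in doc_freq.items()}
    index = defaultdict(list)
    for image_id, counts in counts_per_image.items():
        total = counts.sum()
        for w in np.nonzero(counts)[0]:
            w = int(w)
            index[w].append((image_id, counts[w] / total * idf[w]))
    return idf, index


def query(counts, idf, index):
    """Rank images by the dot product of TF-IDF vectors via the inverted index."""
    total = counts.sum()
    scores = defaultdict(float)
    for w in np.nonzero(counts)[0]:
        w = int(w)
        if w not in idf:  # word never seen in the database
            continue
        q_weight = counts[w] / total * idf[w]
        for image_id, i_weight in index[w]:
            scores[image_id] += q_weight * i_weight
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Words that occur in every image get an IDF of zero under this formula, so they contribute nothing to the ranking; this is the intended effect of down-weighting uninformative visual words.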
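The abstract runs the optimized BoVW retrieval on the MapReduce model. The sketch below simulates that data flow in plain Python under stated assumptions: each image is already represented by a weighted visual-word vector, a map task scores the images in its input split against every query and emits `(query_id, (score, image_id))`, a shuffle groups the pairs by query, and a reduce task keeps the top k. Cosine similarity, the split scheme, and all function names are hypothetical choices; threads merely stand in for Hadoop map tasks, and this is not the Hadoop MapReduce API.

```python
import heapq
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def map_task(split, queries):
    """Map phase: score every image in one input split against every query."""
    out = []
    for image_id, vec in split:
        for query_id, qvec in queries.items():
            out.append((query_id, (cosine(qvec, vec), image_id)))
    return out


def reduce_task(values, k):
    """Reduce phase: keep the k best (score, image_id) pairs for one query."""
    return heapq.nlargest(k, values)


def mapreduce_search(database, queries, n_splits=2, k=2):
    items = list(database.items())
    splits = [items[i::n_splits] for i in range(n_splits)]
    # threads stand in for independent map tasks running on cluster nodes
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        mapped = list(pool.map(lambda s: map_task(s, queries), splits))
    # shuffle: group intermediate pairs by key (the query id)
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return {key: reduce_task(values, k) for key, values in grouped.items()}
```

Because each map task touches only its own split, adding nodes lets more splits be scored at once, which matches the abstract's point that the cluster can be resized with demand.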
Keywords/Search Tags: image retrieval, parallel computing, CSIFT features, visual dictionary