A Research Of Image Retrieval Based On Lucene On The Cloud Computing Platform

Posted on: 2015-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Chen
GTID: 2308330464468851
Subject: Computer technology
Abstract/Summary:
The era of Big Data has arrived. One major challenge is finding useful information precisely in large amounts of data. As image data becomes increasingly important, quickly retrieving similar images from a large-scale database (content-based image retrieval, CBIR) has become an urgent problem. This thesis describes a distributed technique built on a cloud computing platform that improves on current CBIR methods by reducing retrieval time and increasing accuracy. The work comprises the following aspects:
(1) Analysis and improvement of the key technologies of current CBIR: the extraction of low-level visual features, the normalization of feature vectors, similarity measurement, and the evaluation criteria.
(2) Implementation of a Lucene-based image retrieval system in a local environment. After extracting the features of the images, the system generates the index files within the Lucene framework and then retrieves images similar to the input supplied by the user.
(3) Construction of a Hadoop cluster as the distributed platform. An efficient distributed system and its related services were developed using Apache Hadoop.
(4) Construction of an inverted index database for large-scale image collections. The color histogram, color layout descriptor (CLD), edge histogram descriptor (EHD), and Gabor texture features were adopted to describe the images, and the inverted index was built with Lucene. The efficiency of index creation in the distributed environment and in the local environment was compared.
(5) Development of a new distributed image retrieval system based on the inverted index database. An input image is matched against the established image feature index database. This distributed system has high fault tolerance and scalability.
Experimental results show that efficiency is much higher in the distributed environment than on a single computer. Comparing the performance of the image retrieval system in the local and distributed environments, including image storage, construction of the feature index, and retrieval efficiency, we conclude that as the number of images increases, the single-machine environment performs worse while the distributed system performs better.
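The indexing idea in the abstract (extract a feature, normalize it, turn it into index terms, and look up candidates through an inverted index) can be illustrated with a minimal single-machine sketch. This is not the thesis's implementation, which uses Lucene and Hadoop: it is plain Python, it covers only a color histogram, and every name in it (`color_histogram`, `to_terms`, `InvertedIndex`, the bin and quantization settings) is a hypothetical choice made for illustration. Histogram intersection is assumed here as the similarity measure; the abstract does not say which measure was used.

```python
from collections import Counter, defaultdict


def color_histogram(pixels, bins_per_channel=4):
    """Return an L1-normalized RGB histogram as {bin_index: weight}.

    `pixels` is a non-empty iterable of (r, g, b) tuples with values in 0..255.
    """
    step = 256 // bins_per_channel
    counts = Counter()
    total = 0
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        counts[idx] += 1
        total += 1
    return {idx: n / total for idx, n in counts.items()}


def to_terms(hist, levels=4):
    """Quantize each non-empty bin into a token (e.g. 'b48q3') that can be indexed like a word."""
    return {f"b{idx}q{min(int(w * levels), levels - 1)}" for idx, w in hist.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two L1-normalized histograms."""
    return sum(min(w, h2.get(idx, 0.0)) for idx, w in h1.items())


class InvertedIndex:
    """Maps each feature term to the set of image ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.hists = {}

    def add(self, image_id, pixels):
        hist = color_histogram(pixels)
        self.hists[image_id] = hist
        for term in to_terms(hist):
            self.postings[term].add(image_id)

    def search(self, pixels, top_k=5):
        query = color_histogram(pixels)
        # Only images sharing at least one term with the query are scored.
        candidates = set()
        for term in to_terms(query):
            candidates |= self.postings.get(term, set())
        scored = [(histogram_intersection(query, self.hists[i]), i) for i in candidates]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(image_id, score) for score, image_id in scored[:top_k]]


# Tiny synthetic "images": lists of pixels rather than real image files.
index = InvertedIndex()
index.add("red", [(250, 10, 10)] * 8)
index.add("mostly_red", [(250, 10, 10)] * 7 + [(10, 10, 250)])
index.add("blue", [(10, 10, 250)] * 8)
results = index.search([(250, 10, 10)] * 8)
```

In this toy run the all-blue image shares no term with the red query, so it is never scored; that pruning is what the inverted index contributes. In a distributed setting the postings would be partitioned across nodes, which this sketch does not attempt.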
Keywords/Search Tags: Distribution, Mass Images, Inverted Index, Hadoop