
A Research And Implementation With Improved K-Means Clustering Algorithm To Image Retrieval System Based On Hadoop Platform

Posted on: 2015-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: G P Li
GTID: 2268330428961571
Subject: Computer technology
Abstract/Summary:
Contemporary life has entered the mobile Internet era. People's daily life, study, and many other activities have benefited greatly from the popularization and widespread use of mobile Internet devices. At the same time, large amounts of information from every walk of life are being digitized and accumulated in the form of multimedia. Images are one of the most basic forms of multimedia information and are easy to understand and use, and people's demands on image retrieval have evolved from retrieving images by their textual descriptions to retrieving similar images by their content. Image retrieval has long been a research hotspot in computer science; it can be divided into Text-Based Image Retrieval and Content-Based Image Retrieval. The main subject of this thesis is the research and implementation of Content-Based Image Retrieval over massive image collections using Big Data technology. From the data-analysis perspective, a Content-Based Image Retrieval system must solve two principal problems: the storage and the rapid processing of large volumes of image data.
We use Hadoop, a technology dedicated to the storage and processing of Big Data, to store massive amounts of image data and to perform offline distributed computation. From the retrieval-technology perspective, image features must be extracted and processed. In this thesis, we extract the SIFT features of the images and cluster them with K-Means; the resulting cluster centers form the bag of visual words, against which every SIFT feature is quantized using the Bag-of-Words model, so that each image is represented by a feature vector of fixed dimension. These vectors are then weighted with TF-IDF. Finally, the similarity between the query image's vector and each stored image's vector is computed, and the several most similar images are returned.

This thesis uses and modifies HIPI (Hadoop Image Processing Interface) to store and process images on Hadoop, proposes a revised parallel K-Means algorithm and applies it to the clustering of feature points, and uses an area-based similarity calculation method to measure the degree of similarity between image feature vectors. To meet the requirements of Big Data processing, we also modified the source code of Mahout. Image retrieval is widely applied, and research on Hadoop-based image retrieval systems is expected to help guide the development of image retrieval technology in the Big Data era.
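HDFS is designed for a modest number of large files; storing millions of small image files individually strains the NameNode's in-memory metadata and typically yields at least one map task per file. HIPI addresses this by packing many images into a single image bundle. The sketch below illustrates only that packing idea in plain Python, with `io.BytesIO` standing in for an HDFS file. The record layout (an 8-byte big-endian length prefix before each image) and the helper names `write_bundle`, `read_image`, and `iter_bundle` are hypothetical and do not reproduce HIPI's actual on-disk format or API.

```python
import io
import struct

# Hypothetical record layout: [8-byte big-endian length][image bytes], repeated.
HEADER = struct.Struct(">Q")


def write_bundle(images, data_stream):
    """Append each (name, payload) pair to one data stream and return an index of
    (name, offset, length) records pointing into it, for random access later."""
    index = []
    for name, payload in images:
        offset = data_stream.tell()
        data_stream.write(HEADER.pack(len(payload)))
        data_stream.write(payload)
        index.append((name, offset, len(payload)))
    return index


def read_image(data_stream, offset):
    """Random access: seek to a record offset taken from the index and read it."""
    data_stream.seek(offset)
    (length,) = HEADER.unpack(data_stream.read(HEADER.size))
    return data_stream.read(length)


def iter_bundle(data_stream):
    """Sequential scan from the start, as a mapper reading a whole bundle would."""
    data_stream.seek(0)
    while True:
        header = data_stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        (length,) = HEADER.unpack(header)
        yield data_stream.read(length)


if __name__ == "__main__":
    buf = io.BytesIO()
    idx = write_bundle([("a.jpg", b"\xff\xd8abc"), ("b.jpg", b"xyz123")], buf)
    print(idx)
```

Keeping both a length prefix and a separate index lets the same file serve sequential batch scans (feature extraction over every image) and direct lookups (fetching the images to display for a query result).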
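The abstract does not describe how the revised parallel K-Means differs from the standard one, so the sketch below shows only the common MapReduce decomposition that parallel K-Means implementations typically start from: each mapper assigns the points of its data split to the nearest current centroid and emits per-centroid partial sums, and the reducer averages them into new centroids. It is a single-process simulation in plain Python; lists stand in for HDFS splits, and `nearest`, `kmeans_map`, `kmeans_reduce`, and `parallel_kmeans` are hypothetical names, not Hadoop or Mahout APIs.

```python
import math


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to point."""
    best, best_d = 0, math.inf
    for i, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = i, d
    return best


def kmeans_map(split, centroids):
    """Map (with in-mapper combining): assign every point in one data split to its
    nearest centroid and emit one (centroid_id, (partial_sum, count)) per centroid."""
    partial = {}
    for point in split:
        k = nearest(point, centroids)
        s, n = partial.get(k, ([0.0] * len(point), 0))
        partial[k] = ([a + b for a, b in zip(s, point)], n + 1)
    return list(partial.items())


def kmeans_reduce(emitted, centroids):
    """Reduce: merge the partial sums for each centroid and divide by the total
    count. A centroid that received no points keeps its previous position."""
    merged = {}
    for k, (s, n) in emitted:
        acc, m = merged.get(k, ([0.0] * len(s), 0))
        merged[k] = ([a + b for a, b in zip(acc, s)], m + n)
    return [
        [x / merged[k][1] for x in merged[k][0]] if k in merged else list(c)
        for k, c in enumerate(centroids)
    ]


def parallel_kmeans(splits, centroids, max_iter=20, tol=1e-12):
    """Driver: one map/reduce round per iteration, stopping when no centroid moves
    by more than tol (squared distance) or after max_iter rounds."""
    for _ in range(max_iter):
        emitted = [kv for split in splits for kv in kmeans_map(split, centroids)]
        new = kmeans_reduce(emitted, centroids)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, upd))
            for old, upd in zip(centroids, new)
        )
        centroids = new
        if shift <= tol:
            break
    return centroids
```

Emitting one partial sum per centroid per split, rather than one record per point, keeps the shuffle volume proportional to the number of clusters instead of the number of descriptors, which matters when millions of 128-dimensional SIFT descriptors are clustered.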
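The retrieval steps (quantization against the visual vocabulary, TF-IDF weighting, and similarity ranking) can be sketched as follows. This is an illustration under stated assumptions: descriptors are short lists of floats rather than 128-dimensional SIFT vectors, the IDF term uses the common form log(N / df), and cosine similarity is swapped in for the thesis's area-based similarity measure, which the abstract does not define. All function names are hypothetical.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[i])),
    )


def bow_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and count the words,
    so every image becomes a vector whose length equals the vocabulary size."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]


def idf_weights(histograms):
    """idf_i = log(N / df_i), where df_i is the number of images containing word i.
    Words that occur in no image get weight 0."""
    n = len(histograms)
    df = [sum(1 for h in histograms if h[i] > 0) for i in range(len(histograms[0]))]
    return [math.log(n / d) if d else 0.0 for d in df]


def tfidf(histogram, idf):
    """Term frequency (count / total words in the image) times IDF."""
    total = sum(histogram)
    if total == 0:
        return [0.0] * len(histogram)
    return [(c / total) * w for c, w in zip(histogram, idf)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query_descriptors, database, vocabulary, top_k=3):
    """Return the ids of the top_k database images most similar to the query,
    by descending cosine similarity of TF-IDF vectors (ties broken by id)."""
    ids = sorted(database)
    hists = [bow_histogram(database[i], vocabulary) for i in ids]
    idf = idf_weights(hists)
    vectors = {i: tfidf(h, idf) for i, h in zip(ids, hists)}
    query = tfidf(bow_histogram(query_descriptors, vocabulary), idf)
    return sorted(ids, key=lambda i: (-cosine(query, vectors[i]), i))[:top_k]
```

IDF is computed over the stored collection and reused for the query, so visual words that appear in nearly every image contribute little to the score. In a Hadoop deployment the histogram and IDF steps would run as batch jobs over the whole collection, while only the query's own quantization and the ranking run at query time.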
Keywords/Search Tags: Image Retrieval, Big Data, Storage, Distributed Computing, Hadoop, HIPI, K-Means