Research And Implementation Of An Image Retrieval System Based On The Hadoop Platform With An Improved K-Means Clustering Algorithm

Posted on:2015-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:G P Li

Full Text:PDF

GTID:2268330428961571

Subject:Computer technology

Abstract/Summary:

Daily life has entered the mobile Internet era, and the spread of mobile Internet devices has benefited work, study, and many other activities. At the same time, large amounts of information from every field are being digitized and accumulated as multimedia data. Images are among the most basic forms of multimedia and are easy to understand and use, so demand for image retrieval has moved from searching by text description to searching for similar images by image content. Image retrieval has long been a research hotspot in computer science and is usually divided into Text-Based Image Retrieval and Content-Based Image Retrieval (CBIR). This thesis studies and implements content-based image retrieval over very large image collections using Big Data technology.

From the data-analysis perspective, a CBIR system must solve two principal problems: storing a large volume of image data and processing it quickly. We use Hadoop, which is designed for the storage and processing of Big Data, to store the images and to run offline distributed computation. From the retrieval perspective, features must be extracted and processed. In this thesis we extract SIFT features from the images and cluster them with K-Means; the clustering result serves as the bag of visual words. Using the Bag-of-Words model, we then quantize every SIFT feature against this vocabulary, so that each image is represented by a feature vector of fixed dimension. These vectors are weighted with TF-IDF. Finally, we compute the similarity between each stored image vector and the query image vector and return the best-matching images.

This thesis uses and modifies HIPI (Hadoop Image Processing Interface) so that image data can be stored and processed on Hadoop, proposes a revised parallel K-Means algorithm and applies it to the clustering of feature points, and uses an area-based similarity measure to compute the similarity between image feature vectors. To meet the requirements of Big Data processing, we also modified the Mahout source code. Image retrieval has a wide range of applications, and research on a Hadoop-based image retrieval system can help guide the development of image retrieval technology in the Big Data era.
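
The retrieval pipeline outlined in the abstract (cluster descriptors, quantize them into visual words, weight with TF-IDF, rank by similarity) can be illustrated with a minimal single-process sketch. It is not the thesis implementation, and several things in it are assumptions: the K-Means step is the standard algorithm written as a map phase and a reduce phase rather than the revised parallel version the thesis proposes, cosine similarity stands in for the area-based measure, toy 2-D points replace 128-dimensional SIFT descriptors, and all function names are hypothetical.

```python
import math
from collections import Counter

def nearest(centers, point):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((c - p) ** 2 for c, p in zip(centers[i], point)))

def kmeans_map(centers, points):
    """Map phase: emit (cluster_id, (point, 1)) for every descriptor in one split."""
    return [(nearest(centers, p), (p, 1)) for p in points]

def kmeans_reduce(pairs, centers):
    """Reduce phase: average the points assigned to each cluster."""
    sums = {}
    for cid, (p, n) in pairs:
        acc, cnt = sums.get(cid, ([0.0] * len(p), 0))
        sums[cid] = ([a + x for a, x in zip(acc, p)], cnt + n)
    # A cluster that received no points keeps its previous center.
    return [[a / sums[i][1] for a in sums[i][0]] if i in sums else list(c)
            for i, c in enumerate(centers)]

def kmeans(splits, centers, iterations=10):
    """Standard K-Means; each split plays the role of one mapper's input."""
    for _ in range(iterations):
        pairs = [kv for split in splits for kv in kmeans_map(centers, split)]
        centers = kmeans_reduce(pairs, centers)
    return centers

def bow_histogram(centers, descriptors):
    """Quantize descriptors to visual words; return a fixed-length count vector."""
    counts = Counter(nearest(centers, d) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(centers))]

def idf_weights(histograms):
    """Inverse document frequency of each visual word over the collection."""
    n = len(histograms)
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(len(histograms[0]))]
    return [math.log(n / d) if d else 0.0 for d in df]

def tfidf(histogram, idf):
    """Term frequency (normalized counts) multiplied by IDF."""
    total = sum(histogram) or 1
    return [(c / total) * w for c, w in zip(histogram, idf)]

def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the thesis's area-based measure."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query_vec, index):
    """Names of indexed images, most similar to the query first."""
    return sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)

if __name__ == "__main__":
    # Toy "descriptors" per image (assumed data, for illustration only).
    images = {
        "img_a": [(0, 0), (1, 0), (0, 1), (10, 10)],
        "img_b": [(10, 10), (11, 10), (10, 11), (0, 10)],
        "img_c": [(0, 10), (1, 10), (0, 11), (0, 0)],
    }
    vocab = kmeans(list(images.values()), [[0, 0], [10, 10], [0, 10]], iterations=3)
    hists = {name: bow_histogram(vocab, d) for name, d in images.items()}
    idf = idf_weights(list(hists.values()))
    index = {name: tfidf(h, idf) for name, h in hists.items()}
    query = tfidf(bow_histogram(vocab, [(0, 0), (1, 1), (0, 1)]), idf)
    print(rank(query, index))
```

On a real cluster the map and reduce functions would run as separate Hadoop tasks over HDFS splits, and only the per-cluster partial sums would cross the network; the sketch keeps everything in one process to show the data flow.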

Keywords/Search Tags:

Image Retrieval, Big Data, Storage, Distributed Computing, Hadoop, HIPI, K-Means

Related items

1	Research On Distributed Storage And Retrieval System Based On Hadoop For Massive Video
2	Design And Implementation Of Distributed Data Storage Based On Hadoop
3	Research On Key Technology Of Massive Image Retrieval Based On Hadoop
4	Research And Application Of Big Data Retrieval Based On Cloud Computing
5	Research On Image Retrieval Algorithm Based On Hadoop
6	Research And Design Of Massive Image Cloud Storage System Based On Hadoop
7	Design And Analysis Of The Mass Image Storage Model Based On Hadoop
8	The Research And Design Of Distributed Data Mining System Based On Hadoop
9	Hadoop Based Distributed Storage And Algorithm Analysis For Mass Futures Data
10	Research And Implementation Of Integration Of R Language And Hadoop