
Research And Implementation Of Design Patent Images Retrieval System Based On Hadoop

Posted on: 2012-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: X W Wang
Full Text: PDF
GTID: 2178330335974234
Subject: Signal and Information Processing
Abstract/Summary:
Traditional design patent retrieval is based on text queries. In large-scale retrieval it cannot make full use of the rich visual information contained in design patent images, and image similarity is judged mainly by human inspection, which means a heavy workload and low efficiency. To solve these problems, we use content-based retrieval technology to describe design patent images by their shape, texture and colour features. Based on these image features and on the similarity judgments provided by the patent image database, we make design patent image query and retrieval automatic and improve both the retrieval speed and the accuracy of design similarity judgment.

However, image retrieval is a data-intensive computing process and places a heavy load on the CPU. As the number of design patents grows rapidly, a single-node image retrieval system with a B/S (browser/server) structure suffers from low retrieval speed, poor concurrency and low efficiency in processing massive data.

Based on an analysis of the system, this paper proposes a Hadoop-based image retrieval method for design patents. It applies content-based retrieval technology to the MapReduce parallel computing framework and stores the design patent images and their feature database in HDFS. When a retrieval job is processed in the Hadoop distributed system, the system splits the feature database of patent images into several data blocks and sends them to the Map tasks on the computing nodes. The Map tasks read the feature data as key/value pairs, extract the shape, texture and colour features, and then compute the similarity match against the features in the database. The results are also output as key/value pairs.
The Reduce tasks receive the results of all the Map tasks, sort them by similarity and output the image retrieval results. In this way the image retrieval computation is carried out in a distributed manner. We built a Hadoop distributed environment from low-cost PCs and ran the retrieval application on it. Compared with the existing retrieval system, this method balances the system load, improves resource utilization and efficiently reduces the time needed to retrieve from massive data sets. We also analyse the load balancing, reliability and scalability of the Hadoop distributed system.
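
The Map and Reduce steps described in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the thesis implementation and does not use Hadoop itself. The function names (split_database, map_task, reduce_task, retrieve), the feature vectors, the image identifiers and the similarity measure (1 / (1 + Euclidean distance)) are all assumptions made for illustration; the abstract does not specify them. The query features are assumed to be extracted already.

```python
import math
from typing import Iterator, List, Tuple

Feature = List[float]                 # assumed: shape, texture and colour values in one vector
Record = Tuple[str, Feature]          # key/value pair: (image_id, feature vector)


def split_database(records: List[Record], n_splits: int) -> List[List[Record]]:
    """Partition the feature database into n_splits blocks, one per Map task."""
    return [records[i::n_splits] for i in range(n_splits)]


def similarity(a: Feature, b: Feature) -> float:
    """Assumed measure: 1 / (1 + Euclidean distance), so 1.0 means identical."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def map_task(block: List[Record], query: Feature) -> Iterator[Tuple[str, float]]:
    """Map: read (image_id, feature) pairs and emit (image_id, similarity) pairs."""
    for image_id, feature in block:
        yield image_id, similarity(query, feature)


def reduce_task(pairs: List[Tuple[str, float]], top_k: int) -> List[Tuple[str, float]]:
    """Reduce: collect all Map outputs, sort by similarity, keep the best top_k."""
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)[:top_k]


def retrieve(records: List[Record], query: Feature,
             n_splits: int = 2, top_k: int = 2) -> List[Tuple[str, float]]:
    """Run the Map tasks one after another (sequentially here) and reduce the results."""
    pairs: List[Tuple[str, float]] = []
    for block in split_database(records, n_splits):
        pairs.extend(map_task(block, query))
    return reduce_task(pairs, top_k)


# Hypothetical feature database and query, for illustration only.
database = [
    ("D001", [0.0, 0.0, 0.0]),
    ("D002", [1.0, 1.0, 1.0]),
    ("D003", [0.1, 0.0, 0.0]),
    ("D004", [5.0, 5.0, 5.0]),
]
query_features = [0.0, 0.0, 0.0]
ranked = retrieve(database, query_features, n_splits=2, top_k=2)
print(ranked)
```

In a real Hadoop job the blocks would be HDFS input splits and the Map tasks would run in parallel on different nodes; the sequential loop in retrieve only stands in for that scheduling.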
Keywords/Search Tags: Hadoop, MapReduce, Design Patent, Image Retrieval, Distributed Computing