Pretreatment Design Patent Image Retrieval Method Based On Map-Join-Reduce

Posted on: 2013-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zhu
GTID: 2248330371981080
Subject: Signal and Information Processing
Abstract/Summary:
Design patent images contain a great deal of visual information, and retrieving them by manual inspection is laborious and inefficient. Automated image retrieval, however, is a data-intensive computation that places a heavy load on the CPU. The system therefore adopts Hadoop, a distributed computing framework, which improves retrieval efficiency compared with a single-node browser/server (B/S) image retrieval system. Combined with the MapReduce parallel computing framework, image retrieval overcomes several problems, such as the poor real-time response and limited concurrency caused by high system load, and low computing capacity. When handling multiple data sets, however, MapReduce cannot bring all of the data sets together, and every intermediate MapReduce result must be checked and shuffled to avoid errors, which becomes the bottleneck of a real-time system. Map-Join-Reduce, an extension of the MapReduce programming model that is used here as a preprocessing method, supports combined operations over multiple data sets, so it can simplify complex data analysis tasks, process them efficiently, and return results faster. MapReduce and Map-Join-Reduce jobs can be chained through their inputs and outputs.
The system uses Map-Join-Reduce to combine the image feature data with the bibliographic data through distributed processing, which reduces the data volume, narrows the retrieval scope, and improves retrieval efficiency. The map task of Map-Join-Reduce selects the relevant bibliographic records from the two tables; the join task then uses the patent number, which appears in both tables, as the key to join them into one large table; finally, the reduce task sorts all the records and outputs them. The result is then passed to the chained MapReduce program for further retrieval. Experimental results show that this method, like the original MapReduce-based system, balances the system load and improves resource utilization, and that it effectively reduces image retrieval time on large data sets, further improving retrieval efficiency relative to the MapReduce-based retrieval method.
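The three stages described above (filter in map, join on the patent number, sort in reduce) can be illustrated with a small single-process sketch. This is not the thesis's Hadoop implementation: it runs in memory, and the field names `patent_no`, `category`, and `features`, the sample records, and the function names are all hypothetical, chosen only to show the data flow.

```python
from collections import defaultdict

def map_stage(records, source, keep=lambda r: True):
    """Emit (patent_no, (source, record)) for each record that passes the filter."""
    for rec in records:
        if keep(rec):
            yield rec["patent_no"], (source, rec)

def join_stage(pairs):
    """Group tagged records by patent number and merge rows found in both tables."""
    groups = defaultdict(lambda: {"biblio": [], "feature": []})
    for key, (source, rec) in pairs:
        groups[key][source].append(rec)
    for group in groups.values():
        for b in group["biblio"]:
            for f in group["feature"]:
                yield {**b, **f}  # one wide row per matching pair

def reduce_stage(rows):
    """Sort the joined rows by patent number and return them as a list."""
    return sorted(rows, key=lambda r: r["patent_no"])

def map_join_reduce(biblio, features, keep):
    # Only the bibliographic table is filtered; feature rows without a
    # surviving bibliographic match are dropped by the join.
    pairs = list(map_stage(biblio, "biblio", keep))
    pairs += list(map_stage(features, "feature"))
    return reduce_stage(join_stage(pairs))

# Hypothetical sample data: a bibliographic table and an image feature table.
biblio = [
    {"patent_no": "P3", "category": "lamp"},
    {"patent_no": "P1", "category": "chair"},
    {"patent_no": "P2", "category": "lamp"},
]
features = [
    {"patent_no": "P1", "features": [0.1, 0.9]},
    {"patent_no": "P2", "features": [0.4, 0.2]},
    {"patent_no": "P3", "features": [0.7, 0.3]},
]

joined = map_join_reduce(biblio, features, keep=lambda r: r["category"] == "lamp")
for row in joined:
    print(row)
```

In this sketch the joined table holds only the records that passed the bibliographic filter, so a later image-matching step (the chained MapReduce program in the abstract) would compare feature vectors for two patents instead of three. That reduction of the candidate set is the effect the abstract attributes to the preprocessing step.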
Keywords/Search Tags: Cloud Computing, Hadoop, Map-Join-Reduce, Distributed Processing, Image Retrieval