
Optimization Of A Hadoop-Based Design Patent Images Retrieval System

Posted on: 2016-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Yu
GTID: 2308330461956007
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, disputes over design patents have been increasing, and many products look very similar. This makes it difficult for manufacturers to design new products and has raised the importance of design to unprecedented heights. As enterprises become more aware of property rights, the demand for design patent information and its applications keeps growing.
For a patent image retrieval system, real-time response and stability cannot be guaranteed as the volume of data grows. In today's big data era, the explosive growth in the number of patents places unprecedented demands on the system's retrieval performance, and cloud computing technology is well suited to this problem. A cloud computing platform based on Hadoop offers high reliability, high efficiency, and high scalability, and is the solution many Internet companies and institutions choose for large-scale data processing.
In a distributed environment, the nodes of a cluster may be machines with different configurations, so large performance gaps between nodes are inevitable, and network failures can cause anomalies in the topology. However, the default MapReduce task scheduling mechanism and HDFS storage strategy in Hadoop perform poorly in such complex heterogeneous scenarios and have become the bottleneck of mass data processing. This thesis addresses a concrete design patent image retrieval system built on the Hadoop platform. Because the default MapReduce task scheduling mechanism and HDFS storage strategy perform poorly in this practical application scenario, it proposes optimization measures for each of them to improve the performance of the system.
The main work is as follows:
(1) The design principles and execution flow of the Hadoop framework, HDFS, and MapReduce are analyzed in depth.
(2) To address the problems the Hadoop platform shows in practical use within the patent image retrieval system, an improved LASE task scheduling strategy and an improved HIFI storage strategy are proposed, respectively.
(3) The optimization strategies for image retrieval on the Hadoop platform are verified by experiments.
The experimental results show that the optimized system improves the performance of the Hadoop platform and reduces the response time of user requests.
Keywords/Search Tags:Hadoop, Distributed Systems, MapReduce Scheduling Mechanism, HDFS Storage Strategy, Design Patent Images