
Design And Implementation Of A Dump Module Based On The Hadoop Platform

Posted on: 2013-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Huang
Full Text: PDF
GTID: 2248330374499075
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet industry, user-related data has been growing massively, and exporting, analyzing, and processing valuable data has become a challenge faced by major companies. Traditionally, data export from the database used a stand-alone Dump: table joins were performed on the server side, while the client side was responsible for further analysis and processing of the retrieved data. With the company's business development and the explosive growth of data, however, this stand-alone approach could no longer meet the system's performance requirements; to some extent it became a bottleneck that restricted business growth, so a more rational framework was needed to replace it.

Hadoop is a distributed system infrastructure developed by the Apache Foundation. It is a software framework for the distributed processing of large amounts of data: users can develop distributed programs without understanding the low-level details of distribution, taking full advantage of the cluster's high-speed computation and storage. Hadoop implements a distributed file system, HDFS, which is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it well suited to applications with large data sets.

Against the background of Taobao's advertising business, this thesis analyzes, from the perspective of business applications, the main problems and bottlenecks faced by the current Dump and its follow-up data processing, and summarizes the key points of program development on the Hadoop platform. On this basis, the thesis decomposes the entire task into several functional modules, gives a Hadoop-based solution for each, and completes the architectural design and the full code implementation.

The new system not only solves the various problems faced by the stand-alone Dump, giving the whole system better stability, greater scalability, and easier maintenance, but is also able to cope with rapid business development and large-scale data growth for a long period of time. In the final part, the thesis systematically analyzes the underlying mechanisms and principles of the Hadoop platform; parameter tuning of the online system effectively reduced the load on the equipment and achieved good results.
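The stand-alone Dump described above performs the table join on the database server; on Hadoop, such a join is typically re-expressed as a reduce-side join, where mappers tag each record with its source table and the shuffle groups records by join key. The thesis does not reproduce its code here, so the following is a minimal, hypothetical Python sketch of that pattern (the table names and fields are invented for illustration, and the shuffle is simulated in memory):

```python
from collections import defaultdict

# Two hypothetical tables, as they might be dumped from a database.
# ("ads" and "clicks" are invented names for illustration.)
ads = [(1, "ad-one"), (2, "ad-two")]          # (ad_id, title)
clicks = [(1, "u7"), (1, "u9"), (2, "u7")]    # (ad_id, user_id)

def map_phase(records, tag):
    """Map step: emit (join_key, (source_tag, value)) pairs."""
    for key, value in records:
        yield key, (tag, value)

def reduce_side_join(*mapped_streams):
    """Simulated shuffle groups pairs by key; the reducer then
    pairs every left-table row with every right-table row per key."""
    groups = defaultdict(list)
    for stream in mapped_streams:
        for key, tagged in stream:            # shuffle/sort simulation
            groups[key].append(tagged)
    joined = []
    for key, tagged_values in sorted(groups.items()):
        left = [v for t, v in tagged_values if t == "ads"]
        right = [v for t, v in tagged_values if t == "clicks"]
        for l in left:
            for r in right:
                joined.append((key, l, r))
    return joined

result = reduce_side_join(map_phase(ads, "ads"),
                          map_phase(clicks, "clicks"))
# Each row of `result` pairs an ad with one of its clicks.
```

Because the grouping happens per key, each reducer only ever sees the records for its own keys, which is what lets the join scale out across a cluster instead of running on a single database server.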
Keywords/Search Tags: Dump, Data Processing, Distributed System, Hadoop