
The Design and Implementation of a MapReduce-Based Distributed Programming Framework

Posted on: 2012-01-15  Degree: Master  Type: Thesis
Country: China  Candidate: L Pan  Full Text: PDF
GTID: 2218330362451553  Subject: Software engineering
Abstract/Summary:
QQ Pcmgr is a desktop product with a large user base, so it has to handle massive amounts of user data. Processing and analyzing this data contributes substantially to improving product quality. However, it is difficult for QQ Pcmgr's routine back-end jobs to obtain processing resources, because the company's data processing center is occupied by many critical jobs that support the core business. In this situation, an environment that makes full use of the spare processing capacity of the development servers to run QQ Pcmgr's routine data processing jobs would be of great practical value.

In this thesis, by studying the MapReduce architecture, we designed and implemented a MapReduce-based distributed framework to simplify routine data processing work. During this work, we focused on the analysis and design of the fault tolerance and task scheduling functions. For functional testing, we used a word count test program that runs over QQ accounts' acceleration status reports, and we manually added a delay to one processing unit to simulate a slow task and verify that fault tolerance works. For performance testing, we used two test programs, word count and record sort, to test the framework's ability to process massive data.

As a result of this work, the framework can be deployed on the development servers to carry out QQ Pcmgr's routine statistics and data processing work, with good processing capacity, fault tolerance, and scalability.
Keywords/Search Tags: Distributed Computing, Massive Data Processing, MapReduce, Fault Tolerance, Classification of Intermediate Results
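
The word count test mentioned in the abstract can be illustrated with a minimal single-process sketch of the MapReduce pattern: a map phase, a step that classifies intermediate results by key and reducer, and a reduce phase. This is not the thesis's framework; all function names, the toy partitioner, and the sample report lines are assumptions made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def map_word_count(line: str) -> list[tuple[str, int]]:
    """Map phase: emit (word, 1) for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def partition(pairs: list[tuple[str, int]],
              num_reducers: int) -> list[dict[str, list[int]]]:
    """Classify intermediate pairs: one bucket per reducer, grouped by key."""
    buckets: list[dict[str, list[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for key, value in pairs:
        # Hypothetical toy partitioner: a byte sum is deterministic across
        # runs, unlike Python's salted hash() for strings.
        index = sum(key.encode("utf-8")) % num_reducers
        buckets[index][key].append(value)
    return buckets


def reduce_word_count(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: sum all counts emitted for one word."""
    return key, sum(values)


def run_job(lines: list[str], num_reducers: int = 2) -> dict[str, int]:
    """Run map, partition, and reduce sequentially in one process."""
    intermediate = [pair for line in lines for pair in map_word_count(line)]
    result: dict[str, int] = {}
    for bucket in partition(intermediate, num_reducers):
        for key, values in bucket.items():
            word, total = reduce_word_count(key, values)
            result[word] = total
    return result


if __name__ == "__main__":
    # Invented sample lines standing in for status report records.
    sample = ["boot accelerated", "boot normal", "Boot accelerated"]
    print(run_job(sample))
```

In a distributed setting each bucket would go to a separate reduce worker, and a scheduler would re-run map or reduce tasks that fail or run too slowly; the sketch leaves both out to keep the data flow visible.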