
Research On Hadoop Based Iterative Data Processing And Data Placement Strategy

Posted on: 2015-08-02    Degree: Master    Type: Thesis
Country: China    Candidate: L L Zou    Full Text: PDF
GTID: 2298330467474600    Subject: Computer application technology
Abstract/Summary:
Hadoop is a distributed computing framework that is widely used to process big data, especially in environments with very large numbers of Internet users. However, Hadoop has shortcomings when processing iterative applications such as graph computations. Because graph data is strongly coupled, each round of processing may require several MapReduce jobs rather than a single one. Repeatedly restarting MapReduce jobs is expensive, and static data is reloaded and reshuffled unnecessarily, which conflicts with Hadoop's original design intent.

This thesis first reviews the problems Hadoop faces with iterative applications and existing solutions such as HaLoop. Building on this analysis, a map-side storage strategy is proposed: the Mapper interface is extended so that static data is stored on the map side and the computation available in the map phase is fully exploited. Computations that combine static data with state data are completed on the map side, which reduces the amount of data transferred to the reduce phase.

The analysis of iterative applications also shows that Hadoop's default data placement policy has defects: it considers neither replica (backup) placement nor differences in node capability. A feasible approach is to minimize the overhead of data backup and recovery and to exploit capability differences among nodes so that repeated data transfers are avoided. The proposed placement strategy therefore takes a node's storage capacity, processing ability, and network delay into account when selecting the appropriate storage node.

Finally, a Hadoop platform is built, the source code is modified and recompiled, and the proposed schemes are compared with prior approaches to verify the correctness and efficiency of the theoretical analysis.
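The map-side storage idea can be illustrated with a minimal sketch. The class name StaticDataMapper and the local side file "static.graph" below are illustrative assumptions, not the thesis's actual implementation: the Mapper caches the static graph structure once in setup() and joins it with the per-iteration state records in map(), so only the small state contributions have to be shuffled to the reducers.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative sketch: static adjacency data is read once from a file kept
    // on the map-side node, while only the mutable state (e.g. a vertex rank)
    // arrives as map input and is emitted to the reduce phase.
    public class StaticDataMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

        // vertexId -> neighbour list, cached on the map side for all iterations
        private final Map<String, String[]> adjacency = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // "static.graph" is an assumed local side file (e.g. shipped once to the
            // node); each line is "vertexId<TAB>n1,n2,n3".
            try (BufferedReader reader = new BufferedReader(new FileReader("static.graph"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t");
                    adjacency.put(parts[0], parts[1].split(","));
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // State record produced by the previous iteration: "vertexId<TAB>currentRank".
            String[] parts = value.toString().split("\t");
            String vertexId = parts[0];
            double rank = Double.parseDouble(parts[1]);

            String[] neighbours = adjacency.get(vertexId);
            if (neighbours == null || neighbours.length == 0) {
                return;
            }
            // Join state data with the cached static data on the map side and emit
            // only the small contributions that the reducers actually need.
            double contribution = rank / neighbours.length;
            for (String neighbour : neighbours) {
                context.write(new Text(neighbour), new DoubleWritable(contribution));
            }
        }
    }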
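The placement criterion described above (storage capacity, processing ability, network delay) can likewise be sketched as a simple weighted score over candidate nodes; the class names, fields, and weights below are illustrative assumptions rather than values from the thesis.

    import java.util.Comparator;
    import java.util.List;

    // Illustrative sketch of a placement policy that scores candidate DataNodes
    // by free storage, processing ability and network delay, then picks the best.
    public class PlacementSelector {

        public static class NodeInfo {
            final String host;
            final double freeStorageGb;   // remaining storage capacity
            final double cpuScore;        // relative processing ability (higher is better)
            final double networkDelayMs;  // measured delay to the writer (lower is better)

            public NodeInfo(String host, double freeStorageGb, double cpuScore, double networkDelayMs) {
                this.host = host;
                this.freeStorageGb = freeStorageGb;
                this.cpuScore = cpuScore;
                this.networkDelayMs = networkDelayMs;
            }
        }

        // Assumed weights; in practice they would be tuned for the cluster.
        private static final double W_STORAGE = 0.4;
        private static final double W_CPU = 0.4;
        private static final double W_DELAY = 0.2;

        // Higher score means a more suitable storage node for the next block replica.
        static double score(NodeInfo n) {
            return W_STORAGE * n.freeStorageGb
                 + W_CPU * n.cpuScore
                 - W_DELAY * n.networkDelayMs;
        }

        public static NodeInfo select(List<NodeInfo> candidates) {
            return candidates.stream()
                    .max(Comparator.comparingDouble(PlacementSelector::score))
                    .orElseThrow(() -> new IllegalArgumentException("no candidate nodes"));
        }
    }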
Keywords/Search Tags: Hadoop, iteration, map side storage, static data, placement strategy