
Research On Hadoop Based Iterative Data Processing And Data Placement Strategy

Posted on: 2015-08-02    Degree: Master    Type: Thesis
Country: China    Candidate: L L Zou    Full Text: PDF
GTID: 2298330467474600    Subject: Computer application technology
Abstract/Summary:
Hadoop is a distributed computing framework that is widely used to process big data, especially in environments with very large numbers of Internet users. However, Hadoop has shortcomings when processing iterative applications such as graph computations. Because graph data is strongly coupled, each round of processing may require several MapReduce jobs rather than a single one. Repeatedly restarting MapReduce jobs is expensive, and static data is reloaded and reshuffled unnecessarily, which conflicts with Hadoop's original design intent.

This thesis first reviews the problems Hadoop faces with iterative applications and existing solutions such as HaLoop. Building on this analysis, a map-side storage strategy is proposed: the Mapper interface is extended so that static data is stored on the map side and the computation available in the map phase is fully exploited. Computations that combine static data with state data are completed on the map side, which reduces the amount of data transferred to the reduce phase.

The analysis of iterative applications also shows that Hadoop's default data placement policy has defects: it considers neither replica (backup) placement nor differences in node capability. A feasible approach is to minimize the overhead of data backup and recovery and to exploit capability differences among nodes so that repeated data transfers are avoided. The proposed placement strategy therefore takes a node's storage capacity, processing ability, and network delay into account when selecting the appropriate storage node.

Finally, a Hadoop platform is built, the source code is modified and recompiled, and the proposed schemes are compared with prior approaches to verify the correctness and efficiency of the theoretical analysis.
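The map-side storage idea can be illustrated with a minimal sketch. The class name StaticDataMapper and the local side file "static.graph" below are illustrative assumptions, not the thesis's actual implementation: the Mapper caches the static graph structure once in setup() and joins it with the per-iteration state records in map(), so only the small state contributions have to be shuffled to the reducers.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative sketch: static adjacency data is read once from a file kept
    // on the map-side node, while only the mutable state (e.g. a vertex rank)
    // arrives as map input and is emitted to the reduce phase.
    public class StaticDataMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

        // vertexId -> neighbour list, cached on the map side for all iterations
        private final Map<String, String[]> adjacency = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // "static.graph" is an assumed local side file (e.g. shipped once to the
            // node); each line is "vertexId<TAB>n1,n2,n3".
            try (BufferedReader reader = new BufferedReader(new FileReader("static.graph"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t");
                    adjacency.put(parts[0], parts[1].split(","));
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // State record produced by the previous iteration: "vertexId<TAB>currentRank".
            String[] parts = value.toString().split("\t");
            String vertexId = parts[0];
            double rank = Double.parseDouble(parts[1]);

            String[] neighbours = adjacency.get(vertexId);
            if (neighbours == null || neighbours.length == 0) {
                return;
            }
            // Join state data with the cached static data on the map side and emit
            // only the small contributions that the reducers actually need.
            double contribution = rank / neighbours.length;
            for (String neighbour : neighbours) {
                context.write(new Text(neighbour), new DoubleWritable(contribution));
            }
        }
    }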
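The placement criterion described above (storage capacity, processing ability, network delay) can likewise be sketched as a simple weighted score over candidate nodes; the class names, fields, and weights below are illustrative assumptions rather than values from the thesis.

    import java.util.Comparator;
    import java.util.List;

    // Illustrative sketch of a placement policy that scores candidate DataNodes
    // by free storage, processing ability and network delay, then picks the best.
    public class PlacementSelector {

        public static class NodeInfo {
            final String host;
            final double freeStorageGb;   // remaining storage capacity
            final double cpuScore;        // relative processing ability (higher is better)
            final double networkDelayMs;  // measured delay to the writer (lower is better)

            public NodeInfo(String host, double freeStorageGb, double cpuScore, double networkDelayMs) {
                this.host = host;
                this.freeStorageGb = freeStorageGb;
                this.cpuScore = cpuScore;
                this.networkDelayMs = networkDelayMs;
            }
        }

        // Assumed weights; in practice they would be tuned for the cluster.
        private static final double W_STORAGE = 0.4;
        private static final double W_CPU = 0.4;
        private static final double W_DELAY = 0.2;

        // Higher score means a more suitable storage node for the next block replica.
        static double score(NodeInfo n) {
            return W_STORAGE * n.freeStorageGb
                 + W_CPU * n.cpuScore
                 - W_DELAY * n.networkDelayMs;
        }

        public static NodeInfo select(List<NodeInfo> candidates) {
            return candidates.stream()
                    .max(Comparator.comparingDouble(PlacementSelector::score))
                    .orElseThrow(() -> new IllegalArgumentException("no candidate nodes"));
        }
    }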
Keywords/Search Tags: Hadoop, iteration, map side storage, static data, placement strategy