The Research And Implementation Of Interdata Storage And Transaction Optimization On Mapreduce Cluster Engine

Posted on:2013-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:Q Na

Full Text:PDF

GTID:2268330392469555

Subject:Software engineering

Abstract/Summary:

In a computer system, a cluster is a group of software and hardware components that act like a single system and complete computing workloads in close collaboration. MapReduce is a software architecture designed by Google. It is a distributed computation model used for processing large-scale data sets. A MapReduce engine can run on a cluster system, automatically dispatch tasks to the cluster, process jobs in parallel, and complete the workload using the computing power of the cluster.

In this study, we discuss the problem of shuffling interdata between the nodes of the MapReduce cluster engine developed by Platform Computing Corporation, and propose a solution that optimizes interdata storage and transaction. In this solution, we implemented an interdata storage function and an interdata transaction optimization function to improve system performance. While ensuring data integrity, this paper solves the problem of sharing and shuffling large-scale interdata between nodes. The study aims to overcome the poor disk I/O performance and the multi-thread concurrency problems caused by large-scale interdata. As a result, job processing performance is improved.

Because map tasks and reduce tasks differ in the priority and method with which they process interdata, the interdata storage module is divided into two modules: the interdata storage module of the map task and the interdata storage module of the reduce task. The interdata transaction optimization module increases the speed of interdata transactions between nodes. The interdata storage modules of the map task and the reduce task follow the same design idea but are implemented differently. When storing interdata, both modules map the file into memory, use that memory to cache the data, and then maintain the data in memory.
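The abstract gives no implementation details, so the following is only a minimal Python sketch of the general memory-mapped storage idea, not the thesis's code. It assumes a hypothetical length-prefixed key/value record layout and a fixed-capacity spill file; the class name `MappedInterdataStore` is illustrative. Records are copied into the mapped region instead of being written through explicit `write()` calls, and the operating system flushes the dirty pages back to the file.

```python
import mmap
import os
import struct
import tempfile

# Assumed record header (hypothetical format): key length and value
# length as two little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<II")


class MappedInterdataStore:
    """Sketch: keep map-side interdata in a memory-mapped spill file."""

    def __init__(self, path: str, capacity: int) -> None:
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self._fd, capacity)          # size the file up front
        self._mm = mmap.mmap(self._fd, capacity)  # map the whole file into memory
        self._offset = 0                          # next free byte in the mapping

    def append(self, key: bytes, value: bytes) -> int:
        """Copy one record into the mapping and return its start offset."""
        size = HEADER.size + len(key) + len(value)
        if self._offset + size > len(self._mm):
            # A real store would spill or remap here; the sketch just fails.
            raise MemoryError("mapped region is full")
        start = self._offset
        HEADER.pack_into(self._mm, start, len(key), len(value))
        key_at = start + HEADER.size
        value_at = key_at + len(key)
        self._mm[key_at:value_at] = key
        self._mm[value_at:value_at + len(value)] = value
        self._offset += size
        return start

    def records(self):
        """Yield (key, value) pairs by reading the mapped memory directly."""
        pos = 0
        while pos < self._offset:
            klen, vlen = HEADER.unpack_from(self._mm, pos)
            pos += HEADER.size
            key = self._mm[pos:pos + klen]
            pos += klen
            value = self._mm[pos:pos + vlen]
            pos += vlen
            yield key, value

    def close(self) -> None:
        self._mm.flush()  # push dirty pages back to the file
        self._mm.close()
        os.close(self._fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = MappedInterdataStore(os.path.join(tmp, "map_0.spill"), 4096)
        store.append(b"apple", b"1")
        store.append(b"banana", b"1")
        print(list(store.records()))  # [(b'apple', b'1'), (b'banana', b'1')]
        store.close()
```

The sketch is single-threaded; a production module would also need the concurrency control for multi-threaded access that the abstract identifies as a problem.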
The interdata transaction optimization module depends on the interdata storage module. It sends the interdata held in memory from map nodes to reduce nodes more quickly, which saves time when processing jobs.

According to the test results in this paper, the optimization of interdata storage and transaction improves the ability to handle large-scale data processing tasks and significantly decreases job running time. Based on the results of the industry-standard Hadoop benchmark tests, the feature implemented in this paper improves cluster performance by about 20% to 60% on common Hadoop applications with different workload types.
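The transfer path described for the interdata transaction optimization module can likewise be sketched in Python under stated assumptions. Map output is hash-partitioned by reducer (here with `zlib.crc32`, in the spirit of Hadoop's default hash partitioning), kept in memory, and served to each reducer straight from those buffers rather than re-read from disk. The network hop between nodes is replaced by a direct method call, and `InMemoryMapOutput`, `serve`, `decode`, and `reduce_fetch` are hypothetical names, not Platform Computing APIs.

```python
import zlib
from collections import defaultdict


def partition_for(key: bytes, num_reducers: int) -> int:
    """Pick the reducer that owns `key`.

    crc32 is deterministic across processes, so every map node agrees on
    ownership; Python's built-in hash() is salted per process and is not.
    """
    return zlib.crc32(key) % num_reducers


class InMemoryMapOutput:
    """Hypothetical map-side buffer holding interdata per target reducer."""

    def __init__(self, num_reducers: int) -> None:
        self.num_reducers = num_reducers
        self._parts = [[] for _ in range(num_reducers)]

    def emit(self, key: bytes, value: bytes) -> None:
        self._parts[partition_for(key, self.num_reducers)].append((key, value))

    def serve(self, reducer_id: int) -> bytes:
        """Serialize one sorted partition from memory (no disk re-read)."""
        out = bytearray()
        for key, value in sorted(self._parts[reducer_id]):
            out += len(key).to_bytes(4, "little") + key
            out += len(value).to_bytes(4, "little") + value
        return bytes(out)


def decode(payload: bytes):
    """Inverse of serve(): yield (key, value) pairs from a fetched payload."""
    pos = 0
    while pos < len(payload):
        klen = int.from_bytes(payload[pos:pos + 4], "little")
        key = payload[pos + 4:pos + 4 + klen]
        pos += 4 + klen
        vlen = int.from_bytes(payload[pos:pos + 4], "little")
        value = payload[pos + 4:pos + 4 + vlen]
        pos += 4 + vlen
        yield key, value


def reduce_fetch(reducer_id: int, map_outputs) -> dict:
    """Reduce side: pull this reducer's partition from every map output.

    In a cluster each serve() call would be a network fetch; here it is a
    direct call so the sketch runs in one process.
    """
    grouped = defaultdict(list)
    for map_output in map_outputs:
        for key, value in decode(map_output.serve(reducer_id)):
            grouped[key].append(value)
    return dict(grouped)


if __name__ == "__main__":
    maps = [InMemoryMapOutput(2), InMemoryMapOutput(2)]
    for map_output, text in zip(maps, [b"a b a", b"b c a"]):
        for word in text.split():
            map_output.emit(word, b"1")
    counts = {}
    for reducer_id in range(2):
        for key, values in reduce_fetch(reducer_id, maps).items():
            counts[key] = sum(int(v) for v in values)
    print(sorted(counts.items()))  # [(b'a', 3), (b'b', 2), (b'c', 1)]
```

Because every key belongs to exactly one reducer, the per-reducer results can be merged without conflicts; sorting each partition before serving is a choice of this sketch that matches the sorted input reducers usually expect.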

Keywords/Search Tags:

Cluster, MapReduce, Disk I/O Optimization, Interdata Transaction Optimization

Related items

1	The Design And Implementation Of A MapReduce Computing Framework Based On GPU Cluster
2	The Design And Implementation Of USB Disk-based Server-Cluster Deployment And Monitoring System
3	Research On Low Power Scheduling Technology For Heterogeneous Cluster Based On MapReduce
4	Research Of Gpu Cluster-Based Mapreduce Programming Model
5	Optimization Of Network And Scheduling For MapReduce In Heterogeneous Cluster
6	An investigation of circumstellar disk properties in cluster environments
7	Research And Implementation Of Disk Backup And Restore Operations On Windows 9x/NT Operating Systems
8	The Optimization Of High Performance MapReduce FairScheduler And The Implementation On Simulator Of Huge Scale Cluster
9	An Optimized MapReduce Workflow Scheduling Algorithm For Heterogeneous Computing
10	Research On Performance Optimization Based On MapReduce