
The Research And Implementation Of Interdata Storage And Transaction Optimization On Mapreduce Cluster Engine

Posted on: 2013-05-07; Degree: Master; Type: Thesis
Country: China; Candidate: Q Na; Full Text: PDF
GTID: 2268330392469555; Subject: Software engineering
Abstract/Summary:
In a computer system, a cluster is a group of software and hardware components that act like a single system and complete computing workloads in close collaboration. MapReduce is a software architecture designed by Google; it is a distributed computation model used for processing large-scale data sets. A MapReduce engine can run on a cluster system, automatically dispatch tasks to the cluster, process jobs in parallel, and use the combined computing power of the cluster to complete the workload.

In this study, we discuss the problem of shuffling interdata between the nodes of the MapReduce cluster engine developed by Platform Computing Corporation, and propose a solution that optimizes interdata storage and transaction. In this solution, we implemented an interdata storage function and an interdata transaction optimization function to improve system performance. While ensuring data integrity, this paper solves the problem of sharing and shuffling large-scale interdata between nodes. The study aims to improve the low disk I/O performance and to resolve the multi-thread concurrency problem caused by large-scale interdata. As a result, job processing performance is improved.

Because map tasks and reduce tasks differ in the priority and method of processing interdata, the interdata storage module is divided into two modules: the interdata storage module of the map task and the interdata storage module of the reduce task. The interdata transaction optimization module optimizes the interdata transaction speed between nodes. The interdata storage modules of the map task and the reduce task share the same idea but differ in implementation. When storing interdata, both modules map the file to memory, use the memory to cache the data, and then maintain the data in memory. The interdata transaction optimization module depends on the interdata storage module; it sends the interdata held in memory from map nodes to reduce nodes more quickly, which saves job processing time.

According to the test results of this paper, the optimization of interdata storage and transaction improved the ability to handle large-scale data processing tasks and significantly decreased job running time. Based on the results of the industry-standard Hadoop benchmark test, the conclusion is that the feature implemented in this paper improved cluster performance by about 20%-60% on common Hadoop applications with different workload types.
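
The abstract says that the storage modules map a file to memory and cache the interdata there. A minimal Python sketch of that general idea follows. It is not the thesis implementation: the class name MappedInterdataStore, the length-prefixed record layout, the fixed capacity, and the file name part-0.dat are all assumptions made for illustration, and "interdata" is taken here to mean the key/value output that map tasks pass to reduce tasks.

```python
import mmap
import os
import struct
import tempfile


class MappedInterdataStore:
    """Hypothetical append-only key/value store backed by a memory-mapped file."""

    # Assumed record layout: 4-byte key length, 4-byte value length, key, value.
    HEADER = struct.Struct("<II")

    def __init__(self, path, capacity):
        self.capacity = capacity
        self.offset = 0  # next free byte in the mapped region
        # Pre-size the backing file so the whole region can be mapped at once.
        self._file = open(path, "w+b")
        self._file.truncate(capacity)
        self._map = mmap.mmap(self._file.fileno(), capacity)

    def append(self, key, value):
        """Write one record into the mapping and return its start offset."""
        size = self.HEADER.size + len(key) + len(value)
        if self.offset + size > self.capacity:
            raise ValueError("store is full")
        start = self.offset
        self.HEADER.pack_into(self._map, start, len(key), len(value))
        body = start + self.HEADER.size
        self._map[body:body + len(key) + len(value)] = key + value
        self.offset += size
        return start

    def records(self):
        """Yield (key, value) pairs straight from the mapped memory."""
        pos = 0
        while pos < self.offset:
            klen, vlen = self.HEADER.unpack_from(self._map, pos)
            pos += self.HEADER.size
            key = self._map[pos:pos + klen]
            value = self._map[pos + klen:pos + klen + vlen]
            pos += klen + vlen
            yield key, value

    def close(self):
        # Flush dirty pages to the backing file before releasing the mapping.
        self._map.flush()
        self._map.close()
        self._file.close()


def demo():
    """Store two records in a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        store = MappedInterdataStore(os.path.join(tmp, "part-0.dat"), 1024)
        store.append(b"apple", b"1")
        store.append(b"pear", b"22")
        result = list(store.records())
        store.close()
        return result
```

In this sketch, writes go to mapped memory and the operating system decides when to write pages back to disk, so a reader in the same process can serve records without a separate read call per record. How the thesis handles concurrency, eviction, and transfer to reduce nodes is not described in the abstract and is not modeled here.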
Keywords/Search Tags: Cluster, MapReduce, Disk I/O Optimization, Interdata Transaction Optimization