
Evaluating MapReduce Performance in RozoFS Storage Clusters

Posted on: 2017-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 2348330536453094
Subject: Computer Science and Technology
Abstract/Summary:
Hadoop is a distributed software framework capable of processing large amounts of data. It is built on the MapReduce parallel computing model and the HDFS distributed storage model. MapReduce divides large-scale data processing into a Map phase and a Reduce phase, which greatly simplifies the processing logic of big-data applications. By default, HDFS relies on replication to provide data redundancy, so the original data can be restored if it is corrupted or lost. However, replication consumes a great deal of storage space when the data set is very large, and it also consumes network bandwidth when data is written to HDFS. A distributed storage model based on redundancy (erasure) coding, such as a RozoFS-based storage system, can significantly reduce storage space and network bandwidth while ensuring the same fault tolerance. Based on these considerations, this thesis focuses on the MapReduce computation model running on the RozoFS distributed file storage system, in order to assess MapReduce application performance in a RozoFS storage cluster. The thesis studies the RozoFS distributed file storage system, the SimGrid distributed simulation platform, the MapReduce computation model, and other related technologies. On this basis, we designed and implemented, on the SimGrid platform, a simulation system for MapReduce application performance in a RozoFS storage cluster. The system includes functions for configuring the network topology and simulation parameters, together with the associated data placement and task scheduling logic, in order to simplify the study and performance evaluation of MapReduce applications in a RozoFS storage cluster. We then deployed Hadoop and RozoFS to extract the relevant simulation parameters, ran the typical TeraSort and WordCount MapReduce experiments in the simulation system, and assessed the impact of different configuration parameters on application execution. Finally, based on this experimental assessment, the thesis proposes a configuration optimization scheme for the RozoFS storage cluster system; the experimental results show that the scheme improves the execution performance of MapReduce applications in a RozoFS storage cluster.
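The storage trade-off between replication and redundancy coding mentioned in the abstract can be illustrated with a small calculation. The sketch below is not taken from the thesis: it assumes an idealized (k, n) erasure code in which any k of the n stored blocks suffice to rebuild the data, and the parameter values (3 replicas, k = 4, n = 6) are hypothetical examples chosen only so that both schemes tolerate the same number of lost blocks. The function names are likewise illustrative.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data with `replicas` full copies."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return float(replicas)


def replication_tolerated_losses(replicas: int) -> int:
    """Number of copies that can be lost while one copy still survives."""
    return replicas - 1


def erasure_overhead(k: int, n: int) -> float:
    """Raw bytes stored per byte of user data for an idealized (k, n) code.

    Assumption: the data is split into k blocks and encoded into n blocks
    of the same size, and any k of the n blocks can rebuild the data.
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    return n / k


def erasure_tolerated_losses(k: int, n: int) -> int:
    """Number of blocks that can be lost while k blocks still survive."""
    return n - k


if __name__ == "__main__":
    # Hypothetical parameters, not values reported in the thesis.
    replicas, k, n = 3, 4, 6
    print("replication:", replication_overhead(replicas),
          "x storage, tolerates", replication_tolerated_losses(replicas), "losses")
    print("erasure code:", erasure_overhead(k, n),
          "x storage, tolerates", erasure_tolerated_losses(k, n), "losses")
```

Under these assumed parameters both schemes survive the same number of losses, while the coded layout stores half as many raw bytes. Real systems add metadata and per-block overhead that this sketch ignores.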
Keywords/Search Tags: MapReduce, Erasure code, RozoFS, Simulation