
Performance Evaluation And Optimization For BigData As A Service

Posted on: 2017-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: H N Zheng
Full Text: PDF
GTID: 2428330590988882
Subject: Software engineering
Abstract/Summary:
In the age of Big Data, data processing faces new challenges. Many Big Data frameworks (e.g., Hadoop, Storm) have emerged and become effective tools for handling various Big Data needs. Meanwhile, cloud computing has advanced rapidly, letting people access cloud resources on demand and pay as they go. Combining the two gives rise to the concept of Big Data as a Service (BDaaS). The idea is to offer the whole life cycle of Big Data processing as a cloud service: users focus on their business, while the cloud prepares the required elements, such as the cluster, framework, and data source. Various BDaaS solutions have appeared in recent years (e.g., AWS EMR, OpenStack Sahara), and BDaaS does bring benefits such as ease of use, low cost, and elasticity.

However, people also encounter many new problems in practice, and performance is one of the most pressing. Users often find that the performance of their BDaaS deployment falls far short of expectations and cannot meet demand, and strategies used in traditional environments sometimes do not work in BDaaS. People urgently seek guidance, but systematic research in this area is currently lacking.

To address the performance issues in BDaaS, this thesis presents a systematic study of evaluation and optimization. OpenStack, Hadoop, and Sahara are chosen as the specific case, and the work covers three main aspects. First, we analyze the performance issues in BDaaS. We take resource management and use as the crucial factor, since it changes when moving from a physical environment to the cloud and varies among scenarios and workloads. Focusing on resources, we build an analytical model that describes the BDaaS dataflow in various scenarios and use it to estimate system performance. Then, based on the model, we evaluate five types of workloads on OpenStack and Sahara. We develop an automated testing tool named Doopshot to run these tests and identify about 10 performance factors. Finally, we propose three kinds of strategies on Sahara to achieve better resource management and use; these ideas can also apply to BDaaS solutions other than Hadoop on Sahara.

Last, we evaluate the effectiveness of our optimizations. We build 6 cases, including traditional Hadoop, original Sahara, and our optimizations. The experiments show a large performance boost from our strategies. Compared to original Sahara, our optimizations increase DFSIO throughput by 120%. In the memory management solution, throughput rises 13-fold with Tachyon. In Sort, all the optimizations cut execution time by half, and memory utilization grows from 80% to 96%. However, some problems that still need optimization also emerge, such as cache isolation. In general, the results are as expected, which supports our methodology for BDaaS.
Keywords/Search Tags: Big Data, Cloud Computing, Performance, BDaaS, Hadoop, OpenStack Sahara