
Automatic deployment of a Hadoop cluster on Amazon Elastic Compute Cloud

Posted on: 2016-07-25
Degree: M.S.
Type: Thesis
University: University of Delaware
Candidate: Chen, Hao
Full Text: PDF
GTID: 2478390017979167
Subject: Computer Engineering
Abstract/Summary:
With the explosive amount of data generated every day, Big Data has become one of the most popular topics in both research and industry. Hadoop, built on Google's proposed MapReduce algorithm, was first introduced by Doug Cutting and his group in 2005; it became an Apache project in 2008, and an improved second version was released in 2012. Hadoop now dominates the Big Data framework landscape and is considered the first choice for most companies and research groups. Deploying a Hadoop cluster requires a computer cluster with a number of nodes, which may not be affordable for small businesses and research groups with limited funding, so cloud computing services become the alternative. AWS (Amazon Web Services) is one of the most popular cloud providers because of its high-quality service and affordable prices. However, building a Hadoop cluster on AWS manually is time consuming for several reasons. First, node information is not fixed when a new set of nodes is requested. There is also the extra work of logging into every instance to edit its configuration, which becomes worse when the cluster grows to hundreds or thousands of nodes. To address this impediment, this thesis proposes a method to automatically deploy a Hadoop cluster of any size on AWS, including installation and configuration. The time saved can then be spent on more important work.
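The abstract does not show the thesis's own deployment scripts. As a rough illustration of the kind of automation it describes, the sketch below uses the boto3 Python SDK (an assumption; the thesis may use a different toolchain) to request a set of EC2 instances and then generate Hadoop's workers file from the addresses AWS returns, so nobody has to log into each node by hand. The AMI ID, key pair name, instance type, and cluster size are hypothetical placeholders.

```python
import boto3

# Connect to EC2 in one region (region choice is a placeholder).
ec2 = boto3.resource("ec2", region_name="us-east-1")

# Launch one master plus three workers in a single request.
# ImageId, KeyName, and InstanceType are hypothetical and must be
# replaced with values valid for your own AWS account.
instances = ec2.create_instances(
    ImageId="ami-xxxxxxxx",   # placeholder AMI with Java/Hadoop prerequisites
    InstanceType="m4.large",
    KeyName="my-keypair",     # placeholder key pair name
    MinCount=4,
    MaxCount=4,
)

# Node information is only known after launch, so wait for each instance
# to reach the running state and refresh its attributes.
for inst in instances:
    inst.wait_until_running()
    inst.reload()

hosts = [(inst.id, inst.private_ip_address, inst.public_dns_name)
         for inst in instances]

# Treat the first node as the Hadoop master and the rest as workers.
master, workers = hosts[0], hosts[1:]

# Write the 'workers' file that Hadoop reads, instead of editing the
# configuration on every instance manually.
with open("workers", "w") as f:
    for _, private_ip, _ in workers:
        f.write(private_ip + "\n")

print("master:", master)
print("workers file written for", len(workers), "nodes")
```

In practice the generated configuration files would still have to be copied to the instances (for example over SSH) and the Hadoop services started, but the sketch shows the core idea: the cluster addresses are collected programmatically at launch time rather than typed in by hand.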
Keywords/Search Tags: Hadoop, AWS, Deploy