HERDER: A Heterogeneous Engine for Running Data-Intensive Experiments & Reports

Posted on: 2012-07-08
Degree: M.S.
Type: Thesis
University: University of California, Irvine
Candidate: Ayyalasomayajula, Vandana
Full Text: PDF
GTID: 2458390008993240
Subject: Computer Science
Abstract/Summary:
There has been a tremendous increase in the amount of data collected every day by Internet companies like Google, Yahoo!, Facebook and Twitter. These companies use large Hadoop clusters with thousands of machines to analyze the collected data. The usage model for data-intensive computing platforms like Hadoop is challenging, as many users can be submitting jobs to a cluster at the same time. Therefore, there is a need to understand user behavior, cluster resource usage, and how data-intensive computing platforms react to multi-user workloads.

In this thesis we describe HERDER, a multi-user benchmarking tool for executing synthetic workloads on a cluster. HERDER provides a flexible user model that simulates an actual environment with users working on a cluster, and an extensible framework for executing jobs belonging to a variety of data-intensive computing platforms. At the end of workload execution, the HERDER workload generator reports statistics for both the steady-state period and the overall execution period.
Keywords/Search Tags: HERDER, Data-intensive