HERDER: A Heterogeneous Engine for Running Data-Intensive Experiments & Reports

Posted on: 2012-07-08

Degree: M.S.

Type: Thesis

University: University of California, Irvine

Candidate: Ayyalasomayajula, Vandana

GTID: 2458390008993240

Subject: Computer Science

Abstract/Summary:

There has been a tremendous increase in the amount of data collected every day by Internet companies such as Google, Yahoo!, Facebook, and Twitter. These companies use large Hadoop clusters with thousands of machines to analyze the collected data. The usage model for data-intensive computing platforms like Hadoop is challenging, because many users may be submitting jobs to the same cluster at the same time. There is therefore a need to understand user behavior, cluster resource usage, and how data-intensive computing platforms react to multi-user workloads.

In this thesis we describe HERDER, a multi-user benchmarking tool that executes synthetic workloads on a cluster. HERDER provides a flexible user model that simulates a realistic environment in which multiple users work on a cluster. It also provides an extensible framework for executing jobs that belong to a variety of data-intensive computing platforms. At the end of a workload's execution, the HERDER workload generator reports statistics for both the steady-state period and the overall execution period.
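The abstract does not describe how HERDER models users or defines its steady-state period. The following is therefore only a minimal, hypothetical sketch of the general idea, not the thesis's implementation. Every name in it (`JobRunner`, `FixedDurationRunner`, `User`, `run_workload`, `summarize`) is an illustrative assumption. The sketch makes several further simplifying assumptions. Each simulated user arrives at some time and runs a fixed sequence of jobs separated by a "think time". A pluggable runner stands in for one data-intensive platform. Time is simulated, so users do not contend for cluster resources. "Steady state" is assumed to mean the window in which every user is active.

```python
# Illustrative sketch only: every name below (JobRunner, FixedDurationRunner,
# User, run_workload, summarize) is hypothetical, not HERDER's real API.
# Time is simulated and users do not contend for resources in this toy model.
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Protocol, Tuple


class JobRunner(Protocol):
    """Assumed extension point: one runner per data-intensive platform."""
    def duration(self, job: str) -> float: ...


class FixedDurationRunner:
    """Toy runner that looks up a fixed duration for each job type."""
    def __init__(self, durations: Dict[str, float]) -> None:
        self.durations = durations

    def duration(self, job: str) -> float:
        return self.durations[job]


@dataclass
class JobRecord:
    user: str
    job: str
    start: float
    end: float


@dataclass
class User:
    """Assumed user model: arrive at `arrival`, run `jobs` in order,
    pausing `think_time` between consecutive jobs."""
    name: str
    arrival: float
    think_time: float
    jobs: List[str] = field(default_factory=list)


def run_workload(users: List[User], runner: JobRunner) -> List[JobRecord]:
    """Replay each user's job sequence on a simulated clock."""
    records = []
    for user in users:
        t = user.arrival
        for i, job in enumerate(user.jobs):
            if i > 0:
                t += user.think_time  # user "thinks" before the next job
            end = t + runner.duration(job)
            records.append(JobRecord(user.name, job, t, end))
            t = end
    return records


def summarize(records: List[JobRecord]) -> dict:
    """Report overall and steady-state statistics.

    Steady state is assumed here to be the window in which every user is
    active: from the latest first-job start to the earliest last-job end.
    """
    first_start: Dict[str, float] = {}
    last_end: Dict[str, float] = {}
    for r in records:
        first_start[r.user] = min(first_start.get(r.user, r.start), r.start)
        last_end[r.user] = max(last_end.get(r.user, r.end), r.end)

    overall = {
        "jobs": len(records),
        "span": max(r.end for r in records) - min(r.start for r in records),
        "mean_duration": mean(r.end - r.start for r in records),
    }

    lo, hi = max(first_start.values()), min(last_end.values())
    window: Optional[Tuple[float, float]] = (lo, hi) if lo < hi else None
    # Only jobs that run entirely inside the window count as steady-state.
    steady = [r for r in records if window and r.start >= lo and r.end <= hi]
    steady_state = {
        "window": window,
        "jobs": len(steady),
        "mean_duration": mean(r.end - r.start for r in steady) if steady else None,
    }
    return {"overall": overall, "steady_state": steady_state}


if __name__ == "__main__":
    runner = FixedDurationRunner({"sort": 20, "grep": 10})
    users = [User("alice", 0, 5, ["sort", "grep", "sort"]),
             User("bob", 10, 0, ["grep", "grep", "grep"])]
    print(summarize(run_workload(users, runner)))
```

In the usage example, the steady-state window runs from time 10 (when "bob" arrives) to time 40 (when "bob" finishes). Only the four jobs that run entirely inside that window count toward the steady-state statistics. Jobs that fall in the ramp-up and ramp-down phases count only toward the overall figures. Reporting the two periods separately keeps edge effects from distorting the multi-user measurements. A real tool would replace `FixedDurationRunner` with runners that submit actual jobs to a cluster.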

Keywords/Search Tags:

HERDER, Data-intensive

Related items

1	Optimize The Data-intensive Oriented Application Of Web Services Composition
2	Research On Key Technologies Of Data Management For Data-Intensive Applications
3	Design And Development Of Data-intensive Computing Oriented Ship Emergency Response System
4	Research Of Data Classification Algorithms In Data-intensive Computing Environments
5	Research On Data Placement Strategy For Data-intensive Applications In Cloud
6	Parallel Optimization Of Data Intensive Computing On Sunway TaihuLight
7	Design Of Energy-efficient Reconfigurable System Architectures For Data-intensive Computing
8	Integrating data and compute intensive workflows for uncertainty quantification in large scale simulation - Application to model based hazard analysis
9	Job Scheduling Technologies In Data Intensive Supercomputing Systems
10	Research On Wide-area Data-intensive Computing Systems For Spatial Data Processing