An Efficient Foundation for Big Data Processing on Modern Clusters

Posted on:2017-05-22

Degree:Ph.D

Type:Thesis

University:University of California, Irvine

Candidate:Borkar, Vinayak

GTID:2468390011495436

Subject:Computer Science

Abstract/Summary:

In recent years, the world has seen an explosion in the amount of data being generated. Google proposed the MapReduce framework to allow programmers to easily process massive amounts of data in parallel using a cluster of shared-nothing commodity machines. What started out as a tool for human efficiency subsequently began to be used as an intermediate representation for queries compiled from higher-level declarative languages. In this thesis, we present an alternative software stack for building scalable Big Data systems. We specifically focus on two parts of the stack. Hyracks is a new partitioned-parallel runtime layer that provides an efficient, generalized model for executing data-processing jobs on a cluster of commodity machines. Algebricks is a compiler framework that helps to build high-level declarative language compilers for parallel processing on top of Hyracks.

Keywords/Search Tags:

Data

Related items

1	Seismic Achievement Data ETL Platform Architecture Design And Software System Implementation
2	The Research And Application Of Data Preprocessing In XML Data Warehouse
3	Research On Related Issues Of Unstructured Data
4	The Data Integration, Analysis And Utilization For Hospital Information Based On The Data Warehouse
5	Design And Implementation Of Data Mining Support Subsystem Based On Big Data Of Power
6	Design And Implementation Of Environmental Monitoring Data Management System
7	Research On The Problems And Countermeasures Of Domestic Data Journalism Practice
8	Study On Data Dependency-Based Data Quality Processing Techniques In Data Integration
9	Big Data And Research Of Big Data In Modern Internet Applications
10	Design And Implementation Of The Bayonet Data Integration Platform