Performance evaluation of big data placement structures in MapReduce-based data warehouse systems

Posted on:2017-02-12

Degree:M.S

Type:Thesis

University:Lamar University - Beaumont

Candidate:Hasan, Mohammad Rakibul

GTID:2468390014473096

Subject:Computer Science

Abstract/Summary:

The size of data sets is growing rapidly, which calls for fundamentally new techniques and technologies to capture, store, distribute, and process them promptly and cost-effectively. The Hadoop software framework, with its high-performance MapReduce execution engine, can process large data sets across clusters while providing scalability and fault tolerance on distributed systems. A MapReduce-based data warehouse system, combined with an appropriate data storage format, is well suited to data summarization and query analysis. Because such a warehouse can hold millions of rows and column values, the data placement structure has a significant influence on warehouse performance. In this research, we examine the performance of two of Hive's data file formats, RCFile and ORCFile, on top of Hadoop. For this experiment, we design and implement a three-node distributed cluster with a master-slave architecture, in which we store and organize the data according to each file format's structure. We evaluate the efficiency of each file format in terms of data loading, data storage, and query processing using MapReduce. The experimental results can guide the choice of a suitable file format for a data warehouse system used for Big Data processing.
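Why data placement matters for queries that touch only a few columns can be illustrated with a toy model. The sketch below is a hypothetical illustration, not the thesis's implementation and not the actual RCFile or ORCFile on-disk format: it assumes a table held in memory as a list of tuples and counts "cells" read instead of bytes. It contrasts a plain row-oriented layout with an RCFile-style hybrid idea, where rows are first split into horizontal row groups and each column of a group is then stored contiguously. All function names (`row_oriented`, `hybrid_row_groups`, and so on) are invented for this example, and real formats add compression, metadata, and indexes that this model ignores.

```python
from typing import Any, List, Sequence

Row = Sequence[Any]
# Hybrid layout: list of row groups; each group is a list of columns; each column a list of values.
HybridLayout = List[List[List[Any]]]


def row_oriented(rows: List[Row]) -> List[Any]:
    """Row-store placement: every field of row 0, then every field of row 1, and so on."""
    return [value for row in rows for value in row]


def hybrid_row_groups(rows: List[Row], group_size: int) -> HybridLayout:
    """Toy RCFile-style placement (assumption, simplified): split the rows into
    horizontal row groups of `group_size`, then store each column of a group contiguously."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups: HybridLayout = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        num_cols = len(chunk[0])
        # Transpose the chunk: one list per column, holding that column's values for this group.
        groups.append([[row[c] for row in chunk] for c in range(num_cols)])
    return groups


def cells_scanned_row_store(rows: List[Row]) -> int:
    """A row store must read every field of every row to extract even one column."""
    return len(row_oriented(rows))


def cells_scanned_hybrid(groups: HybridLayout, column: int) -> int:
    """The hybrid layout reads only the requested column's chunk in each row group."""
    return sum(len(group[column]) for group in groups)


def project_column_hybrid(groups: HybridLayout, column: int) -> List[Any]:
    """Reassemble one column, in original row order, from the hybrid layout."""
    return [value for group in groups for value in group[column]]


if __name__ == "__main__":
    # Hypothetical sample table: (id, name, score).
    table = [
        (1, "alice", 30.0),
        (2, "bob", 25.5),
        (3, "carol", 41.2),
        (4, "dave", 19.9),
        (5, "erin", 33.3),
    ]
    layout = hybrid_row_groups(table, group_size=2)
    print("row store cells scanned for one column:", cells_scanned_row_store(table))
    print("hybrid cells scanned for column 2:", cells_scanned_hybrid(layout, 2))
    print("names:", project_column_hybrid(layout, 1))
```

In this model, projecting one column of a five-row, three-column table reads all 15 cells in the row layout but only 5 in the hybrid layout, while each row group still keeps whole rows together on the same node. That trade-off is the general motivation behind row-group-then-column formats such as RCFile and ORCFile; actual savings in Hive depend on compression, schema, and query patterns, which this sketch does not model.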

Keywords/Search Tags:

Warehouse system, Big data, Format, Data placement, Data sets

Related items

1	Data Warehouse And Data Mining In The Securities Brokerage Business Crm Applications
2	Research And Implementation Of Hospital Data Warehouse
3	The Research And Application Of Data Preprocessing In XML Data Warehouse
4	Based On Data Warehouse Decision Support System Research And Implementation
5	The Data Integration、analysis And Utilization For Hosiptal Information Based On The Data Warehouse
6	Study The Data Analysis & Process System Of Bridge Healthy Monitoring Based On Data Warehouse
7	Research On CRM System Based On Data Warehouse
8	The Reaserch And Implementation Of Data Quality Control Model Of Data Warehouse System In Tele-Communication Industry Based On ETL Tool
9	Application Of Data Warehouse And Data Mining Technology In Tax Administration System
10	Multi-Copy Data Placement For Distributed Data Centers