Design And Implementation Of ETL Language Compiler For Hadoop Platform

Posted on: 2013-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhou
GTID: 2208330434970259
Subject: Computer science and technology
Abstract/Summary:
A data warehouse is a data platform that provides information to enterprises. It was originally developed to address the shortcomings of traditional database systems in enterprise information management and decision making. ETL (Extract, Transform, and Load) is an important part of data warehouse technology. It is used to extract, cleanse, and transform data collected from different data sources, and the normalized data are then loaded into the data warehouse.

With the popularity of the Internet, enterprises store more and more data, and traditional data warehouses can hardly meet the growing needs of data storage, analysis, and mining. Hadoop emerged with the ability to store large amounts of data and analyze them efficiently. A Hadoop platform is also cheap to build and scalable, so it is worthwhile to investigate how to integrate Hadoop seamlessly into enterprise-level data processing.

In this thesis, we implement a compiler for Ab Initio DML, a widely used ETL programming language, using ANTLR, an LL(*)-based parser generator. The compiler targets the Hadoop platform: its underlying principle is to generate a Python program that uses the Hadoop Streaming interface. This approach addresses the problem of the large number of legacy ETL programs, and the generated Python is easy to understand and maintain.

The main contributions of this work are:
1. A new method of integrating Hadoop into existing ETL procedures is proposed. It greatly increases the data volume that current ETL procedures can handle and the speed at which they process it.
2. A compiler that translates a domain-specific language into a general-purpose high-level programming language is implemented. It provides a base framework for similar future work.
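To make the compilation target concrete, the following is a minimal sketch of the kind of Python script that could run as a Hadoop Streaming mapper. It is not output from the thesis compiler: the pipe-delimited record layout, the `transform` rule, and the function names are all assumptions chosen for illustration. The only property taken from Hadoop Streaming itself is that a mapper reads records from standard input and writes results to standard output.

```python
import sys


def transform(fields):
    """Hypothetical transform step: trim whitespace, upper-case the second field."""
    cleaned = [f.strip() for f in fields]
    if len(cleaned) > 1:
        cleaned[1] = cleaned[1].upper()
    return cleaned


def run_mapper(lines, delimiter="|"):
    """Yield one tab-separated output line per non-empty input record.

    The input delimiter is an assumption; a real DML record format
    would determine how each line is split into fields.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield "\t".join(transform(line.split(delimiter)))


if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin and collects stdout.
    for out in run_mapper(sys.stdin):
        print(out)
```

A script of this shape would be supplied to Hadoop Streaming as the mapper program. Keeping the record logic in a plain function such as `run_mapper` also lets it be tested locally without a cluster.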
Keywords/Search Tags: Data warehouse, ETL, Hadoop, Compiler, ANTLR