
Research and Application of an ETL Framework Based on MapReduce and Programming Mode

Posted on: 2014-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: L S Yin
Full Text: PDF
GTID: 2248330395980922
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer and Internet technology, many enterprises are gradually computerizing their business management and have accumulated large amounts of historical business data, and the data volume keeps growing rapidly. More and more enterprises use data warehouse technology to summarize, process, and analyze this data. The ETL process is the key link in building a data warehouse. If an ETL tool helps users implement the ETL solution neatly, the data warehouse solution can be implemented smoothly, and the ETL processing efficiency directly affects the data loading efficiency. Against this background, the main research work of this paper is as follows:
(1) Analyze the current state of research on MapReduce and ETL, and state the research significance of this paper. Study dimension data ETL, fact data ETL, the ETL processing model, slowly changing dimensions, and the MapReduce programming model.
(2) Design a general ETL framework based on MapReduce and programming mode. First, design the overall structure of the ETL framework. Second, design the dimension data ETL module and the fact data ETL module in detail. Finally, design the dimension data index and optimize the data block allocation algorithm in MapReduce.
(3) Test and analyze the usability and performance of the ETL tool designed in this paper for big data processing, and compare it with the ETL tool provided by Hive.
The general ETL framework designed in this paper can help ETL programmers implement ETL solutions agilely and complete distributed ETL processing programs efficiently. The results show that the ETL programming framework is easy to use, and that the ETL tool built on it takes 70% less data processing time than Hive.
Keywords/Search Tags: MapReduce, Programming Mode, ETL, Slowly Changing Dimension, Dimension Data, Fact Data
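
The abstract does not give the framework's code, so the following is only a minimal single-process sketch of the general idea it names: fact data ETL expressed in the MapReduce programming model, with a lookup against a slowly changing dimension. It assumes a Type 2 dimension (one row per version with a validity interval). Every name in it (CUSTOMER_DIM, lookup_surrogate_key, map_fact, reduce_fact, run_job, SALES) is hypothetical and is not taken from the thesis, and the in-memory shuffle stands in for what Hadoop would do across a cluster.

```python
from collections import defaultdict
from datetime import date

# Hypothetical Type 2 slowly changing dimension (an assumption, not the
# thesis's schema): each business key maps to a list of versions
# (surrogate_key, valid_from, valid_to); valid_to=None marks the current one.
CUSTOMER_DIM = {
    "C1": [(1, date(2013, 1, 1), date(2013, 6, 1)),
           (2, date(2013, 6, 1), None)],
    "C2": [(3, date(2013, 1, 1), None)],
}


def lookup_surrogate_key(business_key, as_of):
    """Return the surrogate key of the version valid on `as_of`, or None."""
    for surrogate_key, valid_from, valid_to in CUSTOMER_DIM.get(business_key, []):
        if valid_from <= as_of and (valid_to is None or as_of < valid_to):
            return surrogate_key
    return None


def map_fact(record):
    """Map phase: swap the business key for a surrogate key, emit (key, amount)."""
    business_key, day, amount = record
    surrogate_key = lookup_surrogate_key(business_key, day)
    if surrogate_key is not None:  # rows with no matching dimension are dropped
        yield surrogate_key, amount


def reduce_fact(surrogate_key, amounts):
    """Reduce phase: aggregate all amounts that share a surrogate key."""
    return surrogate_key, sum(amounts)


def run_job(records):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fact(record):
            groups[key].append(value)
    return dict(reduce_fact(key, values) for key, values in groups.items())


# Hypothetical source fact rows: (business_key, transaction_date, amount).
SALES = [
    ("C1", date(2013, 3, 15), 10.0),
    ("C1", date(2013, 7, 1), 5.0),
    ("C2", date(2013, 7, 2), 7.5),
    ("C1", date(2013, 8, 9), 2.5),
    ("C9", date(2013, 8, 9), 1.0),  # no dimension row for C9
]

if __name__ == "__main__":
    print(run_job(SALES))
```

In this sketch the dimension table is small enough to be held in memory by every mapper. A dimension index of the kind the abstract mentions would presumably serve the same lookup at larger scale, but how the thesis builds it is not stated here.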