
Research And Implementation Of Easy Wrangling Data Transformation Script Execution Engine

Posted on: 2019-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: J R Wei
Full Text: PDF
GTID: 2428330548477421
Subject: Computer Science and Technology
Abstract/Summary:
Self-service data preparation technology provides a GUI-based interactive data transformation tool that infers the user's intent from their interactions in the graphical interface and generates data transformations without requiring programming. Big-data-oriented self-service data preparation must additionally handle large amounts of data: it has to convert user interactions into processing logic for large-scale data, and that processing must be both scalable and efficient. EasyWrangling is a big-data-oriented self-service data preparation tool. It consists of two parts: a graphical interface program and a data transformation script execution engine. This paper focuses on the data transformation script execution engine, which parses and optimizes the script in order to process massive data stored on the Hadoop platform. The work includes:
1. Defined the data model and the data operators. Designed a declarative data transformation language, called Wrangling DSL, to describe the data operations generated by the user's interactions in the graphical interface. Designed and implemented the data operators based on the MapReduce model.
2. Designed and implemented a data transformation script execution engine based on Wrangling DSL.
3. Proposed optimization methods for the execution of a single data transformation script and for the execution of multiple data transformation scripts.
Experimental results showed that the execution engine is scalable and that the optimization methods are feasible and effective.
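The abstract does not show the Wrangling DSL syntax or name the specific optimizations, so the following is only a minimal sketch of the general idea of a declarative transformation script run by an engine. Everything in it is an assumption made for illustration: the one-line-per-operation script syntax, the operator names (`upper`, `drop_empty`, `rename`), and the functions `parse_step`, `fuse`, and `run_script` are hypothetical and are not taken from the thesis. The fusion step illustrates one common way to optimize a single script, namely combining consecutive row-wise operators so the data is scanned once (analogous to merging map-only stages into one MapReduce job); whether EasyWrangling does this is not stated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, str]


@dataclass
class Operator:
    name: str
    # Returns the transformed row, or None to drop the row (a filter).
    apply: Callable[[Row], Optional[Row]]


def parse_step(step: str) -> Operator:
    """Parse one line of a made-up script syntax: '<op> <column> [<arg>]'."""
    parts = step.split()
    op, column = parts[0], parts[1]
    if op == "upper":
        return Operator(step, lambda r: {**r, column: r[column].upper()})
    if op == "drop_empty":
        return Operator(step, lambda r: r if r[column] else None)
    if op == "rename":
        new_name = parts[2]
        return Operator(
            step,
            lambda r: {(new_name if k == column else k): v for k, v in r.items()},
        )
    raise ValueError(f"unknown operator: {op}")


def fuse(ops: List[Operator]) -> Operator:
    """Combine row-wise operators into one so each row is visited once."""
    def fused(row: Row) -> Optional[Row]:
        current: Optional[Row] = row
        for op in ops:
            current = op.apply(current)
            if current is None:  # a filter dropped the row; stop early
                return None
        return current
    return Operator(" | ".join(o.name for o in ops), fused)


def run_script(script: str, rows: Iterable[Row]) -> List[Row]:
    """Parse the script, fuse its operators, and apply them in a single pass."""
    ops = [parse_step(line.strip()) for line in script.splitlines() if line.strip()]
    pipeline = fuse(ops)
    output = []
    for row in rows:
        result = pipeline.apply(row)
        if result is not None:
            output.append(result)
    return output
```

A call such as `run_script("drop_empty name\nupper name\nrename name full_name", rows)` would apply the three steps to each row in one pass. In a real engine the fused operator would run inside a distributed map task rather than a local loop.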
Keywords/Search Tags: Data Preparation, Data Wrangling, Data Transformation