Design And Implementation Of Data Integration Platform Based On ETL

Posted on: 2020-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: M J Wang
Full Text: PDF
GTID: 2428330602950535
Subject: Engineering
Abstract/Summary:
In recent decades, with the rapid development of science and information technology, the amount of data accumulated in study, work, and daily life has become very large, and the demand for collecting, storing, and disseminating data keeps growing. Data obtained from different sources, however, differ greatly in format, content, and quality, which makes it difficult for data to flow and be shared. To let more people obtain data conveniently and to reduce the time and material costs of data collection, data sharing is particularly important. This thesis designs a data integration platform based on ETL, in which users design ETL processes that extract, transform, and load data according to their needs, so that data can be shared.

Traditional data integration systems are usually difficult to use, have a high operating threshold, lack workflow features, and lack flexible management functions such as scheduling, deployment, and monitoring. The system proposed in this thesis aims to solve these main problems. First, to address poor usability and the high operating threshold, the system uses the TWaver framework to provide a graphical flowchart tool in which users design ETL processes quickly and flexibly by dragging and dropping. Second, to address the lack of workflow features, the system introduces tasks and jobs: a task contains one complete ETL process, a job consists of one or more tasks, and executing a job controls the order in which its tasks run. Third, to address scheduling, the system schedules tasks with the Quartz scheduling framework. Because Quartz schedules tasks individually and does not directly schedule a chain of tasks (that is, a job), the system uses task listeners to trigger the tasks of a job in order. To meet users' more complex scheduling needs for jobs, the system also uses Quartz to support real-time execution and periodic scheduling at fixed times and at fixed intervals. Finally, to address management, the system manages task information, job information, execution information, user operation information, data source information, executor information, user information, role information, and permission information through six modules: task management, job management, monitoring management, audit information, resource allocation, and system management.

The ETL-based data integration platform is an application system whose core functions are the design, deployment, scheduling, monitoring, and management of ETL processes. With it, users can design ETL processes quickly and flexibly and can easily deploy, schedule, monitor, and manage them. The system offers a simple way to design ETL processes, supports complex scheduling of those processes, and provides fast, intuitive monitoring of scheduling, which makes it valuable for the unified management of data integration work.
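
The task and job model and the listener-based chaining described above can be illustrated with a minimal sketch. It is written in plain Python and does not use Quartz or reproduce the thesis's implementation. Every name in it (`Task`, `Job`, `SingleTaskScheduler`, `ChainListener`, `run_job`) is hypothetical, and the scheduler is only a stand-in for a component that can fire one task at a time and notify listeners when that task finishes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """One complete ETL process: extract, then transform, then load."""
    name: str
    extract: Callable[[], list]
    transform: Callable[[list], list]
    load: Callable[[list], None]

    def run(self) -> None:
        self.load(self.transform(self.extract()))


@dataclass
class Job:
    """An ordered group of one or more tasks (names assumed unique)."""
    name: str
    tasks: List[Task]


class SingleTaskScheduler:
    """Hypothetical stand-in for a scheduler that fires one task at a time."""

    def __init__(self) -> None:
        self.listeners: List[Callable[[Task], None]] = []
        self.log: List[str] = []

    def trigger(self, task: Task) -> None:
        task.run()
        self.log.append(task.name)
        # Listeners are notified only after the task has completed,
        # so a task that raises stops the chain.
        for listener in list(self.listeners):
            listener(task)


class ChainListener:
    """Triggers the next task of a job when the previous one completes."""

    def __init__(self, scheduler: SingleTaskScheduler, job: Job) -> None:
        self.scheduler = scheduler
        # Map each task name to the task that follows it in the job.
        self.next_task = {
            prev.name: nxt for prev, nxt in zip(job.tasks, job.tasks[1:])
        }

    def __call__(self, finished: Task) -> None:
        nxt = self.next_task.get(finished.name)
        if nxt is not None:
            self.scheduler.trigger(nxt)


def run_job(scheduler: SingleTaskScheduler, job: Job) -> None:
    """Run a job by triggering its first task; the listener does the rest."""
    listener = ChainListener(scheduler, job)
    scheduler.listeners.append(listener)
    try:
        scheduler.trigger(job.tasks[0])
    finally:
        scheduler.listeners.remove(listener)


# Usage: two illustrative tasks chained into one job.
staging: list = []
warehouse: list = []

clean = Task(
    "clean",
    extract=lambda: [" a ", "b", ""],
    transform=lambda rows: [r.strip() for r in rows if r.strip()],
    load=staging.extend,
)
publish = Task(
    "publish",
    extract=lambda: list(staging),
    transform=lambda rows: [r.upper() for r in rows],
    load=warehouse.extend,
)

scheduler = SingleTaskScheduler()
run_job(scheduler, Job("nightly", [clean, publish]))
```

In this sketch the scheduler never sees the job as a whole. Ordering comes entirely from the listener, which reacts to a task's completion by triggering its successor.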
Keywords/Search Tags: Data Sharing, Data Integration, ETL, TWaver, Quartz