Research And Implementation Of Data Integration Platform Based On Kettle And Quartz

Posted on: 2020-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: J D Cui
GTID: 2428330575451578
Subject: Engineering
Abstract/Summary:
With the continuing advance of the "Internet+" era, enterprises and institutions keep expanding their own information systems. Because their business needs differ, different enterprises and departments have independently developed application systems tailored to themselves, each with its own data storage and access methods. This eventually produces "data islands" and a large amount of redundant data, so integrating data effectively within an enterprise has become an important issue. Data integration can reduce data redundancy and low data utilization, and it enables data sharing. It is also an important foundation for enterprise data warehousing, data mining, and higher-level decision analysis.

Although existing data integration methods and technologies do address "data islands", the complicated programming and the lack of scheduled operation modes in their implementation make the barrier to building data integration extremely high and development efficiency low, which increases an enterprise's development cost. A data integration platform therefore needs to be designed and developed to solve these problems in existing applications.

Based on the open-source Kettle and Quartz, this thesis designs and implements a data integration platform that supports the whole integration process as well as real-time scheduling. The platform builds its Web layer with the AngularJS, SpringMVC, and MyBatis frameworks, and uses TWaver as the graphical designer, Quartz as the real-time scheduler, Kettle as the data integration engine, and PostgreSQL for persistent storage. The thesis analyzes the shortcomings of Kettle, optimizes and improves it, introduces the concept of actuators, and adds real-time scheduling capabilities.

The thesis focuses on the design and implementation of the actuator, which is the actual carrier of job execution: it schedules and runs the jobs sent by the server. Actuator registration and user isolation are implemented. A heartbeat mechanism establishes communication between each actuator and the server, and command backup is used to make heartbeat communication reliable, which reduces the catastrophic impact of downtime to some extent. For some real-time scheduled jobs, a localization function is added to speed up queries of job history information. The Quartz scheduling method is also improved to support tree scheduling.

The implemented platform addresses Kettle's complex operation, its lack of mature scheduling and job monitoring, and its inefficient operation. It achieves unified scheduling, monitoring, and management of ETL jobs, shortens application development time, improves development efficiency, and lowers the cost of enterprise development, operation, and maintenance.
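The abstract does not spell out the heartbeat protocol. The following is a minimal Python sketch of one common way to combine heartbeats with command backup, assuming that the server keeps every command it sends to an actuator until the actuator acknowledges it in a later heartbeat, and re-sends all unacknowledged commands in each heartbeat response. All names (`HeartbeatServer`, `send`, `heartbeat`, `acked_ids`, `offline`) and the 30-second timeout are hypothetical and not taken from the thesis.

```python
import itertools
import time
from dataclasses import dataclass, field


@dataclass
class Command:
    command_id: int
    job_name: str
    action: str  # hypothetical actions, e.g. "run" or "stop"


@dataclass
class ActuatorState:
    last_heartbeat: float = 0.0
    # Backup of commands sent to this actuator but not yet acknowledged.
    pending: dict = field(default_factory=dict)


class HeartbeatServer:
    """Hypothetical server side of a heartbeat channel with command backup."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so tests can control time
        self.actuators = {}
        self._ids = itertools.count(1)

    def register(self, actuator_id):
        self.actuators[actuator_id] = ActuatorState(last_heartbeat=self.clock())

    def send(self, actuator_id, job_name, action):
        cmd = Command(next(self._ids), job_name, action)
        # Back the command up first; it is delivered with the next heartbeat.
        self.actuators[actuator_id].pending[cmd.command_id] = cmd
        return cmd.command_id

    def heartbeat(self, actuator_id, acked_ids):
        state = self.actuators[actuator_id]
        state.last_heartbeat = self.clock()
        for cid in acked_ids:
            state.pending.pop(cid, None)  # drop only acknowledged commands
        # Re-send everything still unacknowledged, oldest first.
        return sorted(state.pending.values(), key=lambda c: c.command_id)

    def offline(self):
        # Actuators whose last heartbeat is older than the timeout.
        now = self.clock()
        return [aid for aid, s in self.actuators.items()
                if now - s.last_heartbeat > self.timeout_s]
```

Because a command leaves the backup only after it is acknowledged, a lost heartbeat response or an actuator restart leads to re-delivery rather than a silently lost command; under this design the actuator must de-duplicate by `command_id` so a re-delivered "run" does not start a job twice.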
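The abstract says the Quartz scheduling method was extended to support "tree scheduling" but does not define it. One plausible reading, assumed here, is that jobs form a tree in which a child job starts only after its parent completes successfully, and all descendants of a failed job are skipped. The Python sketch below shows only that dependency logic; `run_tree`, the `children` mapping, and the status strings are hypothetical. In a Quartz-based implementation such logic could be driven from a `JobListener`'s `jobWasExecuted` callback, but that wiring is an assumption and is not shown.

```python
from collections import deque
from typing import Callable, Dict, List


def run_tree(root: str,
             children: Dict[str, List[str]],
             run_job: Callable[[str], bool]) -> Dict[str, str]:
    """Run a job tree top-down, level by level (breadth-first).

    A child is queued only after its parent succeeds; the whole subtree
    under a failed job is marked "skipped" without running.
    """
    status: Dict[str, str] = {}
    queue = deque([root])
    while queue:
        job = queue.popleft()
        ok = run_job(job)
        status[job] = "success" if ok else "failed"
        for child in children.get(job, []):
            if ok:
                queue.append(child)
            else:
                _mark_skipped(child, children, status)
    return status


def _mark_skipped(job: str,
                  children: Dict[str, List[str]],
                  status: Dict[str, str]) -> None:
    # Recursively mark a job and all of its descendants as skipped.
    status[job] = "skipped"
    for child in children.get(job, []):
        _mark_skipped(child, children, status)
```

For example, with `extract` as the root and two branches `clean_a -> load_a` and `clean_b -> load_b`, a failure in `clean_b` still lets `load_a` run while `load_b` is skipped, so one broken branch does not block its siblings.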
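The abstract mentions actuator registration and user isolation without giving details. A minimal Python sketch, assuming that each actuator is registered by a single owning user and that the server dispatches a user's jobs only to that user's own actuators; `ActuatorRegistry`, `register`, `visible_to`, `pick_for_job`, and the `host:port` address format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Actuator:
    actuator_id: str
    owner: str    # user who registered the actuator
    address: str  # hypothetical host:port the server dispatches jobs to


class ActuatorRegistry:
    """Hypothetical registry enforcing per-user isolation of actuators."""

    def __init__(self):
        self._by_id: Dict[str, Actuator] = {}

    def register(self, actuator_id: str, owner: str, address: str) -> Actuator:
        existing = self._by_id.get(actuator_id)
        if existing is not None and existing.owner != owner:
            # Another user may not take over an already registered actuator.
            raise PermissionError(f"{actuator_id} belongs to another user")
        actuator = Actuator(actuator_id, owner, address)
        self._by_id[actuator_id] = actuator  # re-registration updates address
        return actuator

    def visible_to(self, user: str) -> List[Actuator]:
        return [a for a in self._by_id.values() if a.owner == user]

    def pick_for_job(self, user: str, actuator_id: str) -> Actuator:
        actuator = self._by_id.get(actuator_id)
        if actuator is None or actuator.owner != user:
            # Same error for "missing" and "not yours" so ids are not leaked.
            raise LookupError(f"no actuator {actuator_id} for user {user}")
        return actuator
```

Re-registering under the same owner simply refreshes the address, which covers an actuator that restarts on a new host.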
Keywords/Search Tags: Data Integration, Data Warehouse, ETL, Data Sharing, Data Mining