Design And Implementation Of ETL Management System Based On Kettle Cluster

Posted on: 2019-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2428330545953206
Subject: Software engineering
Abstract/Summary:
Information is an important resource in today's society and the basis for scientific management and decision-making in every industry. The volume of data grows exponentially each year, and this data holds enormous value. However, the current utilization of information resources across industries is very low, only about 2% to 4%, which wastes considerable time, manpower, and resources. How to increase the utilization of information resources has therefore become a common concern. The data warehouse emerged in this context. ETL is an important step in building a data warehouse: it integrates data according to unified rules, thereby increasing the value of the data. Building a data warehouse requires a large number of ETL jobs, and managing these jobs effectively is the key to improving ETL efficiency. This thesis develops a new ETL management system based on Kettle, which greatly improves the efficiency of ETL and thus the utilization of information resources.

Tools currently used for ETL include OWB, ODI, Informatica, Kettle, and others. Among them, Kettle is an open source ETL tool written in Java; it runs on multiple operating systems and extracts data more efficiently, so it was chosen as the ETL tool. However, Kettle has a significant performance bottleneck. As an open source tool it also contains many bugs, and it has no user-friendly Chinese version. In addition, most Kettle jobs today are stored as files scattered across various machines, which hinders unified management and monitoring, and the majority of jobs run on a single machine with low efficiency. An ETL job management system is therefore necessary. With such a system, the ETL process is handled clearly and efficiently through job management and scheduling.

The ETL management system is built on a Kettle cluster. Like many distributed systems, a Kettle cluster runs on multiple servers and can effectively prevent single points of failure. The cluster is fast and well suited to large data operations, but it places high demands on network bandwidth. To optimize overall performance, the system rewrites some of Kettle's original features so that the Kettle cluster is responsible only for executing jobs and transformations, which minimizes the burden on the cluster. The system also manages the large number of ETL jobs in a unified way: it transfers ETL jobs to the resource repository and manages them by category. Using the Kettle cluster and its load information, the system adopts real-time and scheduled execution strategies to implement job scheduling and to monitor job execution. The system also introduces job flow design: for a specific business scenario, a visual process designer is used to draw a flow chart, and the ETL jobs are then executed according to that flow. In addition, the system provides ETL job management and monitoring, machine resource monitoring, and one-click construction of the server environment.

The system addresses Kettle's cumbersome operation, low efficiency, instability, and poor user experience. Its pages are rich in content, intuitive, clear, and user-friendly. Unified management and monitoring of ETL jobs are achieved, so that a large number of jobs can run on the cluster, which greatly improves execution efficiency, reduces the ETL failure rate, and increases the utilization of information resources.
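
The abstract mentions scheduling jobs using the cluster's load information but does not say how a node is chosen. The following is a minimal sketch of one possible load-aware selection rule, not the thesis's actual implementation. The `SlaveNode` class, its fields, the `max_cpu_load` threshold, and the `pick_node` and `dispatch` functions are all hypothetical names introduced for illustration, and no Kettle API is used.

```python
from dataclasses import dataclass


@dataclass
class SlaveNode:
    """Hypothetical view of one cluster node and its reported load."""
    name: str
    cpu_load: float        # 0.0 to 1.0, assumed to come from resource monitoring
    running_jobs: int = 0  # jobs currently assigned to this node


def pick_node(nodes, max_cpu_load=0.9):
    """Return the least-loaded node below the CPU threshold, or None."""
    candidates = [n for n in nodes if n.cpu_load < max_cpu_load]
    if not candidates:
        return None
    # Prefer fewer running jobs; break ties by lower CPU load.
    return min(candidates, key=lambda n: (n.running_jobs, n.cpu_load))


def dispatch(job_name, nodes):
    """Assign a job to a node; None means the job should stay queued."""
    node = pick_node(nodes)
    if node is None:
        return None
    node.running_jobs += 1
    return (job_name, node.name)
```

Under these assumptions, a real-time job would call `dispatch` immediately, while a scheduled job would call it when its trigger time arrives. A job that receives `None` would be retried once some node's load drops.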
Keywords/Search Tags: Data Warehouse, ETL management, Resource utilization, Kettle