Design And Implementation Of A Real-time ETL Tool

Posted on: 2022-07-17  Degree: Master  Type: Thesis
Country: China  Candidate: M L Quan  Full Text: PDF
GTID: 2518306602965609  Subject: Master of Engineering
Abstract/Summary:
In recent years, the Internet of Things, cloud computing, and big data technologies have been widely adopted in enterprise systems. To realize the value of their data, more and more enterprises are integrating internal system data and building data warehouses with ETL (Extract-Transform-Load) tools. Meanwhile, growing demand for real-time data places higher requirements on real-time data processing. Some enterprises find that existing ETL tools cannot support fast, effective construction of the warehouse or display of its processes because of their multi-C/S architecture, inconvenient management, lack of cluster deployment support, and weak real-time data processing capability. Existing ETL tools have thus become a major obstacle to building big data platforms. To address these problems, this thesis designs and implements a real-time ETL tool guided by the actual requirements of a particular enterprise for real-time ETL processing. The bottom layer adopts the real-time data processing framework Flink, and the message-oriented middleware Kafka is used to execute ETL processes in a distributed manner. On this basis, a multi-user management mechanism is introduced, real-time ETL tasks can be created, monitored, and managed quickly, and statistics and reporting functions for the real-time ETL process are provided. The main work of the thesis is as follows:

(1) System requirements analysis. First, the overall requirements for the real-time ETL tool are analyzed according to the specific operating environment of the enterprise. Then, based on a simple division of the system into modules, the detailed requirements of the system management module, the data access module, the data processing module, the data connection module, and the system monitoring module are presented in turn. The thesis also describes in detail the interaction between administrator users, ordinary users, and the platform, as well as the specific requirements for creating and executing ETL tasks. The non-functional requirements of the system are described according to actual needs.

(2) System design and implementation. Based on the actual requirements, the overall layered architecture of the real-time ETL tool is designed. First, to define the system's database tables, the system's entities and attributes are modeled in an ER diagram. MyBatis-Plus is used to map between the platform system and the MySQL table data, and Shiro is used for user authentication and authorization, so that multiple authorized users can be created in the system. Second, the ETL task execution layer is implemented on top of Flink and Kafka. It is divided into three logically independent subsystems (data access, data processing, and data connection) so that each module can be extended independently while preserving data integrity. Finally, the system monitoring module is designed to support monitoring and extension of the subsystems. Throughout design and implementation, class diagrams are used to standardize the overall code structure, and sequence diagrams and flow charts are used to agree on the data flow.

(3) System testing. After coding is complete, a test environment is built and the system is deployed. Test cases are written for each function of the system, and functional testing is carried out according to the test steps. Performance tests of data integrity, task concurrency, and running stability are also conducted, and the results are analyzed. Testing shows that the system's functions work normally: the user interface behaves as expected, the business functions are accurate, ETL tasks execute normally, no data is lost during execution, the system has disaster recovery capability, task concurrency meets the requirements, and the system runs stably over long periods.

In conclusion, the real-time ETL tool designed in this thesis meets the enterprise's requirements.
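
The division of the execution layer into three logically independent subsystems can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Topic` class is a hypothetical in-memory stand-in for a Kafka topic, the `normalize` function stands in for a Flink transformation, and all names and the record layout are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Optional

Record = Dict[str, object]


class Topic:
    """Hypothetical in-memory stand-in for a Kafka topic (illustration only)."""

    def __init__(self) -> None:
        self._queue: Deque[Record] = deque()

    def publish(self, record: Record) -> None:
        self._queue.append(record)

    def drain(self) -> List[Record]:
        """Return all pending records in arrival order and empty the topic."""
        pending = list(self._queue)
        self._queue.clear()
        return pending


def data_access(source_rows: Iterable[Record], raw: Topic) -> None:
    """Extract: copy each source row onto the raw topic."""
    for row in source_rows:
        raw.publish(dict(row))


def data_processing(
    raw: Topic,
    clean: Topic,
    transform: Callable[[Record], Optional[Record]],
) -> None:
    """Transform: apply `transform`; records mapped to None are dropped."""
    for record in raw.drain():
        result = transform(record)
        if result is not None:
            clean.publish(result)


def data_connection(clean: Topic, sink: List[Record]) -> None:
    """Load: append transformed records to the target store (a list here)."""
    sink.extend(clean.drain())


def normalize(record: Record) -> Optional[Record]:
    """Assumed example rule: drop rows without a value, cast the rest to float."""
    if record.get("value") is None:
        return None
    return {"id": record["id"], "value": float(record["value"])}


def run_pipeline(rows: Iterable[Record]) -> List[Record]:
    raw, clean = Topic(), Topic()
    sink: List[Record] = []
    data_access(rows, raw)
    data_processing(raw, clean, normalize)
    data_connection(clean, sink)
    return sink
```

Because the stages communicate only through topics, any one of them can be replaced or scaled without touching the others, which is the property the abstract attributes to the three-subsystem design.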
Keywords/Search Tags: ETL, Real-Time Data, Flink, Kafka, Database