
Design And Implementation Of A Streaming SQL Real-time Computing Platform Based On Apache Flink

Posted on: 2021-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: G Cao
GTID: 2518306575953939
Subject: Software engineering
Abstract/Summary:
The traditional offline data storage and computing stack built on the Hadoop ecosystem has been widely used in industry. However, because offline computing has high latency, more and more data applications are moving from offline to real-time processing. Apache Flink, a distributed big data processing engine that has developed rapidly in recent years, has become the preferred engine for many companies' real-time computing platforms thanks to its strengths in real-time (streaming) computing and its relatively complete SQL support. However, most traditional real-time computing platforms require tasks to be submitted as packaged programs, which leads to a higher learning cost, lower development efficiency, and higher maintenance cost. A real-time computing platform programmed in SQL is therefore urgently needed.

The streaming SQL real-time computing platform based on Apache Flink consists mainly of task management, data source management, and SQL engine modules. The task management module is the back end of the whole platform and is responsible for managing, submitting, and monitoring tasks. The management part covers the development status and version management of SQL tasks. The submission part assigns tasks to multiple launchers deployed in different clusters, which perform the actual operations through the Flink client. The monitoring part comprises job monitoring, which tracks task status in real time and updates it promptly, and launcher monitoring, which tracks launcher status in real time so that the affected tasks can be handled correctly when a launcher goes down. The data source management module is responsible for registering and updating information about different data sources and for managing user-defined functions. The SQL engine module parses the statements of a user's SQL task one by one according to their Flink SQL type and sends them to the Flink engine for execution. The data source information referenced in SQL is connected to the data source management module through the catalog interface provided by Flink, so a user's SQL task does not need to contain DDL (data definition language) and only needs DML (data manipulation language).

The platform is built on the Spring Boot development framework, uses MySQL for data persistence, and relies on ZooKeeper to ensure high service availability and task reliability. With unified data source management, real-time computing tasks can be developed on the platform by writing SQL DML alone, which reduces development cost and improves development efficiency. Tests with common data sources show that the platform completes real-time computing tasks well and basically achieves the expected goals.
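The abstract does not say how the SQL engine module separates and classifies statements, so the following is only a minimal illustrative sketch of the general idea of handling a SQL task "one by one according to type". It is written in Python for brevity rather than in the platform's own stack, and every name in it (`StatementType`, `split_statements`, `classify`, `plan`, and the table names in the usage example) is hypothetical. The keyword table is deliberately incomplete, and SQL comments are not handled.

```python
from enum import Enum
from typing import List, Tuple


class StatementType(Enum):
    """Coarse statement categories (hypothetical, not the thesis's own)."""
    DDL = "ddl"      # CREATE / DROP / ALTER
    DML = "dml"      # INSERT
    QUERY = "query"  # SELECT / WITH
    OTHER = "other"  # anything else, e.g. SET or USE


# Leading keyword -> category. Illustrative only, not exhaustive.
_KEYWORDS = {
    "CREATE": StatementType.DDL,
    "DROP": StatementType.DDL,
    "ALTER": StatementType.DDL,
    "INSERT": StatementType.DML,
    "SELECT": StatementType.QUERY,
    "WITH": StatementType.QUERY,
}


def split_statements(script: str) -> List[str]:
    """Split a script on semicolons that lie outside single-quoted literals."""
    statements: List[str] = []
    current: List[str] = []
    in_string = False
    for ch in script:
        if ch == "'":
            # A doubled quote ('') toggles twice, so the state stays correct.
            in_string = not in_string
        if ch == ";" and not in_string:
            text = "".join(current).strip()
            if text:
                statements.append(text)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def classify(statement: str) -> StatementType:
    """Classify a non-empty statement by its first whitespace-separated word."""
    first_word = statement.split(None, 1)[0].upper()
    return _KEYWORDS.get(first_word, StatementType.OTHER)


def plan(script: str) -> List[Tuple[StatementType, str]]:
    """Return (type, statement) pairs in the order they appear in the script."""
    return [(classify(s), s) for s in split_statements(script)]


if __name__ == "__main__":
    # Hypothetical task: only DML is written, as the abstract describes;
    # the table definitions are assumed to come from a catalog.
    script = """
    SET 'pipeline.name' = 'demo;job';
    INSERT INTO sink_table SELECT id, name FROM source_table WHERE name <> 'a;b';
    """
    for kind, sql in plan(script):
        print(kind.name, "->", sql)
```

In a real deployment each classified statement would then be handed to the Flink engine by whatever submission path the platform uses. That step is omitted here because the abstract does not describe it.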
Keywords/Search Tags: Flink, real-time computing, stream computing, streaming SQL