
Design and Implementation of a Stream Computing Platform Based on Flink

Posted on: 2021-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Wang
Full Text: PDF
GTID: 2428330602476678
Subject: Software engineering
Abstract/Summary:
With the development of distributed technology and the widespread use of distributed computing frameworks in many fields, data processing has begun to shift from offline processing to real-time processing, so that decisions can be made quickly as data changes. Real-time computing systems have therefore become an indispensable tool for enterprise data processing and play an important role in the future development of enterprises.

The traditional approach to real-time data processing is to write a program for a specific business scenario against the programming interface of a real-time computing framework, and then submit it to an existing cluster by manually uploading a program package. This approach has two main disadvantages. First, writing real-time computing programs demands a high level of skill: developers need experience with distributed computing, and many traditional enterprises have very few such people. Second, deploying and monitoring the program is inconvenient: developers must upload and start tasks themselves and track execution through cluster commands. This thesis therefore proposes building a real-time computing platform on the Flink real-time computing framework, to simplify both the development of complex real-time computing tasks and the tedious work of task deployment and monitoring.

The thesis focuses on the following four aspects.

First, Flink is extended and used as the underlying engine for real-time data processing. Flink provides SQL syntax for DML (data manipulation language) operations on real-time data, but it does not support DDL (data definition language) operations or functions for joining directly with external dimension-table data sources. The engine built in this thesis on top of Flink therefore supports CREATE TABLE and CREATE VIEW statements. In addition, the asynchronous interface provided by Flink is used to implement SQL syntax for joining real-time data with dimension tables held in external data sources.

Second, an upper-layer distributed scheduling system is built to submit real-time computing tasks and track their execution status. The scheduling system is deployed on multiple nodes and supports horizontal scaling. Submitted tasks are distributed evenly across the child nodes for management, and a task is submitted to the cluster only when cluster resources are sufficient. While a task runs, its execution status is tracked and a link to its operation log is returned to the upper-layer interactive system.

Third, a visual web page is provided for creating and monitoring real-time computing tasks, which simplifies the user workflow. Users configure the data sources of a task through the page. The page also displays the scheduling status of each task and, while the task is running, its internal monitoring metrics and operation logs in real time.

Fourth, the real-time computing platform designed in this thesis is implemented as a complete system, and its main functional modules are shown in screenshots. The software and hardware configurations used to build the system are explained, and the main table structures are described in terms of their field attributes.
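The scheduling behaviour described in the second aspect (even distribution of tasks across child nodes, and submission only when cluster resources are sufficient) can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use any Flink API. The names `SchedulerNode`, `Dispatcher`, `free_slots`, and the idea of measuring resources in "slots" are assumptions made purely for illustration, as is the choice of a least-loaded assignment rule with a first-in, first-out waiting queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SchedulerNode:
    """Hypothetical child node that manages the tasks assigned to it."""
    name: str
    tasks: list = field(default_factory=list)


class Dispatcher:
    """Illustrative dispatcher: least-loaded assignment plus a resource gate."""

    def __init__(self, node_names, free_slots):
        # Assumes at least one node name is given.
        self.nodes = [SchedulerNode(n) for n in node_names]
        self.free_slots = free_slots      # assumed measure of cluster capacity
        self.pending = deque()            # tasks waiting for resources (FIFO)

    def add_node(self, name):
        """Horizontal expansion: a new node starts empty and attracts new tasks."""
        self.nodes.append(SchedulerNode(name))

    def submit(self, task_id, slots_needed):
        """Queue a task and start whatever can run now; returns started tasks."""
        self.pending.append((task_id, slots_needed))
        return self._dispatch()

    def release(self, slots):
        """Return slots to the cluster (a task finished) and retry the queue."""
        self.free_slots += slots
        return self._dispatch()

    def _dispatch(self):
        started = []
        # Strict FIFO: a large task at the head blocks the tasks behind it.
        while self.pending and self.pending[0][1] <= self.free_slots:
            task_id, slots = self.pending.popleft()
            # Pick the node managing the fewest tasks; ties go to the earliest node.
            node = min(self.nodes, key=lambda n: len(n.tasks))
            node.tasks.append(task_id)
            self.free_slots -= slots
            started.append((task_id, node.name))
        return started


if __name__ == "__main__":
    d = Dispatcher(["node-a", "node-b"], free_slots=4)
    print(d.submit("t1", 2))   # started on the first empty node
    print(d.submit("t2", 2))   # started on the other node
    print(d.submit("t3", 2))   # no free slots yet, so it waits
    print(d.release(2))        # t3 starts once resources are returned
```

A real scheduler would also need to persist the queue, detect failed nodes, and rebalance tasks after scaling out; the sketch leaves these out to keep the two rules from the abstract visible.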
Keywords/Search Tags: Real-time computing platform, Flink, task scheduling, distributed computing