
Research On Resource Scheduling Method Based On Flink Framework Of Computing On Data Stream

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: B Q Wei
Full Text: PDF
GTID: 2428330605955979
Subject: Computer application technology
Abstract/Summary:
With the rapid development of big data technology, industries closely tied to everyday life, such as finance and banking, the Internet, and the Internet of Things, have changed dramatically. Data volumes have grown rapidly, and data computing must now be both large-scale and real-time. In real-time computing scenarios, the Flink stream computing framework provides rich operator support and a robust fault-tolerance mechanism, and it includes many resource scheduling optimizations, which allow it to process large amounts of data in real time. However, when a real-time data stream surges suddenly, Flink cannot dynamically adjust resources to match the current stream. This creates computation bottlenecks, and the real-time performance of the results can no longer be guaranteed.

This thesis designs a resource scheduling management system based on the Flink framework to address the performance bottlenecks that arise when data volume increases abruptly. The system monitors running jobs in real time, detects performance bottlenecks promptly, and adjusts resources to keep data computation real-time. It consists mainly of a Flink job monitoring subsystem and an operator resource scheduling optimization subsystem. The job monitoring subsystem tracks, in real time, the data input and output of each operator in a job and the network buffer (cache) usage while the job runs, and it records the topological relationships among operators. Together these provide a direct basis for identifying operator computation bottlenecks.

Using the collected monitoring data and the directed acyclic graph (DAG) topology of the job's operators, the operator resource scheduling optimization subsystem locates bottleneck operators through Flink's backpressure mechanism and optimizes them in three ways: 1) For Flink Source operators that consume multiple data sources (such as Kafka message queues) simultaneously, it addresses the bottleneck caused by uneven data partitioning by distributing the data sources (such as Kafka partitions) evenly within the Source operator. 2) For non-Source operators whose bottleneck stems from insufficient computing capacity, it performs resource scheduling, adjusting operator parallelism according to the processing capacity reported by the monitoring system. 3) For non-Source operators whose bottleneck stems from data skew during computation, it applies a pre-aggregation optimization strategy that splits data aggregation operators and pre-aggregates the data to reduce the skew caused by aggregation. A corresponding operator resource scheduling strategy is also designed for data skew in multi-key scenarios, reducing the impact of skew on computing performance.

Finally, the thesis builds a Flink job operation platform and a Flink resource scheduling environment, and carries out resource scheduling adjustment experiments on Flink performance bottlenecks with various causes. The experimental data show that the proposed Flink-based resource scheduling method is effective in resolving operator bottlenecks, increasing system throughput and ensuring real-time performance.
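
The pre-aggregation idea in item 3 can be illustrated with a small sketch. The abstract does not give the thesis's actual algorithm, so what follows is the commonly used two-stage ("salted key") aggregation, written in plain Python rather than against the Flink API. The function name `two_stage_count` and the parameters `num_salts` and `seed` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def two_stage_count(records, num_salts=4, seed=0):
    """Count records per key in two stages (illustrative sketch, not the thesis's method).

    Stage 1 attaches a random salt to each key, so a single hot key is split
    into at most `num_salts` partial groups that could run on different subtasks.
    Stage 2 drops the salt and merges the partial counts into the final result.
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible

    # Stage 1: pre-aggregate on (key, salt) so no single group holds the whole hot key.
    partial = Counter()
    for key in records:
        partial[(key, rng.randrange(num_salts))] += 1

    # Stage 2: strip the salt and combine the partial counts per original key.
    final = Counter()
    for (key, _salt), count in partial.items():
        final[key] += count

    return partial, final


if __name__ == "__main__":
    # A skewed input: key "a" is much more frequent than key "b".
    skewed = ["a"] * 90 + ["b"] * 10
    partial_counts, final_counts = two_stage_count(skewed)
    print(dict(final_counts))
```

The final counts match a direct per-key count, while the largest partial group for the hot key is smaller than the key's total, which is what relieves the single overloaded aggregation subtask. The trade-off is one extra aggregation step and an additional shuffle between the two stages.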
Keywords/Search Tags:Big data, Computing on Data Stream, Flink, Scheduling of resource