Design And Implementation Of A Distributed Stream Computing Platform

Posted on: 2021-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
Full Text: PDF
GTID: 2428330620464190
Subject: Computer technology
Abstract/Summary:
With the development of science and technology, e-commerce, social networking, news aggregation, video surveillance, and satellite remote sensing are booming, and massive amounts of real-time data are generated every day. This data differs from traditional data in both form and scale. Traditional data is generally structured, static data of small scale with low real-time processing requirements. Most real-time data is streaming data, characterized by huge scale, uncertain flow direction and velocity, real-time arrival, and a lack of structure. Distributed computing platforms specialized in processing streaming data have emerged to meet this need. This thesis proposes a distributed stream computing platform that provides real-time stream analysis services with low latency and high throughput; application logic is specified as a topology graph. The thesis focuses on task scheduling algorithms, fault tolerance mechanisms, and message processing mechanisms in distributed stream computing platforms. The main tasks include the following:

1) Study the current common stream computing platforms (Storm, Spark Streaming, Flink, MillWheel, etc.), focusing on their scheduling algorithms, fault tolerance mechanisms, and message mechanisms, and analyze the advantages and disadvantages of each platform.
2) For task scheduling, an intelligent scheduling algorithm based on QoS constraints is used. During task scheduling, the scheduling algorithm module calculates each node's resource usage rate and uses it as a scheduling constraint. Different resource types have different weighting factors, and an annealing algorithm is used to train the weighting factors, so that the scheduler can schedule tasks intelligently in different operating environments, increasing system throughput and data processing capacity and improving system performance.
3) The fault tolerance mechanism guarantees system reliability. In a distributed system, faults are the norm: hardware failures (for example, in the motherboard or power supply) and software failures (for example, process crashes) both affect system reliability. This thesis uses replication-based fault tolerance and the open-source ZooKeeper component to save node state information and improve the robustness of the system.
4) The message processing mechanism guarantees that each message will be processed. This thesis uses a message tracking mechanism to ensure that every message is processed, and introduces a cache mechanism so that messages are not processed repeatedly when message processing errors occur, which improves both the operating efficiency and the stability of the system.

Functional and performance tests of the stream computing platform show that the scheduling algorithm improves the system's throughput and reduces latency, the fault tolerance mechanism ensures the reliability and robustness of the system, and the message processing mechanism ensures that messages are not lost. The thesis closes with a summary and an outlook on future work.
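
The abstract does not give the scoring formula, the resource set, or the annealing schedule used for task scheduling, so the following Python sketch is only one plausible reading of the idea, not the thesis's implementation. It assumes that a node's load is a weighted sum of its CPU, memory, and network usage, that a node is eligible only while this load stays under a threshold, and that the "annealing algorithm" is simulated annealing over the weight vector. Every name here (`RESOURCES`, `node_load`, `pick_node`, `anneal_weights`, `max_load`) and the toy cost function are hypothetical; in a real platform the cost would come from measured throughput or latency.

```python
import math
import random

RESOURCES = ("cpu", "memory", "network")  # hypothetical resource set


def node_load(usage, weights):
    """Weighted resource usage of one node; usage values are fractions in [0, 1]."""
    return sum(weights[r] * usage[r] for r in RESOURCES)


def pick_node(nodes, weights, max_load=0.8):
    """Return the least-loaded node whose weighted load is within max_load, else None."""
    loads = {name: node_load(usage, weights) for name, usage in nodes.items()}
    eligible = {name: load for name, load in loads.items() if load <= max_load}
    return min(eligible, key=eligible.get) if eligible else None


def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def anneal_weights(cost, start, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Simulated annealing over the weight vector; a lower cost is better."""
    rng = random.Random(seed)
    current = normalize(start)
    current_cost = cost(current)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        # Perturb one weight, keep it positive, and renormalize.
        cand = dict(current)
        r = rng.choice(RESOURCES)
        cand[r] = max(1e-3, cand[r] + rng.uniform(-0.1, 0.1))
        cand = normalize(cand)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Accept improvements always, worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


# Example: choose a node for the next task under the current weights.
nodes = {
    "node-a": {"cpu": 0.90, "memory": 0.40, "network": 0.20},
    "node-b": {"cpu": 0.30, "memory": 0.50, "network": 0.60},
}
weights = {"cpu": 0.5, "memory": 0.3, "network": 0.2}
chosen = pick_node(nodes, weights)

# Toy cost: squared distance to an arbitrary target weighting. This only
# stands in for a measured objective such as negative throughput.
target = {"cpu": 0.6, "memory": 0.3, "network": 0.1}


def toy_cost(w):
    return sum((w[r] - target[r]) ** 2 for r in RESOURCES)


tuned, tuned_cost = anneal_weights(toy_cost, {r: 1.0 for r in RESOURCES})
```

With the example weights, `pick_node` selects `node-b`, whose weighted load (0.42) is lower than that of `node-a` (0.61). Keeping the weights normalized makes the load comparable to the `max_load` threshold regardless of how the annealing step perturbs them.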
Keywords/Search Tags: Distributed streaming computing platform, Scheduling algorithm, Fault tolerance, Message processing