
Research And Implementation Of Optimal Scheduling Algorithm In Data Stream Processing System

Posted on: 2018-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: T T Fu
GTID: 2348330512483003
Subject: Computer application technology

Abstract/Summary:
In recent years, more and more applications, such as online advertising and real-time statistical analysis, rely on real-time data processing. Because of its timeliness, data stream processing has become a new research hotspot. Stream processing technology provides real-time, reliable, easy-to-use, and scalable computing services that allow data to be processed before its deadline. Because of these real-time requirements, data stream processing systems face new challenges in load forecasting and topology scheduling.

Load forecasting and task scheduling have both been studied extensively. However, most existing work does not target the characteristics of data streams and stream processing systems. Load forecasting based on linear regression does not reflect the load conditions in a data stream processing system well. For task scheduling, algorithms that assess only a single resource cannot meet the real-time and task-completion requirements of scheduling in a stream processing system. Building on existing research results, this thesis proposes a load forecasting algorithm based on SOM clustering and a scheduling optimization algorithm for stream processing, both designed around the characteristics of data stream processing systems.

First, the load forecasting algorithm based on SOM clustering predicts load from the load information of known categories. It uses an SOM (self-organizing map) artificial neural network to model the influence of the data source and the computational topology on stream processing load, and clusters the loads of similar computing models. A weight vector initialization strategy, an SOM hit prediction mechanism, and an SOM state machine are proposed to improve the real-time performance of the algorithm. The experimental results show that the proposed algorithm achieves better results in stream processing load forecasting.

Second, the scheduling optimization algorithm analyzes the particular characteristics of the topology scheduling problem in stream processing and divides it into task selection and node selection. In the initial stage, CPU, memory, and communication cost are taken into account to select the optimal node. In the dynamic adjustment stage, the schedule is adjusted according to the communication observed during actual operation. The experimental results show that the proposed scheduling algorithm is effective at balancing load and reducing communication delay.

Finally, this thesis designs a Storm-based stream processing platform and describes its key modules in detail, including the scheduling subsystem and the subscription and publishing subsystem. The proposed load forecasting algorithm based on SOM clustering and the stream processing scheduling optimization algorithm are embedded in the system modules. The experimental results show that the proposed algorithms are valid and feasible in the stream processing system.
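
The idea of forecasting load from SOM clusters can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the feature vectors, the weight vector initialization strategy, the hit prediction mechanism, or the state machine, so everything below is an assumption made for illustration. Here each sample is a hypothetical feature vector (for example, normalized input rate and topology size), the map is a small one-dimensional SOM initialized linearly between the feature minima and maxima, and the predicted load of a new sample is the mean observed load of the unit it maps to.

```python
import math


def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (the BMU)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(samples, n_units=4, epochs=50, lr0=0.5):
    """Train a tiny 1-D SOM (n_units >= 2); units sit at grid positions 0..n_units-1."""
    # Assumption: linear initialisation between per-feature min and max.
    lo = [min(col) for col in zip(*samples)]
    hi = [max(col) for col in zip(*samples)]
    weights = [[l + (h - l) * i / (n_units - 1) for l, h in zip(lo, hi)]
               for i in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                            # decaying learning rate
        radius = max(1.0, (n_units / 2) * (1.0 - frac))    # shrinking neighbourhood
        # Samples are presented in the given order to keep the sketch deterministic.
        for x in samples:
            bmu = best_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood around the BMU on the 1-D grid.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


def fit_cluster_loads(weights, samples, loads):
    """Average the observed load of the training samples mapped to each unit."""
    sums, counts = [0.0] * len(weights), [0] * len(weights)
    for x, load in zip(samples, loads):
        i = best_unit(weights, x)
        sums[i] += load
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def predict_load(weights, cluster_loads, x):
    """Predict the load of x as the mean load of its nearest non-empty unit."""
    order = sorted(range(len(weights)), key=lambda i: math.dist(weights[i], x))
    for i in order:
        if cluster_loads[i] is not None:
            return cluster_loads[i]
    raise ValueError("no training data mapped to any unit")
```

A caller would train once with `train_som`, label the units with `fit_cluster_loads`, and then call `predict_load` for each new feature vector. Because prediction is a nearest-unit lookup, it stays cheap at run time, which fits the real-time goal stated in the abstract.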
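
The node selection step of the initial scheduling stage can be sketched in the same hedged way. The abstract only states that CPU, memory, and communication cost are considered together; the cost formula, the weights, and all names below (`Node`, `node_cost`, `select_node`) are hypothetical and chosen for illustration. The sketch scores each feasible node by the share of its free CPU and memory the task would consume plus the fraction of the task's traffic that would cross the network, and picks the node with the lowest cost.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free: float  # free CPU capacity (hypothetical unit, e.g. cores)
    mem_free: float  # free memory (hypothetical unit, e.g. MB)


def node_cost(node, task_cpu, task_mem, peer_nodes, traffic, weights=(1.0, 1.0, 1.0)):
    """Cost of placing a task on node (lower is better); None if it does not fit.

    peer_nodes maps already-placed peer tasks to the name of their node.
    traffic maps peer tasks to the expected message rate exchanged with them.
    """
    if node.cpu_free < task_cpu or node.mem_free < task_mem:
        return None
    cpu_share = task_cpu / node.cpu_free
    mem_share = task_mem / node.mem_free
    # Only traffic to peers placed on a different node crosses the network.
    placed = {p: r for p, r in traffic.items() if p in peer_nodes}
    total = sum(placed.values())
    remote = sum(r for p, r in placed.items() if peer_nodes[p] != node.name)
    comm_share = remote / total if total else 0.0
    w_cpu, w_mem, w_comm = weights
    return w_cpu * cpu_share + w_mem * mem_share + w_comm * comm_share


def select_node(nodes, task_cpu, task_mem, peer_nodes, traffic):
    """Return the feasible node with the lowest combined cost, or None."""
    scored = []
    for node in nodes:
        cost = node_cost(node, task_cpu, task_mem, peer_nodes, traffic)
        if cost is not None:
            scored.append((cost, node.name, node))
    return min(scored)[2] if scored else None
```

With equal weights, a task that exchanges most of its traffic with a peer already running on one node is drawn to that node unless the node is short on CPU or memory. The dynamic adjustment stage described in the abstract could reuse the same cost with measured traffic rates in place of expected ones, although the abstract does not say how the thesis does this.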
Keywords/Search Tags: data stream, load forecasting, SOM algorithm, topology scheduling