Research and Development of a Distributed Stream Real-time Computing Framework

Posted on: 2014-01-01; Degree: Master; Type: Thesis
Country: China; Candidate: W Gu; Full Text: PDF
GTID: 2248330398995269; Subject: Computer application technology
Abstract/Summary:
With the development of data processing technology, applications based on data analysis have attracted wide public attention. Data is also becoming more diverse: besides traditional static, non-real-time structured data, there are now many real-time, dynamically produced unstructured data streams. Such unstructured data arrives serially, and its velocity, volume and direction change constantly and are difficult to predict. Faced with huge, highly variable data streams, it is hard to capture the information they carry and to perform complex computation on them in real time with a traditional distributed processing pattern. This has prompted further research into new ways of managing distributed stream computing.

At present, research on distributed stream computing systems, both in China and abroad, is at an early stage and has produced few mature results. Building on his accumulated research in distributed processing, and after a thorough analysis of the requirements of stream data processing applications, the author designs and implements iStream, a complete stream computing management framework, and also studies and optimizes the load balancing algorithm in depth. Simulation and performance tests show that the framework can be customized flexibly for the actual application scenario and that its performance meets expectations.

The innovations and achievements of this thesis are as follows:

(1) Several key techniques in distributed computing frameworks are studied. Because data streams and application scenarios take many forms, a system built for a single scenario lacks generality and extensibility. The distributed system designed in this thesis is not limited to specific application scenarios or particular problems. It has strong versatility and extensibility, and it markedly improves the development efficiency of third-party developers.

(2) To improve throughput and data processing performance, and to raise the flexibility and availability of the cluster, dynamic scheduling technology and the load balancing algorithm are studied in further depth. The thesis proposes using a time series prediction algorithm to address the NP-complete problem of task scheduling in parallel computation, and handles non-stationary data sequences by improving the estimation algorithm used in AR (autoregressive) modeling. This makes prediction more accurate and efficient for stream data whose sources cannot be represented by a simple piecewise model, while preserving the performance of the dynamic load balancing algorithm.

(3) Innovation in framework design and implementation. After studying mainstream programming models used in parallel computing, such as MapReduce, an improved SPS model is adopted in the iStream framework. By analyzing and comparing the mainstream distributed inter-process communication modes, the thesis addresses key problems in distributed systems: highly concurrent real-time processing, the security of data communication in the distributed system, and adaptive adjustment. In line with the characteristics of stream computing, traditional distributed computing strategies are improved in each module of the framework, which enhances security and significantly reduces latency.

(4) After an in-depth analysis of the scenarios suited to the distributed real-time computing framework, the CTR Effect Advertising System and the OPO System are used as case studies to test performance in practical applications. Finally, the thesis summarizes the work and discusses future prospects.
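The idea in point (2), predicting load with an autoregressive model and scheduling on the prediction, can be illustrated with a minimal sketch. This is not the thesis's improved estimation algorithm, which the abstract does not specify. It is a textbook Yule-Walker AR(p) fit solved by the Levinson-Durbin recursion. The function names (`fit_ar`, `predict_next`, `pick_worker`), the default model order, and the rule "send the next task to the worker with the lowest predicted load" are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def autocovariance(x: Sequence[float], lag: int) -> float:
    """Biased sample autocovariance of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n)) / n


def fit_ar(x: Sequence[float], order: int) -> List[float]:
    """Fit AR(order) coefficients with the Yule-Walker equations,
    solved by the Levinson-Durbin recursion (textbook method, assumed here)."""
    r = [autocovariance(x, k) for k in range(order + 1)]
    coeffs: List[float] = []
    err = r[0]  # prediction error variance of the current model
    for m in range(order):
        if err <= 1e-12:
            # Series is (nearly) perfectly predicted already; pad with zeros.
            coeffs += [0.0] * (order - m)
            break
        acc = r[m + 1] - sum(coeffs[k] * r[m - k] for k in range(m))
        reflection = acc / err
        coeffs = [coeffs[k] - reflection * coeffs[m - 1 - k] for k in range(m)]
        coeffs.append(reflection)
        err *= 1.0 - reflection ** 2
    return coeffs


def predict_next(x: Sequence[float], order: int = 2) -> float:
    """One-step-ahead forecast of the series x (requires len(x) > order)."""
    mean = sum(x) / len(x)
    coeffs = fit_ar(x, order)
    # Most recent value first, matching coefficient order a_1, a_2, ...
    recent = [x[-1 - k] - mean for k in range(order)]
    return mean + sum(a * v for a, v in zip(coeffs, recent))


def pick_worker(load_history: Dict[str, Sequence[float]], order: int = 2) -> str:
    """Hypothetical scheduling rule: choose the worker whose predicted
    next-step load is lowest."""
    return min(load_history, key=lambda w: predict_next(load_history[w], order))


if __name__ == "__main__":
    history = {
        "worker-a": [0.60, 0.65, 0.70, 0.72, 0.78, 0.80, 0.85, 0.90],
        "worker-b": [0.50, 0.45, 0.48, 0.40, 0.42, 0.38, 0.35, 0.30],
    }
    for name, loads in history.items():
        print(name, round(predict_next(loads), 3))
    print("next task goes to:", pick_worker(history))
```

A plain AR fit like this assumes a roughly stationary series. Load traces with strong trends would need extra handling, such as differencing or a sliding window, which is the kind of non-stationarity the thesis says its improved estimation targets.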
Keywords/Search Tags: Distributed stream computing, Task scheduling, Dynamic load-balance, Time series prediction algorithm