
Research On Performance-effective Load Balancing Technology In Distributed Stream Processing Systems

Posted on: 2021-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Chen
Full Text: PDF
GTID: 2518306107478784
Subject: Computer Science and Technology
Abstract/Summary:
With the development of big data technology, efficient and scalable data stream processing has attracted increasing attention. Distributed Stream Processing Systems (DSPSs), a typical data stream processing architecture, have developed rapidly in recent years. Key-based operations are common in data stream processing. When a data stream is grouped by key and processed in parallel, the load across parallel processing nodes becomes unbalanced because the data distribution is skewed and the data arrive dynamically, unpredictably, and continuously; this degrades performance and reduces system throughput. Starting from an analysis of how data skew affects the performance of distributed data stream processing systems, this thesis proposes a set of load balancing adjustment schemes for key-based operations in DSPSs. The schemes take into account the heterogeneity introduced by cluster expansion and the performance of the parallel nodes. The main contributions are summarized as follows:

(1) For the balancing adjustment of key-based operations in a distributed environment, this thesis establishes a performance-aware load balancing framework. The framework introduces performance-aware and hybrid routing techniques and designs a data partitioning strategy that matches the performance of the nodes. Its application in this thesis shows that the framework is better suited to distributed clusters with heterogeneous characteristics.

(2) For scenarios with low data skew, this thesis proposes a load balancing adjustment method that works at key granularity. When data skew is low, the load of a single key does not exceed the capacity of a single parallel processing node, so adjusting at key granularity can, in theory, bring the load across parallel nodes to a balanced state. Following the principle of minimum extra cost, the thesis optimizes the balancing adjustment process by analyzing the extra cost it incurs, and proposes the PSLC algorithm. Experiments on real data show that, compared with the KG, PKG, Readj, and Mixed algorithms, PSLC achieves higher system throughput and lower processing delay.

(3) For scenarios with high data skew, this thesis proposes PSCG, a load balancing adjustment method based on a hybrid splitting strategy. The method addresses the inability of the PSLC algorithm to handle highly skewed workloads by splitting high-frequency keys, and it also fully accounts for the extra cost of the balancing adjustment process. Experiments show that, compared with KG, PKG, Readj, and QMMP, this method achieves higher throughput and lower processing delay.
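The imbalance caused by key grouping, and the idea of rebalancing at key granularity on heterogeneous nodes, can be illustrated with a small sketch. This is not the PSLC or PSCG algorithm from the thesis, whose details are not given in the abstract; it is a plain greedy heuristic under stated assumptions. The function names (`key_grouping`, `capacity_aware_assignment`, `imbalance`), the use of CRC32 as the hash, and the per-node `capacities` weights are all hypothetical choices made for illustration.

```python
import zlib


def key_grouping(keys, n_workers):
    """Hash-based key grouping (KG): all tuples of a key go to one worker.

    CRC32 is an assumed, deterministic stand-in for the system's hash function.
    """
    load = [0] * n_workers
    for k in keys:
        load[zlib.crc32(k.encode()) % n_workers] += 1
    return load


def capacity_aware_assignment(key_counts, capacities):
    """Greedy key-granularity assignment (illustrative, not PSLC).

    Keys are placed heaviest first; each key goes to the worker whose
    load relative to its capacity would be lowest after the placement.
    """
    load = [0] * len(capacities)
    assignment = {}
    # Sort by descending count, then by key name for a deterministic order.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(
            range(len(capacities)),
            key=lambda w: (load[w] + count) / capacities[w],
        )
        assignment[key] = target
        load[target] += count
    return assignment, load


def imbalance(load, capacities):
    """Max relative load divided by mean relative load (1.0 = balanced)."""
    rel = [l / c for l, c in zip(load, capacities)]
    return max(rel) / (sum(rel) / len(rel))


if __name__ == "__main__":
    # Synthetic skewed workload: key i appears roughly 1000 / (i + 1) times.
    counts = {f"k{i}": 1000 // (i + 1) for i in range(20)}
    stream = [k for k, c in counts.items() for _ in range(c)]
    capacities = [2.0, 1.0, 1.0, 1.0]  # assumed heterogeneous node weights

    kg_load = key_grouping(stream, len(capacities))
    _, greedy_load = capacity_aware_assignment(counts, capacities)
    print("KG imbalance:    ", round(imbalance(kg_load, capacities), 3))
    print("Greedy imbalance:", round(imbalance(greedy_load, capacities), 3))
```

Because the sketch moves whole keys only, it cannot balance a workload in which a single key exceeds one node's capacity; that is the high-skew case for which the abstract describes key splitting.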
Keywords/Search Tags: Performance-sensitive, Streaming Data Processing, Load Balancing, Key-based Operations