
Research On Grouping Strategy Based On Distributed Stream Processing System

Posted on: 2020-01-25    Degree: Master    Type: Thesis
Country: China    Candidate: L Y Yang
GTID: 2428330590974450    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of big data technology, the value of data has received increasing attention. In many cases the value of data drops rapidly over time, so real-time stream processing plays a pivotal role in big data technology. As an important tool for processing real-time data streams, distributed stream processing systems are widely used in the Internet of Things, software log processing, and social networking. As these systems have developed, the stream grouping strategy, one of the key factors affecting their performance, has received growing attention. The goal of a grouping strategy is to parallelize processing in a distributed stream processing system at low overhead, thereby reducing the system's average processing latency, increasing its throughput, and improving its overall performance. However, skewed data stream distributions, distributions that change over time, and the heterogeneity of complex systems pose new challenges for current research on grouping strategies.

By analyzing the memory overhead that downstream operators incur to maintain state in stream applications, this thesis first proposes a greedy grouping algorithm based on minimizing memory overhead. The algorithm uses a low-memory statistical method to obtain information about the data stream distribution and classifies keys by frequency, so as to reduce the memory that downstream operators spend on maintaining state. Using the key-frequency statistics and key splitting, it builds a global routing table with a greedy method to keep the system load balanced.

Existing grouping algorithms, on the other hand, give little consideration to how system heterogeneity and the data stream distribution affect stream grouping, which may degrade system performance in complex situations. This thesis therefore proposes a time-aware key grouping algorithm, where "time-aware" means periodically collecting and analyzing data processing time and communication time. The algorithm uses a lightweight statistical method to count the recent data stream distribution, and a heuristic to select the currently optimal instance as the grouping result based on timing information about each instance's recent processing status; a time-aware server periodically receives and sends this timing information.

Experiments show that, compared with existing grouping algorithms, the proposed greedy grouping algorithm based on minimizing memory overhead improves throughput by 9% and average processing latency by 54%, and the time-aware key grouping algorithm improves throughput by 19% and average processing latency by 59%. Both algorithms also show better scalability and stability across different application scenarios.
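The memory-minimizing greedy grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not name its low-memory statistic, so a Misra-Gries heavy-hitter summary is swapped in; `hot_fraction`, the "split a hot key over `ceil(count / average load)` instances" rule, and round-robin routing across a split key's instances are all hypothetical choices. Cold keys stay on a single hashed instance, so their state exists only once.

```python
import hashlib
import heapq
import math
from collections import Counter


def stable_hash(key):
    """Process-independent hash (Python's built-in hash() of str is salted per run)."""
    return int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")


class MisraGries:
    """Heavy-hitter summary using at most k - 1 counters (swapped-in statistic).
    Each estimate undercounts the true frequency by at most n / k."""

    def __init__(self, k):
        self.k = k
        self.counters = {}
        self.n = 0

    def add(self, key):
        self.n += 1
        if key in self.counters:
            self.counters[key] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[key] = 1
        else:
            # No free counter: decrement all counters and drop those reaching zero.
            for other in list(self.counters):
                self.counters[other] -= 1
                if self.counters[other] == 0:
                    del self.counters[other]


def build_routing_table(summary, num_instances, hot_fraction):
    """Greedily assign hot keys to the least-loaded instances (hypothetical rule).

    A hot key is split over only as many instances as its load requires,
    because every extra instance holds another copy of that key's state.
    Cold keys are left out of the table and hashed to a single instance.
    """
    threshold = hot_fraction * summary.n
    hot = sorted(
        ((key, count) for key, count in summary.counters.items() if count >= threshold),
        key=lambda kc: kc[1],
        reverse=True,
    )
    avg_load = summary.n / num_instances
    heap = [(0.0, i) for i in range(num_instances)]  # (estimated hot load, instance)
    table = {}
    for key, count in hot:
        splits = min(num_instances, max(1, math.ceil(count / avg_load)))
        chosen = [heapq.heappop(heap) for _ in range(splits)]
        table[key] = [i for _, i in chosen]
        for load, i in chosen:
            heapq.heappush(heap, (load + count / splits, i))
    return table


class GreedyGrouper:
    def __init__(self, table, num_instances):
        self.table = table
        self.n = num_instances
        self.next_slot = Counter()  # round-robin position per split key

    def route(self, key):
        targets = self.table.get(key)
        if targets is None:
            return stable_hash(key) % self.n  # cold key: one owner, one state copy
        slot = self.next_slot[key] % len(targets)
        self.next_slot[key] += 1
        return targets[slot]
```

Keeping cold keys unsplit trades a little balance for memory: under key grouping each instance that sees a key must keep that key's state, so splitting only the few heavy keys bounds the number of state replicas while still spreading the skewed load.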
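The time-aware key grouping can likewise be sketched under assumptions. The abstract states only that recent key statistics and per-instance processing and communication times drive a heuristic choice of instance; everything concrete below is hypothetical: window halving as the "lightweight" recent statistic, EWMA smoothing of reported times, the cost estimate `(queue + 1) * proc_time + comm_time`, and rendezvous (highest-random-weight) hashing to limit ordinary keys to two candidate instances while recently hot keys may use any instance. The `report` method stands in for updates relayed by the time-aware server.

```python
import hashlib
from dataclasses import dataclass


def _score(key, instance):
    """Rendezvous-hashing weight of (key, instance); deterministic across runs."""
    digest = hashlib.blake2b(f"{instance}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class RecentFrequency:
    """Hypothetical recent-key statistic: all counts are halved at the end of
    every window, so old tuples fade out and rare keys are pruned."""

    def __init__(self, window=1000):
        self.window = window
        self.seen = 0
        self.counts = {}
        self.total = 0.0

    def add(self, key):
        self.counts[key] = self.counts.get(key, 0.0) + 1.0
        self.total += 1.0
        self.seen += 1
        if self.seen % self.window == 0:
            self.counts = {k: c / 2 for k, c in self.counts.items() if c / 2 >= 1.0}
            self.total = sum(self.counts.values())

    def share(self, key):
        return self.counts.get(key, 0.0) / self.total if self.total else 0.0


@dataclass
class InstanceTiming:
    proc_time: float = 1.0  # smoothed per-tuple processing time
    comm_time: float = 0.0  # smoothed upstream-to-instance transfer time
    queue: int = 0          # tuples believed to be waiting at the instance


class TimeAwareGrouper:
    def __init__(self, num_instances, hot_share=0.05, alpha=0.3, window=1000):
        self.n = num_instances
        self.hot_share = hot_share  # recent share above which a key may use every instance
        self.alpha = alpha          # EWMA weight of the newest timing report
        self.freq = RecentFrequency(window)
        self.timing = [InstanceTiming() for _ in range(num_instances)]

    def report(self, instance, proc_time, comm_time, queue):
        """Periodic timing update, e.g. relayed by a time-aware server."""
        t = self.timing[instance]
        t.proc_time = (1 - self.alpha) * t.proc_time + self.alpha * proc_time
        t.comm_time = (1 - self.alpha) * t.comm_time + self.alpha * comm_time
        t.queue = queue

    def candidates(self, key):
        # Hot keys may go anywhere; other keys are limited to two instances
        # so their state is replicated at most twice.
        d = self.n if self.freq.share(key) >= self.hot_share else min(2, self.n)
        return sorted(range(self.n), key=lambda i: _score(key, i), reverse=True)[:d]

    def route(self, key):
        self.freq.add(key)

        def expected_time(i):
            t = self.timing[i]
            return (t.queue + 1) * t.proc_time + t.comm_time

        best = min(self.candidates(key), key=expected_time)
        self.timing[best].queue += 1  # local estimate until the next report
        return best
```

Incrementing the local queue estimate after each routing decision keeps the grouper from sending every tuple to the same momentarily fastest instance between two timing reports; a slow or heavily loaded instance in a heterogeneous cluster is then avoided automatically because its expected time stays high.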
Keywords/Search Tags: stream processing, grouping strategy, load balance, time-aware