
Result Completeness Guarantee Strategy Studies In Distributed Stream Join Systems

Posted on: 2022-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wang
Full Text: PDF
GTID: 2518306572491164
Subject: Computer system architecture
Abstract/Summary:
With the emergence of big data applications, stream join systems are widely used to extract valuable information from multi-source streams. An efficient stream join system needs to satisfy three requirements: scalability, high performance (high throughput and low latency), and completeness. Existing stream join systems focus on scalability and high performance. They partition all join units into two sets and organize them as a complete bipartite graph, where each set stores the tuples of one stream (R or S). When a tuple arrives, the system stores it in one join unit of its own set and sends it to all join units in the other set to perform the join. The bipartite graph model offers good scalability and high performance, but it is hard to guarantee a consistent tuple order across join units in a large-scale distributed cluster. Each join unit processes tuples one by one in the order they arrive, while different join units may receive the same tuples in different orders, which causes missing results and duplicate results. Result completeness is vital in many stream join applications, and such abnormal results are unacceptable: in targeted advertising, for example, duplicate results increase the cost to advertisers and missing results reduce the revenue of companies. To address these problems, we propose Eunomia, a novel distributed stream join system that leverages an ordered propagation model to eliminate abnormal results efficiently. The ordered propagation model organizes all join units as trees and adopts a relay strategy to distribute tuples. We also design a lightweight self-adaptive strategy that adjusts the structure of the join model according to dynamic stream input rates and workloads, and an efficient synchronization mechanism that ensures the order of stream tuples among the root nodes. We implement Eunomia and conduct comprehensive experiments to evaluate its performance on a large-scale real-world dataset. Experimental results show that Eunomia improves system throughput by 25% and reduces processing latency by 74% compared to state-of-the-art designs. Meanwhile, Eunomia eliminates abnormal results and provides better join completeness.
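
The ordering problem of the bipartite graph model described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names (`run_unit`, `total_results`), the tuple layout, and the reduction to one join unit per side are assumptions made purely for illustration. One matching pair of tuples, r from stream R and s from stream S, should produce exactly one join result; the sketch shows what happens when the two join units see them in different orders.

```python
def run_unit(own_stream, arrivals):
    """Process arrivals one by one, in arrival order, at a single join unit.

    Tuples of the unit's own stream are stored; tuples of the other
    stream probe the store and emit one result per matching key.
    (Hypothetical model of a join unit, for illustration only.)
    """
    stored, results = [], []
    for stream, key in arrivals:
        if stream == own_stream:
            stored.append(key)
        else:
            results += [key for k in stored if k == key]
    return results


def total_results(order_at_r_unit, order_at_s_unit):
    """Count the results produced by one R-side unit and one S-side unit."""
    return (len(run_unit("R", order_at_r_unit))
            + len(run_unit("S", order_at_s_unit)))


r, s = ("R", 7), ("S", 7)  # one matching pair with join key 7

# Both units see r before s: exactly one result (produced at the R unit).
consistent = total_results([r, s], [r, s])

# R unit sees r first, S unit sees s first: both units emit the pair.
duplicate = total_results([r, s], [s, r])

# R unit sees s first, S unit sees r first: neither unit emits the pair.
missing = total_results([s, r], [r, s])

print(consistent, duplicate, missing)
```

Under these assumptions, a consistent order yields one result, while the two inconsistent orders yield two results and zero results respectively. These are the duplicate and missing results that an ordered propagation model is meant to rule out.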
Keywords/Search Tags:Big data, Distributed stream processing, Stream join, Join completeness