Font Size: a A A

Content Synchronization In Distributed Systems

Posted on:2020-10-10Degree:DoctorType:Dissertation
Country:ChinaCandidate:L L LuoFull Text:PDF
GTID:1488306548991419Subject:Management Science and Engineering
Abstract/Summary:PDF Full Text Request
The exposing increase of information gives birth to distributed systems with grounded on computer networks.Despite of the wide implementation of distributed systems,there are still several challenging issues,which may cause dramatic performance degradation of these deployments.Content synchronization dominates these challenges.In distributed scenarios,it is an intrinsic need for hosts to synchronize their data in a both fast and accurate fashion.For instance,for nodes in clusters,peer-to-peer networks,grids,and data centers,content synchronization techniques are needed to guarantee data consistency and instruction correctness.This thesis digs deeply into this content synchronization problem.By presenting diverse synchronization techniques,this thesis is meaningful for distributed system implementation and application development.We provide the first attempt of employing Cuckoo filter(CF)as the sketch data structure to synchronize two given sets of elements.CF represents set elements by storing their fingerprints directly.We suppose Host A and Host B employ same hash functions.Thus,Host A can search the common elements from the received CF vector from Host B,and vise versa.Those elements,which cannot be found in the received CF vector,are the different elements,which only appear in the local set.With given conditions,this method will not introduce false positive errors and seldom incurs false negative errors.Consequently,we provide high synchronization accuracy.Besides,CF has it natural advantage upon space utilization.We then extend the content synchronization into the multiset domain.To reconcile two multisets,we first propose a new variant of Bloom filter,named Inveritible Counting Bloom filter(ICBF).This new data structure enables the synchronization of multisets.When encoding,ICBF relies on a series of independent hash functions to map each element into the ICBF vector.Besides,we invent a special identifier to label each element.After representing the two multisets as two ICBF vectors,the subtracting operation is executed to wipe out common elements from the two ICBFs.The different elements,on the other hand,will be remained into the subtracting result,i.e.,a new ICBF vector.At last,ICBF lists these different elements from the subtracting result inversely.As a result,all the different elements can be determined correctly with high probability.We further consider the requirement of synchronization accuracy of upper applications.We redesign the Trie and Fenwick Tree(FT)data structures to represent and thereafter reconcile multiset elements near-accurately.Moreover,to decrease the communication overhead during synchronization,we present the partial transmission strategy when exchanging Tries and FTs.We conduct trace-driven evaluations towards our proposals.The results indicate that the Trie-based and the FT-based synchronization methods realize near accurate set synchronization,and they are 4.31 times and 2.96 times faster than the CBF-based method,respectively.The synthetic dataset based experiments further confirm that our methods,at most times,achieve higher synchronization accuracy and incur much less communication overhead than the CBF-based method.We then extend the content synchronization problem from two sets to multiple sets.To tackle this multi-party set synchronization problem,we first propose the Marked Cuckoo filter(MCF)data structure to represent the involved sets of elements.Based on this data structure,we implement the MCFsyn synchronization protocol.MCFsyn aggregates and then distributes the MCFs from all the participants along with the underlying minimum spanning tree.Each participant then traverses the overall MCF vector to determine its missing elements,as well as its exclusive elements.For missing elements,MCFsyn chooses the optimal element content provider such that the transmission overhead is minimized.The evaluation results indicate that MCFsyn significantly outperforms its same kind in terms of both synchronization accuracy and communication overhead.Lastly,beyond the general element content,we consider the synchronization of two given topologies.To this end,we design the graph filter,a novel space-efficient data structure which is competent to represent the nodes and edges in a given topology simultaneously.With such a design,given two topologies,we represent them with our graph filters and thereafter list the different edges between them in an inverse manner.To do so,three oprations are attached to graph filters: encoding,subtracting and decoding.Although such operations are sufficient to solve the topology calibration problem,two challenging issues still remain open.First,the XOR traps which occur with low probability at the encoding stage may result in a few miscalculations at the decoding process.Thus,we propose another augmented decoding algorithm to lessen the impact of XOR traps via terminating illegal decoding processes.Second,several different links may form cycles in the worst case;hence,we further design a cycle destruction algorithm to make such different links become decodable.We implement the graph filter and the associated topology calibration method.Comprehensive evaluations indicate that our method can efficiently realize the topology calibration with high probability,incurs the least space overhead and supports invertible decoding reasonably.
Keywords/Search Tags:Distributed Systems, Synchronization Accuracy, Invertible Counting Bloom Filter, Differentiation Synchronization, Communication Overhead, Sketch Data Structures, Bloom Filter, Cuckoo Filter, Trie Tree, Fenwick Tree, Graph Filter
PDF Full Text Request
Related items