Font Size: a A A

Multi-Bloom-Filter Query Algorithms And Their Applications

Posted on:2014-10-13Degree:DoctorType:Dissertation
Country:ChinaCandidate:X M TianFull Text:PDF
GTID:1268330425483973Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
A Bloom filter is an excellent data structure that can succinctly represent a data set in order to support membership queries, and effectively filter out the elements that do not belong to the set. It is widely used in databases, networks and distributed systems and it has great potential for distributed applications where systems need to share information about available data. Many research communities have been taking deep researches into Bloom filter algorithms and their applications, and there will be more Bloom filter algorithms and their applications to be found.Usually we use Bloom filter in such a general scene:set5represented in Bloom filter BF(S), when we query whether an element x belong to S, in order to save memory space and to improve query time efficiency, we query BF(S) instead of S1to check whether x belong to S or not. At present most of the variants of Bloom filter algorithm and their applied researches have been conducted on single Bloom filter. The goal of this thesis was to study how to use multi-Bloom-filter query algorithm in P2P systems for content distribution or data synchronization applications.We conducted in-depth study for multi-Bloom-filter query algorithm in theoretical analysis and practical application. At first we analyzed the opportunities brought about to multi-Bloom-filter query algorithms by the rapid development of the network, and found out that the multi-Bloom-filter query algorithms were worth studying. Secondly we summarized the research status of the multi-Bloom-filter query algorithm and main findings of the multi-Bloom-filter query algorithm. Because single-Bloom-filter query algorithms can not be be fully qualified for solving distributed content distribution and data synchronization problems, we presented a few query algorithms using multiple Bloom filters, such as double-separate-Bloom-filter query algorithm, counting-Bloom-filter algebra operation query algorithm, counting-Bloom-filter based set reconciliation algorithm which use counting-Bloom-filter subtraction operation and Bloom-filter based exact set reconciliation algorithm which use multiple Bloom filters to reconcile sets. Some creative works in this thesis are focused on the following aspects:1) This thesis examined the problem of membership queries for sets’union, intersection, complementary set, sets’difference or symmetric differences between sets by using double separate Bloom filters. The theoretical analysis and experimental results show that the double-separate-Bloom-filter query algorithm can support membership queries for sets’union, intersection, complementary set, sets’difference or symmetric differences between sets, of which membership queries for sets’union or intersection have no false negatives, only a few false positives, while membership queries for complementary set, sets’difference or symmetric differences between sets have a small number of false negatives in addition to small amounts of false positives.2) Because membership queries for complementary set, sets’difference or symmetric differences between sets using double separate Bloom filters have a small number of false negatives, we attempted to seek solutions to this problem using algebra operations on counting Bloom filters. This thesis examined the consistence between algebra operations on counting Bloom filters and algebra operations on data sets, and studied the membership query performances of algebra operations on counting Bloom filters. Theoretical analyses and simulations show that the counting Bloom filter which is ORed(ANDed, COMPLEMENTed, SUBTRACTed, XORed) from the original counting Bloom filters can support membership query on the set ORed(ANDed, COMPLEMENTed, SUBTRACTed, XORed) from the original data sets. When using double-separate-counting-Bloom-filter query algorithm to query elements belonged to complementary set, sets’difference or symmetric differences of the two sets, some elements in complementary set, differences or symmetric differences of the sets will be misjudged, while the query method using algebra operations on counting Bloom filters has no false negatives and gain a remarkable improvement in space complexity and time complexity over the method using double-separate-counting-Bloom-filter query method, hence it can be applied to various network fields. For example, counting-Bloom-filter SUB operation can be used in set reconciliation protocol for distributed database or file systems.3) Set reconciliation aims at efficiently reconciling two similar sets held by different hosts while minimizing the communication complexity. Set reconciliation is a fundamental problem in distributed computing, arising in protocols for content delivery, gossiping, synchronization and replication. On the basis of analyzing the existing accurate set reconciliation algorithm using characteristic polynomial interpolation-based synchronization (CPISync), we presented an exact set reconciliation algorithm based on multiple Bloom filters (BFESR). There is a premise when we use CPISync to reconcile data:the number of evaluated points shoud be greater than the size of the symmetric differences between remote sets (|SA-SB|+|SB-SA|). In distributed systems, the size of the symmetric differences between remote sets is often unable to be known in advance, therefore CPISync algorithm can not be simply invoked to reconcile data. Existing solutions to this problem are usually heuristics that iteratively increase the number of evaluated points and characteristic polynomial values until the total number of evaluated points exceeds the size of the symmetric differences, then successful interpolation is achieved. These heuristics have too long reconciliation latency. BFESR has less reconciliation latency. The basic idea of BFESR is to obtain symmetric difference size between remote sets before calling CPISync algorithm, thus characteristic polynomial values for CPISync algorithm are wholesale transferred. At first, BFESR estimates symmetric difference size between remote sets SA and SB using Bloom filters, and calculates the same number of characteristic polynomial values as the estimated size of the symmetric differences between SA and SB, and then interpolates rational polynomials, finally recovers the sets SA-SB and SB-SA through factoring the rational polynomials and gets the union of SA and SB-Theoretical analyses and experimental results show that, compared with the existing methods for exact set reconciliation, BFESR needs only a single round-trip to get the accurate union of the sets in most cases, thus greatly reducing reconciliation latency. BFESR uses Bloom filters’inner-product or quasi-intersection query method to estimate the size of the symmetric difference between sets, both methods have higher estimating accuracy, and the quasi-intersection query method estimates closer to the actual size of the symmetric differences between sets. Compared with existing heuristics for exact set reconciliation, BFESR has very significantly less reconciliation time and fewer message exchange rounds, particularly when BFESR uses quasi-intersection query method to estimate the size of the symmetric differences between sets.4) Because Bloom filters used in BFESR does not support dynamic update of set elements, BFESR is not suitable for data reconciliation in update-intensive distributed systems. To solve this application limitation of BFESR algorithm, we present a new set reconciliation algorithm based on multiple counting Bloom filters, which we call Counting-Bloom-filter based exact set reconciliation(CBFESR). This method represents sets SA and SB as counting Bloom filters, subtracts SA’s counting Bloom filter from SB’S counting Bloom filter, the differences we denote CBF(SB)-CBF(SA), then determines SB-SA elements in SB with CBF(SB)-CBF(SA), and finally performs union operation on SB-SA and SA. Counting-Bloom-filter based set reconciliation combines the advantages of exact set reconciliation and approximate set reconciliation, besides gaining all SB-SA elements with single-round messgage exchange, it can also be applied to update-intensive distributed systems because counting Bloom filters support element deletion operation.
Keywords/Search Tags:peer to peer, Bloom filter, counting Bloom filter, algebra operations, setreconciliation, exact set reconciliation
PDF Full Text Request
Related items