Bloom Filter Algorithm For Low-overhead Approximate Query For Missing Data

Posted on:2022-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:J W Wu

Full Text:PDF

GTID:2518306731977959

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

The computer network has developed rapidly in recent years,information sharing happens all the time,this makes how to quickly locate the required resources in the complex information world has become the focus and challenge of current research.The Bloom filter proposed by Bloom is a data structure that efficiently solves existing problems.It can quickly match elements with a small storage cost.However,with the development of the network,the precise query supported by the traditional Bloom filter can't fully meet the needs of emerging network applications.These network applications require Approximate Membership Query(AMQ)for data.In recent years,scholars have also proposed many Bloom filter algorithms for approximate member queries,these algorithms do not support approximate queries for missing data.But most of the data shows the characteristics of high dimensionality,dimensional redundancy,and missing values,which limits the scope of use of the traditional approximate query Bloom filter.At the same time,when the data dimension is high,the time required for the hash operation will be longer,which means that the time cost of inserting and querying high-dimensional data is relatively high.Therefore,this paper proposes a low-overhead Bloom filter for approximate membership query for missing data.The main work of this paper are as follows:A Bloom filter structure for approximate query of missing data based on PCA dimensionality reduction is proposed.At the same time,data insertion and query operations under this structure are designed.This article uses missing and dimensional redundant data to train the PCA projection matrix,the matrix is highly adapted to the dimensionality reduction of the data that is distributed with the training data and has missing values,then we use the projection matrix to reduce the dimensionality of the subsequent query data,and then use the reduced data to perform insert and query operations.The above steps make it possible to query approximate members of highdimensional redundant and missing data,and at the same time reduce the time cost by reducing the dimensionality of the data.This article uses real data sets to fully verify that the false positive rate of the algorithm is lower than other algorithms.A Bloom filter algorithm based on autoencoder dimensionality reduction for approximate membership query for missing data is proposed.Since the standard PCA dimensionality reduction is a linear dimensionality reduction method,the effect will be reduced when the data is linearly inseparable.Therefore,a non-linear dimensionality reduction method based on the Autoencoder is proposed to replace the previous dimensionality reduction process.This method is highly adapted to nonlinear data dimensionality reduction.This paper confirms the effectiveness of the Bloom filter based on two dimensionality reduction methods for approximate membership query through theoretical analysis.Comparative experiments on real data sets prove that in the case of linear inseparability of data sets,the approximate query effect based on dimensionality reduction of the autoencoder is better.

Keywords/Search Tags:

Bloom Filter, Approximate membership query, Locality-Sensitive Hashing, Dimensionality reduction

PDF Full Text Request

Related items

1	Hashing Based Research On Approximate Query Of Time Series
2	Research Of Approximate Nearest Neighbor Search Based On Locality Sensitive Hashing
3	Research On Local Sensitive Hashing And Approximate Nearest Neighbor Algorithm
4	Reasearch On Locality Sensitive Hashing Based Approximate Nearest Neighbor(s) Searching Algorithm
5	Research Of Approximate K-Nearest Neighbors Search Algorithm Based On Locality Sensitive Hashing
6	Efficient Algorithms For Approximate Aggregation And Nearest Neighbor Queries Over Multi-Dimensional Data
7	Enabling Secure And Efficient Multi-keyword Fuzzy Retrieval Over Encrypted Data
8	Locality Sensitive Hashing Index Based On Neighborhood Collision Counting
9	LSH-based Similarity Search Of Web Data
10	Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach