Font Size: a A A

Research On Dimensionality Reduction And Quantification Methods Of Approximate Nearest Neighbor Query For Streaming Data

Posted on:2019-03-02Degree:MasterType:Thesis
Country:ChinaCandidate:L ZhaoFull Text:PDF
GTID:2438330551960868Subject:Intelligent computing and systems
Abstract/Summary:PDF Full Text Request
Nearest Neighbor Search is the basic problem of real-time analysis of all kinds of big data,which is to measure the difference or similarity between different objects,and to find the similar or semantic related objects in the data set.Mathematically,Objects,including text,web pages and images,can be represented as a collection of vectors or vectors.The ideal large data similarity query should have low latency,high throughput and stable operation.Real-time nearest neighbor search in streaming data scenarios is of great research value,which has a wide range of applications in real-time analysis such as smart grid,smart city and public service.In this paper,by researching the sampling algorithm of streaming data,using the sliding window hierarchical sampling algorithm to concentrate the massive data,and sampling the concentrated data implement dimensions reduction.Simultaneously studying the quantization and distance measurement of an effective approximate nearest-neighbor search,and maximizing the original data neighbor structure achieve the nearest neighbor search of streaming data,and improve the query accuracy.The main research work in this paper are as follows:Firstly,proposing a sliding window hierarchical sampling method based on time-weighted sliding window model.By dividing the sliding window into several basic windows and using the preset attenuation function to set the corresponding weight of each basic window,and then according to the weight value and the number of data elements in the basic window to set the sampling ratio,according to the sampling ratio sampling the stream data to concentrate the data.At the same time,the data dimension is sampled to reduce dimensions by calculating the importance of different data dimensions.Secondly,because of the current hash-based approximate nearest-neighbor search methods usually learn a set of hyperplanes for data projection,and simply binarize the results from each hyperplane division,while ignoring the fact that the information may be unevenly distributed throughout the projection dimension,and the values of the data in each dimension projection may not be the same.In this paper,we proposed a dynamic adaptive coding quantization method,which dynamically allocates the corresponding binary coded bits to the dimension according to the information amount of the projection dimension.By using the dynamic programming,the total information of all the projections is maximized so as to keep the neighborhood structure of the original data as much as possible.Thirdly,based on the dynamic adaptive coding quantization algorithm,we proposed a dynamic adaptive distance measurement algorithm,which calculates the distance between binary codes according to the number of bits coded for each projection dimension.So as to solve the problem that the existing distance measurement method reduces the entire binary string as a entirety to calculate distance,which only applies to the unit quantification.Aiming at the above algorithms proposed in this paper,the experiments are carried out in the end.The experimental results show that sampling the streaming data through the sliding window hierarchical sampling algorithm effectively preserves the summary information of the streaming data,and prove that the dynamic adaptive coding quantization method has a significant improvement over the traditional hash quantization method.The theory proves that the dynamic adaptive coding method and the distance measurement method is superior to the traditional fixed-digit quantization coding and the Hamming distance measurement method on maintaining the neighborhood structure of original data.
Keywords/Search Tags:quantization, stream data, dimension reduction, approximate nearest neighbor, dynamic adaptive coding, dynamic programming, dynamic adaptive distance
PDF Full Text Request
Related items