Font Size: a A A

Studies On Probabilistic Nearest Neighbor Queries Over Uncertain Data

Posted on:2015-10-13Degree:DoctorType:Dissertation
Country:ChinaCandidate:J J LiFull Text:PDF
GTID:1318330482955832Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
With the rapid development of wireless communications and geo-positioning, a large number of mobile devices with positioning capabilities become very popular, such as the smartphone, pad, GPS navigation system, animal telemetry system and so on. The locations generated by these users can provide a mass of spatial information for the LBS (Location Based Services) applications, which brings the rapid rise of LBS. Nearest Neighbor (NN) related queries are very important query types, and there are many studies on the processing of queries over precise data. However, data are often uncertain in numerous applications due to limitations of measuring equipment, delayed data updates, or privacy protection. In the uncertain environment, NN related queries still have many applications and the traditional query algorithms over precise data can't be directly applied to the uncertain data, therefore, the studies on the processing of probabilistic NN related queries over uncertain data have extensive application prospect and theory value.The probabilistic threshold NN query returns all the data objects satisfying the query with probabilities higher than a user-specified probability threshold. Although there already have some outcomes in studies on PTQs, more efficient query algorithms are needed to quickly get the answers in more applications. Based on the discrete uncertain model, this thesis analyzes the limitations of the existing work and proposes efficient algorithms for PTQs on reverse NN and group NN. Moreover, a general framework for PTQs is proposed which focuses on the threshold setting problem, and the processing of the PTQs on NN and reverse NN queries are illustrated. The main contributions of this thesis are:(1) Considering the existing algorithms for the PTQ on reverse k (=1) NN cannot be efficiently applied to the case where k> 1, two pruning algorithms are proposed to support arbitrary values of k. The spatial pruning algorithm, based on the spatial distances and the angles, uses multiple angle intervals to store the pruning region for filtering objects. This method can reduce many geometrical calculations (e.g. calculating the perpendicular bisectors or the intersections between line segments) and further improve the efficiency of the PTQ on reverse k (>1) NN. The probabilistic pruning algorithm, based on the probability distributions of objects, computes the upper bound of probability for each object in the candidate set and compares them with the threshold to filter more objects. And the intermediate results produced by the spatial algorithm are in full use during computation.(2) Considering the existing algorithms for the PTQ on group NN are inefficient when the uncertain regions are irregular shapes, the specified threshold is too large or too small, or the query points set has a sparse distribution, a local index and three pruning algorithms are proposed which are not sensitive to the above factors. The first spatial pruning algorithm incrementally searches the probable NN results for each query point, and combines the intermediate results to filter objects. And the second spatial pruning algorithm replaces the query set by the centroid point and utilizes the max-min distance pruning metric to filter objects. And the probabilistic pruning algorithm partitions the uncertain regions into many subregions and computes the upper and lower bounds of probability for further filtering or verification. At last, the proposed algorithms are extended to solve the probabilistic group k NN problem in the case of k>1.(3) Most previous work focused on the efficiency of query process, but paid no attention to the setting of thresholds which is also an important problem. Setting the thresholds too high or too low may lead to empty result or too many results. A general framework, which is based on the threshold classification using extreme learning machine, is proposed to solve the threshold setting problem for PTQs. The framework allows the users to input the required number range of results directly, and the threshold classification algorithm transfer the number range into a suitable threshold. Then the existing algorithms for PTQs are executed. And the framework is applicable to any type of PTQs, and all the phases are compatible with the traditional algorithms for PTQs except the way of setting thresholds. Furthermore, the features which may affect the objects being the query results are analyzed and selected for the PTQs on NN and RNN respectively, and the proposed framework is used to answer these two queries. The plurality voting method, the strategy for the modification of threshold and the dynamic classification strategy are proposed to further improve the performance of the classification.In this thesis, we propose several efficient algorithms for the PTQs on NN, reverse NN and group NN queries. Extensive experiments using both real and synthetic data show that our alorithms outperform the existing work under various settings.
Keywords/Search Tags:uncertain data, probabilistic threshold query, nearest neighbor query, reverse nearest neighbor query, group nearest neighbor query
PDF Full Text Request
Related items