Research On Extending Skyline Queries Over Big Data

Posted on:2018-09-06

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H Y Zhu

Full Text:PDF

GTID:1368330623450388

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Skyline query is a key application in a wide spectrum of data management such as multi-criteria optimal decision making applications,recommendation systems,geographic information systems and big data processing.The traditional skyline point problem has been extensively investigated in recent years.A considerable amount of research has been contributed on efficient computation of skyline and its variations.However,most studies assume that the preferences among attribute values are deterministic and the uncertainty lies only in data.Therefore,they are unable to analyze queries over uncertain preferences.Furthermore,most studies only focus on analyzing individual skyline points,thus they are inadequate to answer queries that need to analyze not only individual points but also groups of points.The above two gaps largely limit the applicability of skyline queries.Since queries are conducted under different scenarios in the big data era,extending skyline queries to meet the various requirements of users is necessary.In order to enhance the applicability of skyline queries to fit the big data era,we extend skyline queries in two aspects.(1)From deterministic preferences to uncertain preferences.(2)From individual points to groups of points.The main contributions of this thesis include:1.Skyline queries over uncertain preferences: This thesis investigates the problem of skyline queries over uncertain preferences.Since the existing works contain significant theoretic shortcomings;thus they are unable to verify the correctness of their algorithms.We first review their works then propose a detailed theoretic analysis,which lays the theoretic foundation of this problem.Due to the heavy calculation introduced by adopting inclusion-exclusion principle to express the skyline probability over uncertain preferences,it needs massive time when computing skyline probabilities for large data sets.We propose a novel parallel algorithm to compute skyline probability of a given object.Moreover,we propose an adding algorithm and a deleting algorithm to deal with dynamic scenarios where new objects are added in and outdated objects are deleted.Furthermore,we extend our algorithm from computing skyline probability of a given object to all objects in a data set.The experimental results show that our parallel algorithms are more than 10× faster with 16 parallel processes than the state-of-the-art sequential algorithms.2.Parallelization of skyline groups computation: This thesis investigates the problem of parallelization of skyline groups computation.Compared to the traditional skyline point computation,computing skyline groups is much more complicated and expensive.This computational challenge promotes us to design a parallel algorithm to compute skyline groups.We first compute the skyline layers of a data set in parallel,which are a critical intermediate result.In the algorithm,we maintain an efficiently-updatable data structure for the shared global skyline layers,which is used to minimize dominance tests and maintain high throughput.Then we design an efficient parallel algorithm to find skyline groups based on the skyline layers.Experimental results show that our parallel algorithms achieve 10-fold speedup with 16 parallel threads over state-of-the-art sequential algorithms.3.Top-k dominating queries on skyline groups: This thesis investigates the problem of top-k dominating queries on skyline groups.The output size of skyline groups can be significantly large,which is a potential limitation of the skyline groups operator.We carry out a systematic study of top-k queries on skyline groups and propose a suit of efficient algorithms to find top-k skyline groups.Moreover,we design a bitmap index method to compute the scores of groups efficiently,using bitmap compression techniques to minimize the size of bitmap index.Experimental results show that our top-k algorithms outperform the brute-force method by 1 to 2 orders of magnitude.

Keywords/Search Tags:

Skyline Queries, Uncertain Preferences, Skyline Groups, Parallel Queries, Data Streams, Top-k Queries

PDF Full Text Request

Related items

1	Research On Parallel Skyline Queries Over Uncertain Data Streams
2	Research On Distributed Parallel Skyline Queries Over Uncertain Data
3	Parallel Computation Of Skyline Queries Over Uncertain Data Based On Multicore Architectures
4	Research On Distributed Parallel Skyline Query Processing Technology Over Uncertain Data Streams
5	Skyline Query Research For Massive RDF Data Under Distributed Computing Environments
6	The Research Of Skyline Queries Algorithms Based On MapReduce
7	Research On Group-Skyline Query Technology
8	Research And Implementation Of Continuous Geo-textual Skyline Queries
9	An Efficient Algorithm For Probability Skyline Queries On Discrete Uncertain Data
10	Research On Probabilistic Skyline Query Processing Over Uncertain Moving Objects