Font Size: a A A

Research On Algorithms For Skyline Query Processing

Posted on:2020-03-09Degree:DoctorType:Dissertation
Country:ChinaCandidate:K Q ZhangFull Text:PDF
GTID:1368330614950731Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid growth of data size,all areas are facing big data.In the face of such a huge amount of data,data management systems are particularly important.Query processing is one of the most important components of data management systems.Because it is able to quickly and intuitively reflect the various characteristics of the data and to provide references for users' decisions.Up to now,the most common query techniques in data management systems are skyline query,nearest neighbor query,top-k query,aggregate query,join query,similarity query,and so forth.Among them,skyline query processing is an important multi-dimensional preference query,and it has been widely used in multi-dimensional decision making applications,personalized recommendation,data mining,etc.Skyline query processing has been widely concerned by researchers at home and abroad and extended to many different environments.However,the existing research work still has many shortcomings in non-index efficient memory algorithm of skyline query processing,skyline query processing on incomplete data,group-based skyline query processing and skyline query processing on vertically distributed environment.In this thesis,we mainly focus on these four aspects.The main research results are listed as follows:First,we study space-partitioning-based algorithm of skyline query processing.There are mainly two types of skyline algorithms,index-based ones and index-independent ones.Although index-based skyline algorithms are efficient,the time and space cost of constructing indices are huge.Up to now,space partitioning framework is the best index-independent algorithm for computing skyline and draws our attention.However,the existing space-partitioning-based skyline algorithms neglect the importance of uniform partitioning to space partitioning framework.In this thesis,we propose an efficient skyline algorithm VMPSP.VMPSP constructs virtual median points as partitioning points to recursively partition the dataset,which makes partitioned subsets more uniform and reduces the number of partitioning and the comparisons between points.It improves the efficiency of skyline computation.The time complexities of the algorithms are analyzed and the efficiency of our proposed algorithm is verified by the experiments.Secondly,we study skyline query processing on incomplete data.Data incompleteness is common in real applications.However,the existing skyline definition on incom-plete data has some vital defects.Based on this definition,the dominance between two points only focuses on their common valid dimensional values and neglects missing values,which leads that the result of skyline query on incomplete data may be empty or include some abnormal points.This result cannot provide users with valuable references.To overcome this problem,in this thesis,we propose a novel skyline definition utilizing probabilistic model on incomplete data where each point has a probability to be in the skyline.In particular,it returns k points with the highest skyline probabilities to guarantee the result size and the effectiveness.In order to compute the probability of a point being skyline,the probability density functions of incomplete data points are necessary.In this thesis,we propose incomplete models on independent,correlated and anti-correlated distributions to estimate them using existing valid values.In addition,we propose three efficient algorithms SPISkyline,SPCSkyline and SPASkyline for probabilistic skyline computation on incomplete data complying with independent,correlated and anti-correlated distributions,respectively.They employ pruning strategy,optimization of the process of probability computation,and sorting technique to improve the efficiency of probabilistic skyline computation on incomplete data.The time complexities of the algorithms are analyzed and the efficiency and the effectiveness of our proposed algorithms are verified by the experiments.Then,we study the problem of skyline query processing on uncertain data.Data uncertainty is common in real applications and skyline query processing on uncertain data has been widely studied.However,how to efficiently compute skyline result on vertically distributed uncertain data still has not attracted much attention.In this thesis,we present three communication-efficient p-skyline algorithms ASR,IASR and FSLR on vertically distributed uncertain data.In the preprocessing stage,we organize the records at servers as a sorted attribute list for each attribute,and each of them supports both sorted and random accesses.ASR alternates sorted and random accesses on sorted attribute lists in round-robin manner to update the upper and lower bounds of skyline probabilities of the objects until all the objects can be determined whether they are in the result or not.IASR is an improved version of ASR.IASR early terminates accesses to further reduce the cost of communication.FSLR fist uses sorted accesses with low cost to get loose upper and lower bounds,and then adopts random accesses with high cost to refine upper and lower bounds.It reduces redundant random accesses.The communication complexities of the algorithms are analyzed and the efficiency of our proposed algorithms is verified by theexperiments.Finally,we study the problem of G-Skyline query processing.It aims to retrieve all optimal groups for users' decisions.However,with the increasing of data dimensionality,G-Skyline returns more groups,which makes that users cannot determine which groups are satisfactory.In this thesis,we propose a novel concept of k-dominant G-Skyline,which is a subset of G-Skyline and is more representative.Unfortunately,existing GSkyline methods cannot be directly applied for finding k-dominant G-Skyline groups.This is because transitive property of traditional dominance relationship cannot hold any more for k-dominance.In this thesis,we present a two-phase algorithm to efficiently compute k-dominant G-Skyline groups.In the first phase,we construct lk DG structure while pruning the points never included in any k-dominant G-Skyline group as much as possible.In the second phase,using lk DG,we propose two efficient k-dominant GSkyline searching methods SM-P and SM-G,which generate new candidate groups from single points and ancestor groups,respectively.The time complexities of the algorithms are analyzed and the efficiency of our proposed algorithms is verified by the experiments.
Keywords/Search Tags:Skyline, query processing, incompleteness, uncertainty, distribution
PDF Full Text Request
Related items