Font Size: a A A

Research On Skyline Query Algorithms Over Uncertain Dataset

Posted on:2010-04-01Degree:MasterType:Thesis
Country:ChinaCandidate:J J LiFull Text:PDF
GTID:2178360275491460Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In the real world,the inherent uncertainty,randomicity and inaccuracy of data (such as data concerning geologic survey,astronomical observation,weather forecast, sensor network,movable objects locating and data integrity,etc) may be confusing to the methodologies which were frequently used over the determinated dataset.It's an inescapable problem especially when the datasets collected are incomplete,uncertain and imprecise due to the complex external condition.Before mining the massive uncertain dataset,people usually clean it in advance.However,the preprocessed data exclude many valuable information hidden in the raw data.Unfortunately,the mining outputs are far from being practical.Recently,a novel spotlight,the uncertain dataset, is attracting tremendous research interests.The uncertain dataset connect its uncertainty with a probability or confidence level.On the other hand,Skyline is a widely studied classical problem of MCDM(Multi Criterions Decision Making). However,the existing methods can not be adopted on the uncertain data,we propose a series of effective methods to compute Skyline over uncertain data in this paper.After studying carefully the existing approaches of Skyline and Top-k query,we proposed a series of novel methodologies for Skyline query and maintenance over uncertain data base on well designed index structures.Sufficient and well-designed experiments proved our algorithms are effective and efficient.Following are the primary contributions of this paper:We introduced some advanced indexs and corresponding algorithms for Skyline mining,GIKS(Grid indexed k-Skyline) and BRKS(Bounding and refining k-Skyline). Based on a bottom-up indexed grid,the GIKS divide the data space into small cells which are easily accessed,and make use of the superiority of grid in a divide-and-conquer way.In addition,IDM(Instances distribution map) is also adopted to share the temporary information and speed up the query efficiency.GIKS designed special for dataset of objects which contain sparse instances.In regard to the application that need to handle plenty of objects with dense instances distribution,we designed a top-down index,BRKS,which is based on the averaging method and leveling trees.The two methods above are complementary on different data set.We also extend the uncertain data onto the probabilistic data stream,and developed SKY-PDS(Skyline of probabilistic data stream) which is effectively adapted to the Skyline computation and maintenance over probabilistic data stream. Based on the Possible World semantics,we modelled the problem initially.Different from the Skyline query on traditional data set,SKY-PDS focuses on the following two problems:how to identify the dominant relationship between objects efficiently and how to swept away the vulgar objects which have no chance to join the Skyline set in order to decrease the memory consumption.The solutions for the first problem should focus on reducing the checking cost as much as possible in the phrase of maintenance. We propose a series of strategies,such as probability bounding,incremental refining, pre-washing out,etc,to optimize the query efficiency.Our theoretical analyze and experiments proved that SKY-PDS is stable and efficient in time and memory capability.
Keywords/Search Tags:Skyline, Uncertain dataset, Probabilistic data stream, Grid index, Multi criterions decision making
PDF Full Text Request
Related items