Research On Skyline Query Algorithms Over Uncertain Dataset

Posted on:2010-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:J J Li

Full Text:PDF

GTID:2178360275491460

Subject:Computer software and theory

Abstract/Summary:

In the real world,the inherent uncertainty,randomicity and inaccuracy of data (such as data concerning geologic survey,astronomical observation,weather forecast, sensor network,movable objects locating and data integrity,etc) may be confusing to the methodologies which were frequently used over the determinated dataset.It's an inescapable problem especially when the datasets collected are incomplete,uncertain and imprecise due to the complex external condition.Before mining the massive uncertain dataset,people usually clean it in advance.However,the preprocessed data exclude many valuable information hidden in the raw data.Unfortunately,the mining outputs are far from being practical.Recently,a novel spotlight,the uncertain dataset, is attracting tremendous research interests.The uncertain dataset connect its uncertainty with a probability or confidence level.On the other hand,Skyline is a widely studied classical problem of MCDM(Multi Criterions Decision Making). However,the existing methods can not be adopted on the uncertain data,we propose a series of effective methods to compute Skyline over uncertain data in this paper.After studying carefully the existing approaches of Skyline and Top-k query,we proposed a series of novel methodologies for Skyline query and maintenance over uncertain data base on well designed index structures.Sufficient and well-designed experiments proved our algorithms are effective and efficient.Following are the primary contributions of this paper:We introduced some advanced indexs and corresponding algorithms for Skyline mining,GIKS(Grid indexed k-Skyline) and BRKS(Bounding and refining k-Skyline). Based on a bottom-up indexed grid,the GIKS divide the data space into small cells which are easily accessed,and make use of the superiority of grid in a divide-and-conquer way.In addition,IDM(Instances distribution map) is also adopted to share the temporary information and speed up the query efficiency.GIKS designed special for dataset of objects which contain sparse instances.In regard to the application that need to handle plenty of objects with dense instances distribution,we designed a top-down index,BRKS,which is based on the averaging method and leveling trees.The two methods above are complementary on different data set.We also extend the uncertain data onto the probabilistic data stream,and developed SKY-PDS(Skyline of probabilistic data stream) which is effectively adapted to the Skyline computation and maintenance over probabilistic data stream. Based on the Possible World semantics,we modelled the problem initially.Different from the Skyline query on traditional data set,SKY-PDS focuses on the following two problems:how to identify the dominant relationship between objects efficiently and how to swept away the vulgar objects which have no chance to join the Skyline set in order to decrease the memory consumption.The solutions for the first problem should focus on reducing the checking cost as much as possible in the phrase of maintenance. We propose a series of strategies,such as probability bounding,incremental refining, pre-washing out,etc,to optimize the query efficiency.Our theoretical analyze and experiments proved that SKY-PDS is stable and efficient in time and memory capability.

Keywords/Search Tags:

Skyline, Uncertain dataset, Probabilistic data stream, Grid index, Multi criterions decision making

Related items

1	Research On Reverse Skyline Query Over Uncertain Data Stream
2	Research Of Skyline Query Processing On Uncertain Dataset Of WSNs
3	Research On Probabilistic Skyline Query Processing Over Uncertain Moving Objects
4	Skyline Query Processing Research On Uncertain Data Set
5	Research Of Skyline Query Processing Technology On Uncertain Data Stream
6	Study On Skyline Query Processing Techniques On Uncertain Data
7	Research On Distributed Probabilistic Skyline
8	Research On Algoritms To Process Skyline Query Over Data Stream
9	Decision-making Methods Based On Uncertain Linguistic Expressions And Their Applications In Communication System Evaluation
10	Research On Skyline Computing Over Uncertain Data