
An Efficient Algorithm For Probability Skyline Queries On Discrete Uncertain Data

Posted on: 2014-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2248330398968916
Subject: Computer software and theory
Abstract/Summary:
As research on uncertain data has deepened, Skyline queries over uncertain data objects have received widespread attention in data mining and data analysis. Uncertain data objects fall into two types: continuous uncertain data and discrete uncertain data. Most existing work focuses on continuous uncertain data, while discrete uncertain data has received comparatively little study. This thesis investigates the features and properties of discrete uncertain data in depth and studies Skyline queries on such data. The main research contents and contributions are as follows.

Firstly, based on an analysis of existing probabilistic Skyline query algorithms, the thesis proposes an efficient algorithm, named Substitution, for probabilistic Skyline queries on discrete uncertain data. The algorithm repeatedly partitions the instance sets of the uncertain data objects and, at the same time, selects objects by comparing bounds on their Skyline probabilities. The selected objects constitute the p-Skyline, i.e., the set of uncertain data objects whose Skyline probability is greater than p. By approximating these bounds, the algorithm avoids unnecessary computation and improves execution efficiency. Experimental results demonstrate the correctness and usefulness of the algorithm.

Secondly, the thesis extends the Substitution algorithm with clustering. As the amount of data grows, the efficiency of the algorithm decreases rapidly. To improve both efficiency and accuracy, cluster analysis is applied as a preprocessing step. Experimental results show that the extended algorithm achieves an excellent balance between efficiency and accuracy, and that this trade-off can be adjusted dynamically by tuning the parameters of the clustering method.
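The abstract does not give the details of the Substitution algorithm, so the following is only a minimal Python sketch of the general idea (bound-based selection of the p-Skyline), not the thesis's method. It assumes the common discrete model in which each object is a list of (point, probability) instances, smaller values are better in every dimension, and different objects are independent. Under that model, an instance's Skyline probability is its own probability multiplied, for every other object, by one minus that object's probability mass that dominates it, and an object's Skyline probability is the sum over its instances. Each object is accepted or pruned as soon as a lower bound (mass computed so far) or an upper bound (that plus the unexamined mass) settles the comparison with p. The names `dominates`, `instance_skyline_prob`, and `p_skyline` are hypothetical, and this simple progressive bound stands in for whatever bounds the thesis actually uses.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
# Hypothetical representation: an uncertain object is a list of
# (instance coordinates, instance probability) pairs.
UncertainObject = List[Tuple[Point, float]]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in every dimension and strictly better
    in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def instance_skyline_prob(u: Point, pu: float, owner: str,
                          objects: Dict[str, UncertainObject]) -> float:
    """Probability that instance u occurs and no instance of another object dominates it."""
    prob = pu
    for name, instances in objects.items():
        if name == owner:
            continue  # instances of the same object are mutually exclusive
        dominated_mass = sum(pv for v, pv in instances if dominates(v, u))
        prob *= 1.0 - dominated_mass
        if prob == 0.0:
            break  # fully dominated; no need to look at further objects
    return prob


def p_skyline(objects: Dict[str, UncertainObject], p: float) -> List[str]:
    """Return the names of objects whose Skyline probability is greater than p."""
    result = []
    for name, instances in objects.items():
        # Visit heavy instances first so the two bounds close quickly.
        ordered = sorted(instances, key=lambda inst: inst[1], reverse=True)
        lower = 0.0                               # exact mass computed so far
        remaining = sum(pu for _, pu in ordered)  # mass not yet examined
        for u, pu in ordered:
            remaining -= pu
            lower += instance_skyline_prob(u, pu, name, objects)
            upper = lower + remaining  # an unseen instance adds at most its own probability
            if lower > p:
                result.append(name)    # accept early: the lower bound already exceeds p
                break
            if upper <= p:
                break                  # prune early: the object cannot exceed p
    return result
```

For example, with objects A = {(1, 1): 1.0}, B = {(2, 2): 0.5, (0.5, 3): 0.5}, and C = {(3, 3): 1.0}, the Skyline probabilities are 1.0, 0.5, and 0, so `p_skyline(objs, 0.4)` returns `["A", "B"]`, while `p_skyline(objs, 0.5)` returns only `["A"]` because the threshold test is strict.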
The Substitution algorithm and its extension substantially improve the applicability, accuracy, and validity of Skyline queries on discrete uncertain data.
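The abstract also does not say which clustering method the extension uses. As one hedged illustration, assuming a weighted k-means (Lloyd's algorithm) applied to each object separately, an object's instances can be summarized by at most k weighted centroids before the Skyline computation. A smaller k means fewer dominance tests but a coarser approximation, which mirrors the efficiency/accuracy trade-off described above. `compress_object` and its parameters are hypothetical names, and the deterministic seeding (the k most probable instances) is a simplification.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
# Hypothetical representation: (instance coordinates, instance probability) pairs.
UncertainObject = List[Tuple[Point, float]]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(pt: Point, centroids: List[Point]) -> int:
    return min(range(len(centroids)), key=lambda i: _sq_dist(pt, centroids[i]))


def compress_object(instances: UncertainObject, k: int,
                    iterations: int = 10) -> UncertainObject:
    """Summarize an object's instances by at most k weighted centroids.

    Each centroid is the probability-weighted mean of its cluster and carries
    the cluster's total probability, so the object's total mass is preserved.
    """
    if len(instances) <= k:
        return list(instances)
    # Deterministic seeding (simplification): the k most probable instances.
    seeds = sorted(instances, key=lambda inst: inst[1], reverse=True)[:k]
    centroids = [pt for pt, _ in seeds]
    for _ in range(iterations):
        clusters: List[UncertainObject] = [[] for _ in centroids]
        for pt, pr in instances:
            clusters[_nearest(pt, centroids)].append((pt, pr))
        new_centroids = []
        for i, members in enumerate(clusters):
            mass = sum(pr for _, pr in members)
            if mass == 0.0:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
                continue
            dim = len(members[0][0])
            new_centroids.append(tuple(
                sum(pt[d] * pr for pt, pr in members) / mass for d in range(dim)))
        centroids = new_centroids
    # Final assignment gives each representative its cluster's probability mass.
    masses = [0.0] * len(centroids)
    for pt, pr in instances:
        masses[_nearest(pt, centroids)] += pr
    return [(c, m) for c, m in zip(centroids, masses) if m > 0.0]
```

The reduced object keeps the same (point, probability) format and the same total probability, so it can be passed to any probabilistic Skyline routine in place of the original. Because centroids are not real instances, dominance relations among them can differ from those among the original instances, so any Skyline probabilities computed from them are approximate.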
Keywords/Search Tags: data mining, skyline query, uncertain data, cluster analysis