Research On Skyline Computing Over Uncertain Data

Posted on: 2015-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: L M Dong
Full Text: PDF
GTID: 2348330509960560
Subject: Management Science and Engineering
Abstract/Summary:
We live in an age of data, and data plays a significant role in society. With advances in data mining technology and a better understanding of data management, uncertain data is drawing increasing attention from researchers. Uncertain data arises in many fields, such as the military, finance, and telecommunications. Besides the enormous number of possible worlds, the variety of data types is another characteristic of uncertain data. In general, different types of uncertain data require different query models and algorithms, even for the same query. For the Skyline query in particular, some researchers have used a non-indexed algorithm, based on the probabilistic constraint space, to prune non-skyline objects in discrete uncertain data. Fast pruning algorithms for the other types of uncertain data, however, have not yet been studied.

This thesis also finds that, for discrete uncertain data, the dominance relation between two objects is definite, whereas for continuous uncertain data it is indefinite, because each attribute value of a continuous uncertain object lies in an interval. Moreover, the Skyline probability computation for discrete uncertain data does not change between the small-value-preferring and big-value-preferring principles, while that for continuous uncertain data does.

This thesis studies Skyline computing over uncertain data, in particular Skyline computing based on the idea of the probabilistic constraint space, as reflected in the following points:

1. By analyzing the PCS algorithm, which targets existence-level discrete uncertain data, this thesis finds that the pruning rate of PCS may drop when pruning non-skyline objects over multi-dimensional data. The reason is that PCS is likely to create MBRs (Minimum Bounding Rectangles) that cover too much of the data space, which reduces the number of MBRs; with fewer MBRs, the pruning rate goes down.
Furthermore, for the final result to be acceptable when returned to users, each MBR should contain at least one object whose probability exceeds the threshold q. This thesis improves on both points: by letting MBRs update themselves and by assigning to each MBR at least one object whose probability exceeds q, both the reliability and the pruning rate are improved on multi-dimensional uncertain data.

2. For continuous uncertain data, this thesis proposes for the first time that the Skyline query should be studied under both the small-value-preferring and big-value-preferring principles, taking exponentially distributed uncertain data as an example.

3. This thesis studies Skyline computing over exponentially distributed uncertain data for the first time and proposes a computing model for the Skyline probability that covers both the small-value-preferring and big-value-preferring principles. It also proposes, for the first time, EDPCS, a fast pruning algorithm based on the idea of the probabilistic constraint space. Experimental results show that EDPCS performs well on exponentially distributed uncertain data.
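
The contrast drawn above between discrete and continuous uncertain data can be illustrated with a minimal Python sketch. It is not the PCS or EDPCS algorithm and not the thesis's Skyline probability model. It assumes the textbook existence-level setting (independent objects, each with a fixed point and an existence probability) and independent exponentially distributed attributes described by rate parameters. All function names and the sample values are hypothetical.

```python
from math import prod


def dominates(a, b):
    """True if point a dominates point b when smaller values are preferred."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline_probabilities(objects):
    """Skyline probability of each existence-level uncertain object.

    objects: list of (point, existence_probability) pairs.
    Assumes objects exist independently, so an object is in the skyline
    when it exists and every object that dominates it does not.
    """
    result = []
    for i, (point, p_exist) in enumerate(objects):
        prob = p_exist
        for j, (other, p_other) in enumerate(objects):
            if i != j and dominates(other, point):
                prob *= 1.0 - p_other
        result.append(prob)
    return result


def exp_dominance_probability(rates_a, rates_b, prefer_small=True):
    """P(a dominates b) for independent exponential attributes.

    rates_a, rates_b: per-dimension rate parameters (assumed inputs).
    For X ~ Exp(ra) and Y ~ Exp(rb), P(X < Y) = ra / (ra + rb).
    """
    if prefer_small:
        return prod(ra / (ra + rb) for ra, rb in zip(rates_a, rates_b))
    return prod(rb / (ra + rb) for ra, rb in zip(rates_a, rates_b))


# Hypothetical sample data: (point, existence probability).
objects = [((1, 1), 0.5), ((2, 2), 0.8), ((0, 3), 0.9)]
q = 0.45  # probability threshold
probs = skyline_probabilities(objects)
kept = [i for i, p in enumerate(probs) if p > q]
```

In the discrete case, whether one point dominates another is a yes-or-no fact and only existence is random. In the exponential case, dominance itself has a probability, and that probability changes when the preference switches from smaller to larger values.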
Keywords/Search Tags: Uncertain data, Skyline computing, probabilistic constraint space, exponential distribution