
Skyline Query Processing Of Massive Incomplete Data Base On Space Partition

Posted on: 2017-03-18, Degree: Master, Type: Thesis
Country: China, Candidate: B Yin, Full Text: PDF
GTID: 2308330482999745, Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of the Internet, the Internet of Things, and other information technologies, multidimensional data has kept growing, the ways in which data is generated have become increasingly diverse, and massive data sets have become commonplace. Because of machine faults, privacy restrictions, and human error, massive data sets are often accompanied by large amounts of incomplete data. Incomplete data often still contain much important information that users need, so how to efficiently obtain the approximate result sets users require from massive incomplete data is an urgent problem. Because the Skyline query can provide users with effective decision analysis and preference-based query results, it has become an important research direction in data mining.

Traditional Skyline query methods for incomplete data first preprocess the data by cleaning or repairing it, and then query the repaired data set. When applied to large or high-dimensional data, this preprocessing incurs a high time cost and introduces many errors, so it cannot meet users' query demands. There is currently little research on Skyline queries over massive incomplete data sets, yet as data volumes grow, incomplete data is becoming increasingly common, so research on Skyline queries over massive incomplete data sets is becoming more and more meaningful. This thesis proposes a Skyline query algorithm, based on Space Partition, for massive high-dimensional incomplete data sets.
The algorithm builds a RankList data structure to improve query efficiency and to reduce the impact of incomplete data on the query results. It partitions the query space into subspaces by combining different dimensions and incrementally retrieves the highest-priority point in each subspace; these are the Skyline points, uniformly distributed over the incomplete data set. When the data size and the number of dimensions are large, the result set of a traditional algorithm is very large and its query efficiency is low. Building on existing work, this thesis therefore also puts forward a Top-k Skyline query algorithm for arbitrary subspaces. By constructing the RankList structure and a Top-k structure, the algorithm quickly retrieves the Top-k set of a subspace and returns the k objects that match the user's requirements. The algorithm queries the Skyline point set efficiently, narrows the Skyline result set, and at the same time reduces the number of comparisons.
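To make the terms in the abstract concrete, the following is a minimal Python sketch of a Skyline query over incomplete data restricted to a chosen subspace, plus a simple Top-k ranking. It is not the thesis's RankList or Space Partition algorithm, whose details the abstract does not give. It assumes the commonly used dominance rule for incomplete data (two points are compared only on dimensions where both have a value), that smaller values are better, and that Top-k candidates are ranked by how many points they dominate. The names `dominates`, `skyline`, and `top_k_by_dominance` are hypothetical.

```python
from typing import List, Optional, Sequence

# A point is a tuple of values; None marks a missing (incomplete) value.
Point = Sequence[Optional[float]]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """Return True if p dominates q within the subspace `dims`.

    Assumption: only dimensions observed in BOTH points are compared,
    and smaller values are preferred.
    """
    strictly_better = False
    for d in dims:
        a, b = p[d], q[d]
        if a is None or b is None:
            continue  # skip dimensions missing in either point
        if a > b:
            return False  # p is worse on a shared dimension
        if a < b:
            strictly_better = True
    return strictly_better


def skyline(points: List[Point], dims: Sequence[int]) -> List[int]:
    """Naive O(n^2) skyline: indices of points that no other point dominates.

    Dominance over incomplete data is not transitive, so every pair is
    checked rather than pruning against a running candidate window.
    """
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p, dims)
                   for j, q in enumerate(points) if j != i)
    ]


def top_k_by_dominance(points: List[Point], dims: Sequence[int], k: int) -> List[int]:
    """Indices of the k points that dominate the most other points (assumed ranking)."""
    scored = [
        (sum(dominates(p, q, dims) for j, q in enumerate(points) if j != i), i)
        for i, p in enumerate(points)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [i for _, i in scored[:k]]


# Illustrative data only: four 3-dimensional points with missing values.
points = [
    (1, 2, None),
    (2, None, 1),
    (None, 3, 2),
    (3, 4, 3),
]

print(skyline(points, (0, 1, 2)))                # full space
print(skyline(points, (1, 2)))                   # subspace of dimensions 1 and 2
print(top_k_by_dominance(points, (0, 1, 2), 2))  # two highest-scoring points
```

The quadratic pairwise check is the kind of cost that grows quickly with data size and dimensionality, which is the motivation the abstract gives for an index such as RankList and for limiting the output to k objects.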
Keywords/Search Tags: Massive Incomplete Data, Skyline, Space Partition, Top-k