Skyline Query Processing Of Massive Incomplete Data Based On Space Partition

Posted on:2017-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:B Yin

Full Text:PDF

GTID:2308330482999745

Subject:Computer software and theory

Abstract/Summary:

In recent years, with the rapid development of the Internet, the Internet of Things, and other information technologies, multidimensional data have kept growing, data are generated in increasingly diverse ways, and massive data sets have become increasingly common. Because of machine faults, privacy protection, and human error, massive data sets often contain large amounts of incomplete data. Incomplete data often still carry much of the important information that users require, so efficiently obtaining the approximate result sets that users need from massive incomplete data is an urgent problem. Because the Skyline query can provide users with effective decision analysis and preference query results, it has become an important research direction in data mining. Traditional Skyline query methods for incomplete data first preprocess the data by cleaning or repairing it, and then query the repaired data set. When applied to large or high-dimensional data, this preprocessing approach incurs a high time cost and introduces many errors, so it cannot meet users' query demands. There is currently little research on Skyline queries over massive incomplete data sets, yet as data volumes grow, incomplete data are becoming more common, so research on Skyline queries over massive incomplete data sets is increasingly meaningful. This paper proposes a Skyline query algorithm for massive high-dimensional incomplete data sets based on space partition.
The algorithm builds a RankList data structure to improve query efficiency and reduce the impact of incomplete data on query results, divides the query into subspaces by combining different dimensions, and incrementally retrieves the highest-priority point in each subspace; these points are Skyline points uniformly distributed in the incomplete data set. When the data size and the number of dimensions are large, the result set of traditional algorithms is very large and query efficiency is low. Building on existing work, this paper therefore also puts forward a Top-k Skyline query algorithm for arbitrary subspaces. By constructing the RankList structure and a Top-k structure, the algorithm quickly retrieves the Top-k set of the queried subspace and returns k objects according to the user's requirements. The algorithm queries the Skyline point set efficiently, narrows the Skyline result set, and at the same time reduces the number of comparisons.
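
For readers unfamiliar with Skyline queries over incomplete data, the following is a minimal baseline sketch and not the algorithm of this thesis: the abstract does not specify the RankList or Top-k structures, so none of that is reproduced here. The sketch assumes that smaller values are better, that a missing value is represented by `None`, and that two points are compared only on the dimensions where both have values, which is a commonly used dominance definition for incomplete data. The names `dominates`, `naive_skyline`, and the `dims` parameter (used to restrict the query to a subspace) are hypothetical.

```python
from typing import List, Optional, Sequence

# A point is a tuple of values; None marks a missing (incomplete) value.
Point = Sequence[Optional[float]]


def dominates(p: Point, q: Point, dims: Optional[Sequence[int]] = None) -> bool:
    """Return True if p dominates q on the dimensions both points have.

    Assumption: smaller is better. Dimensions where either value is
    missing are skipped. If the points share no observed dimension,
    neither dominates the other.
    """
    indices = range(len(p)) if dims is None else dims
    strictly_better = False
    for i in indices:
        a, b = p[i], q[i]
        if a is None or b is None:
            continue  # not comparable on this dimension
        if a > b:
            return False  # p is worse somewhere, so it cannot dominate
        if a < b:
            strictly_better = True
    return strictly_better


def naive_skyline(points: Sequence[Point],
                  dims: Optional[Sequence[int]] = None) -> List[Point]:
    """Quadratic baseline: keep every point that no other point dominates.

    Each point is checked against ALL other points, not only against
    current candidates, because dominance over incomplete data is not
    transitive in general.
    """
    result = []
    for i, p in enumerate(points):
        dominated = any(
            dominates(q, p, dims) for j, q in enumerate(points) if j != i
        )
        if not dominated:
            result.append(p)
    return result


if __name__ == "__main__":
    data = [(1, 5, None), (4, 2, None), (None, 6, 1), (5, None, 3)]
    print(naive_skyline(data))            # full space
    print(naive_skyline(data, dims=[2]))  # subspace with dimension 2 only
```

This baseline compares every pair of points, which is the cost the abstract says becomes prohibitive for large, high-dimensional data; it is useful mainly as a correctness reference when evaluating a faster method.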

Keywords/Search Tags:

Massive Incomplete Data, Skyline, Space Partition, Top-k

Related items

1	Research On Key Technologies Of Skyline Query Processing On Massive Data
2	Research Of Skyline Preference Query Based On Incomplete Dataset
3	Research And Implementation Of G-Skyline Query Algorithm On Massive Data
4	Research On K-dominant Skyline Algorithm Based On MapReduce And Incomplete Data Stream
5	Study Of K-Dominant Skyline Algorithms For Incomplete Data Stream
6	Research On Skyline-Join Query Processing Of Incomplete Datasets With Crowdsourcing
7	Answering Skyline Queries Over Incomplete Data With Crowdsourcing
8	Skyline Query Research For Massive RDF Data Under Distributed Computing Environments
9	Research On SKYLINE Preference Query Technology Over Incomplete Data
10	Top-k Skyline Query Algorithm Based On Data Partition In Distributed Environment