Research on Top-k Skyline Query Processing Methods in MapReduce

Posted on: 2017-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: A L Liu
GTID: 2308330482999731
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet applications and cloud computing, the volume of data stored worldwide has grown explosively, and helping users find interesting data objects in large-scale data sets has become a focus of research. Skyline queries are used for multi-objective decision making and data visualization. However, as data size and dimensionality increase, a skyline query returns such a large result set that users have difficulty selecting useful data objects from the candidates. The research community has therefore proposed the top-k skyline query, which uses a scoring function to return only the first k data objects, keeping the result set at a manageable size. On the large data sets of the big data era, however, existing top-k skyline query processing methods are inefficient and have long query times, so they cannot meet the needs of large-scale data processing, and top-k skyline processing in big data environments has become an urgent problem. MapReduce is a parallel programming model for processing large data sets that offers high scalability and good fault tolerance. This thesis therefore studies top-k skyline query processing methods in MapReduce.

First, to address the expensive I/O operations and the lack of progressiveness in recent work, we design and propose MR-DTKS, a top-k skyline query processing method in MapReduce based on k-skyband. The method uses data point transformation to determine the dominance relationship between data points and uses the k-skyband set to filter out unpromising data points. It avoids computing the global skyline of the whole data set and simplifies the dominance comparisons between data points.
Moreover, it uses the k-skyband to filter out some unpromising data points in advance, so that they do not participate in later computation. This reduces meaningless dominance comparisons between data points, saving a large amount of computing time and storage space.

Second, to take varied user preferences fully into account, we design and propose P/T-SKY-MR, a top-k skyline query processing method in MapReduce based on user preference. The method defines an encoding on partially ordered domains to represent user preferences and perform a first round of filtering. It then defines a dominating score on totally ordered domains to filter again. Finally, it computes a score that combines the partially ordered and totally ordered domains and returns the query results. Because the method considers both kinds of domains and filters twice during execution, it reduces the number of candidate results and the dominance comparisons between data points, which lowers the computational overhead and the response time. The query results also better match users' needs.

Finally, to verify the validity and rationality of the proposed algorithms, we conduct an experimental analysis of three aspects: query response time, storage space overhead, and the number of dominance comparisons between data points. Extensive experiments show that the proposed top-k skyline query processing methods in MapReduce effectively reduce the dominance comparisons between data points, improve query efficiency, and reduce storage space overhead.
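
The k-skyband filtering idea can be illustrated with a small single-process sketch. This is not the MR-DTKS implementation: the abstract does not specify the data point transformation or the scoring function, so the sketch assumes that smaller values are better in every dimension, uses the sum of coordinates as a stand-in monotone score, and simulates the map and reduce phases with plain functions. All function names are hypothetical. It relies on two standard facts: a point dominated by at least k points inside its own partition is dominated by at least k points globally, and the top-k points under any monotone scoring function lie in the k-skyband.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension and strictly better in one.
    Assumption: smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points: Sequence[Point], k: int) -> List[Point]:
    """Keep the points that are dominated by fewer than k other points."""
    return [p for p in points if sum(dominates(q, p) for q in points) < k]


def map_phase(partition: Sequence[Point], k: int) -> List[Point]:
    """Simulated mapper: emit only the local k-skyband of one partition."""
    return k_skyband(partition, k)


def reduce_phase(local_bands: Sequence[Sequence[Point]], k: int) -> List[Point]:
    """Simulated reducer: merge the local k-skybands, recompute the k-skyband
    on the merged candidates, then rank by the assumed score (coordinate sum)."""
    merged = [p for band in local_bands for p in band]
    candidates = k_skyband(merged, k)
    return sorted(candidates, key=lambda p: (sum(p), p))[:k]


def top_k_query(points: Sequence[Point], k: int, num_partitions: int = 3) -> List[Point]:
    """Split the data round-robin, filter each partition, then merge."""
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    return reduce_phase([map_phase(part, k) for part in partitions], k)
```

In this sketch only the local k-skybands are passed to the reduce step, so points that cannot appear in the answer are dropped before the merge, which is the source of the savings in dominance comparisons described above. A real MapReduce job would express `map_phase` and `reduce_phase` as mapper and reducer tasks over distributed input splits.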
Keywords/Search Tags:top-k skyline, MapReduce, dominance relationship, user’s preference