Font Size: a A A

Research On Representative Skyline Algorithm Based On Large-scale Data Sets

Posted on:2017-06-07Degree:MasterType:Thesis
Country:ChinaCandidate:T T CaiFull Text:PDF
GTID:2358330503481837Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Recently, skyline computation has become a hot topic in the database community which attracted much attention in research community, due to a large number of applications in multi-objects decision, database, data visualization, data mining and other related areas.In the literature, there are a number of skyline computation algorithms that were developed, including index-based algorithm, Block-Nested-Loop algorithm, Branch-and-Bound search algorithm, Nearest Neighbor search and so on. However, in the era of big data, the data volume is typically very huge, thus resulting in the number of skyline points that is often very large. To better understand the skylines in the era of big data, the concepts of representative skyline were raised. Generally, the research of representative skyline computation consists of two aspects. The first one focuses on the definition of representative skyline, and the second one concentrates on the computation of representative skyline. This paper is mainly focus on the computation of representative skyline. Our contributions are summarized as follows.First, this paper performs a comprehensive study on the existing definitions of representative skyline. Specifically, four notable definitions of the representation skyline are discussed which are: domination-based, distance-based, threshold-based and significance & diversity-based representative skylines. We conduct a detailed analysis of their characteristics and also compare them with each other. We also discuss the advantages and disadvantages of all those definitions.Second, we develop a new representative skyline computation algorithm based on the definition of distance-based representative skyline in the 2D space. In particular, we propose a new approach to quickly find the minimum radius of a circle that covers the skyline points. Our new approach reduces the time complexity ?(2) to ?(log ). Besides, we propose a near-optimal approximate algorithm, denoted by ARS, which runs in ?(2mlog(?)) time to compute the representative skyline in the 2D space. Furthermore, we also propose an efficient exact algorithm, denoted by PSRS, which takes ?(23) time complexity to compute the DBRS in the 2D space. Unlike the existing DP-based algorithm, both the ARS and PSRS algorithms are based on a newly-devised parametric search technique.Finally, we conduct comprehensive experimental studies to evaluate the proposed algorithms, and the results confirm our theoretical findings.
Keywords/Search Tags:multi-objects decision, skyline, representative skyline
PDF Full Text Request
Related items