Research On Top-k Skyline Query Algorithm For Large Dataset

Posted on:2020-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:L Wei

Full Text:PDF

GTID:2428330572479105

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Skyline query algorithm has attracted a lot of attention since proposed,which can describe the general situation of data.Thus,Skyline query algorithm is applied in many practical scenarios,such as multi-rule decision-making,real-time online services and business data analysis.With the rapid development of Internet technology,Skyline query algorithm still has great potentials.However,when the datasets are large and have many dimensions,Skyline query algorithm will fetch back too many results to users.Therefore,the Top-k Skyline query algorithm is proposed,which uses the classical Top-k algorithm to help filter the results of Skyline query.However,the state-of-the-art Top-k Skyline query algorithms still are ine:fficiency for large datasets.First,the algorithms need to create and maintain a special data structure.When the dataset is large,the cost of the construction and maintenance for the special data structures will be very expensive.Second,the time cost of the algorithm is large.The best time complexity of the state-of-the-art algorithms is O(n log n).When the datasets are large,the runtime of the algorithm will be too long.Third,the special data structure of the algorithm leads to a lack of universality.For the datasets with simple data types,building a matching data structure can reduce the time cost of the query algorithm.However,in the face of the datasets with complex data types,it is difficult to find a suitable data structure.Actually,the data types of the datasets are often complex.Here an efficient Top-k Skyline query method called DFTS is proposed,which can perform well for large datasets.DFTS involves three steps.Firstly,the "degree score" function is used to rank the dataset,and a large quantity of objects with low ranking will be filtered out.Secondly,DFTS makes a Skyline query upon the candidates and generates a Skyline subset.Finally,top-k objects with high ranking will be selected from the Skyline subset as the final result.Through these steps,DFTS can significantly reduce the time cost.Moreover,without using a special data structure,DFTS can reduce space cost and be more universal.Then,DFTS is executed on the cluster distributed computing framework,called Spark,that indicate DFTS can well applies to distributed environment.And extensive experimental results show that DFTS can achieve much better performance for large datasets than those state-of-the-art methods.

Keywords/Search Tags:

Skyline, Top-k, Apache Spark

PDF Full Text Request

Related items

1	OCTWAS - Online Check-pointer for Workflows on Apache Spark
2	Design And Implementation Of A Performance Modeling System On Apache Spark
3	Research On Taxi Trajectory Organization Method Based On Apache Spark
4	Research On Skyline Query Processing Techniques
5	Enhanced Singular Collaborative Filtering Based Recommender System On Apache Spark
6	Using apache spark for scalable gene sequence analysis
7	The Research And Implementation Of Movie Recommendation System Based On Flink
8	Research On Apache Spark Distributed Parallel Computing Framework Optimization Technology
9	Study On Skyline Query Processing Techniques In Wireless Sensor Networks
10	Research On K-Prototypes Algorithm Based On Mixed Data And Implementation Of Spark Platform