Research On Distributed Parallel N-of-N Skyline Query Processing Technology Over Data Streams

Posted on: 2016-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: W Wei
Full Text: PDF
GTID: 2348330536967554
Subject: Computer Science and Technology
Abstract/Summary:
As a new form of data, data streams appear widely in application scenarios such as financial analysis, sensor networks, and location-based services (LBS). Efficient querying over data streams has become a hot topic in the database community in the big data era. With the growing popularity of distributed computing environments, distributed parallelization has become an important trend in data stream research. Distributed parallel stream processing can both meet users' real-time query requirements and overcome the shortage of computing power. The n-of-N Skyline query over data streams is a new and more complex type of Skyline query over data streams. Existing centralized query processing methods, which are based on a single-machine environment, cannot meet the requirements for query efficiency and flexibility. The study of distributed parallel n-of-N Skyline queries is therefore of practical value.

To address the inability of existing centralized methods to meet the requirements for query efficiency and flexibility, a new distributed parallel model named nPNM is proposed and applied to n-of-N Skyline query processing. A distributed parallel n-of-N Skyline query processing algorithm named PnNS, based on the nPNM model, is also proposed. In PnNS, every parallel node is responsible for a local sliding window, the parallel nodes compute their results without communicating with one another, and the results are output at the next-level node. Experimental results show that PnNS is far more efficient than the existing centralized query processing method when the load is large, and that it maintains good performance as the window size and data dimensionality change.

To address the load imbalance that arises in distributed parallel n-of-N Skyline query processing, a load balancing algorithm named LBA, based on adjusting the sliding window, is proposed. LBA adopts a bucket-based window partitioning strategy and adjusts the size of each local sliding window to balance the load among the parallel nodes. Experimental results show that the load of the parallel nodes becomes more balanced with LBA, and that this holds as the sliding window size, the data dimensionality, and the number of parallel nodes change.

To address the over-provisioning and under-provisioning that arise in distributed parallel n-of-N Skyline query processing, an elastic and scalable model named EPM and an algorithm named ENPA based on EPM are proposed. The elastic protocol applied in ENPA decides whether to scale in or scale out by considering some key properties of the data streams. Experimental results show that ENPA can adjust the number of parallel nodes to match the real-time workload and ensure high system performance.
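
For readers unfamiliar with the query type, the following minimal Python sketch illustrates what an n-of-N Skyline query returns and why local results can be merged at a next-level node. It is not the nPNM model or the PnNS algorithm, whose details the abstract does not give. It assumes that smaller values are preferred in every dimension, recomputes the skyline naively on every query, and splits the window into contiguous chunks. All names (`dominates`, `skyline`, `NofNWindow`, `partitioned_skyline`) are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points dominated by no other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


class NofNWindow:
    """Keeps the most recent N stream elements and answers skyline
    queries over the most recent n of them, for any 1 <= n <= N."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.window: Deque[Point] = deque(maxlen=capacity)

    def append(self, p: Point) -> None:
        # deque(maxlen=...) evicts the oldest element automatically.
        self.window.append(p)

    def query(self, n: int) -> List[Point]:
        if not 1 <= n <= self.capacity:
            raise ValueError("n must satisfy 1 <= n <= N")
        return skyline(list(self.window)[-n:])


def partitioned_skyline(points: Sequence[Point], parts: int) -> List[Point]:
    """Illustrative two-level evaluation: each chunk stands in for one
    parallel node's local window; a next-level step merges the local
    skylines. Contiguous chunking is an assumption of this sketch."""
    size = max(1, -(-len(points) // parts))  # ceiling division
    local = [skyline(points[i:i + size]) for i in range(0, len(points), size)]
    merged = [p for chunk in local for p in chunk]
    return skyline(merged)


stream = [(1, 9), (5, 5), (3, 3), (9, 1), (4, 4)]
w = NofNWindow(capacity=4)
for point in stream:
    w.append(point)

print(w.query(4))  # skyline of the last 4 elements
print(w.query(2))  # skyline of the last 2 elements
print(partitioned_skyline(stream, parts=2))
```

The merge step is valid because a point dominated within its own chunk is also dominated in the full window, so discarding it locally cannot change the global skyline. This is why the local computations need no communication with one another.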
Keywords/Search Tags:Data Streams, n-of-N Skyline, Parallel Query, Load Balance, Elastic and Scalable