Research on K-Skyband Based Top-K Dominating Query over Distributed Data Streams

Posted on: 2014-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Zhan
Full Text: PDF
GTID: 2268330398962916
Subject: Computer application technology
Abstract/Summary:
A Top-K dominating query returns the K data objects that dominate the largest number of other objects in a dataset. It plays an important role in preference queries and multi-criteria decision support because it combines the advantages of top-k and skyline queries without sharing their disadvantages. Existing research on Top-K dominating queries is limited to centralized datasets. However, in emerging applications such as financial information analysis and sensor networks, it is increasingly common for data from geographically distributed sources to be sent to a central data warehouse as an endless stream. Finding valuable information in such large volumes of distributed streaming data in real time is a research focus in stream data mining. This thesis studies Top-K dominating queries over distributed data streams. Its main contributions are as follows:

(1) We observe that the result of a Top-K dominating query is a subset of the K-Skyband query result, and therefore propose to answer Top-K dominating queries over distributed data streams by continuously monitoring the K-Skyband in advance.

(2) We propose GBIFA, a novel algorithm based on a regular grid index, to maintain the K-Skyband over distributed data streams efficiently and continuously. GBIFA delivers only the incremental K-Skyband to reduce the communication overhead between sites. In addition, to reduce server processing time, it partitions the dominating region so that costly dominance tests are avoided during update maintenance.

(3) We implement GKTDM, an algorithm that continuously monitors the Top-K dominating query result over distributed data streams on top of the K-Skyband maintained in advance. To reduce query processing time, GKTDM treats the K-Skyband result as the candidate set of the Top-K dominating query, so the costly domination score computation is needed only for points in the K-Skyband. GKTDM also retains the domination scores of K-Skyband points, which avoids recomputing the scores of most K-Skyband points during maintenance. Furthermore, because the central site does not need to update the K-Skyband when a newly arriving point is not a local K-Skyband point, the remote sites determine in advance whether each newly arriving point is a local K-Skyband point.

(4) We use a multi-threaded server/client model to simulate a distributed data stream environment and implement an experimental platform for queries over distributed data streams. The platform can efficiently and automatically test the performance of the proposed algorithms under different parameters.

Our research on Top-K dominating queries over distributed data streams is significant for user-preference systems and multi-criteria decision-making systems. As data mining over distributed data streams receives increasing attention, this work can promote the application of Top-K dominating queries over distributed data streams.
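The observation in contribution (1), that the Top-K dominating result is contained in the K-Skyband, can be illustrated with a small centralized sketch. This is not GBIFA or GKTDM: it uses brute-force dominance tests on a static list, with no grid index, streaming, or distribution. The function names, the sample points, and the convention that smaller values are better in every dimension are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.
    Assumption for this sketch: smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points: Sequence[Point], k: int) -> List[Point]:
    """Points dominated by fewer than k other points."""
    return [p for p in points if sum(dominates(q, p) for q in points) < k]


def top_k_dominating(
    points: Sequence[Point],
    k: int,
    candidates: Optional[Sequence[Point]] = None,
) -> List[Tuple[int, Point]]:
    """Return the k (score, point) pairs with the highest domination score.
    Scores are always counted against the full dataset; `candidates`
    only restricts which points get scored."""
    pool = points if candidates is None else candidates
    scored = [(sum(dominates(p, q) for q in points), p) for p in pool]
    # Highest score first; ties broken by coordinates for determinism.
    scored.sort(key=lambda sp: (-sp[0], sp[1]))
    return scored[:k]


# Hypothetical sample data.
points = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 1), (1, 5), (3, 3)]
k = 2

band = k_skyband(points, k)
full = top_k_dominating(points, k)
pruned = top_k_dominating(points, k, candidates=band)

print(band)
print(full)
print(pruned)
```

The pruning is safe because any point dominated by at least K others has at least K points with a strictly higher domination score (each dominator also dominates everything the point dominates, plus the point itself), so it cannot appear in the result. Scoring only the K-Skyband candidates therefore yields the same answer as scoring every point.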
Keywords/Search Tags: Distributed Data Streams, K-Skyband, Top-K Dominating, Grid Index