Font Size: a A A

User Preference Query Processing Over Data Stream

Posted on:2014-07-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:L ZhangFull Text:PDF
GTID:1108330479479556Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Recently, the problem of processing different types of continuous queries over data streams becomes one of the hottest topics in database research field due to the development of sensor network and real-time monitoring technology. The typical stream data includes the data gathered in network monitoring, sensor network, the stock prices, telecommunication records, or logs from network security monitoring system. Different from the traditional database, data stream is huge and arrive fast. Due to the storage limitation, they can not be stored. Most algorithms over data stream are one-pass. Since the algorithms of traditional database can only process the data stored in disk or memory, the space complexity and time complexity are overwelming for data stream processing. The problem of how to use the limited computational resources to process data stream is a challenge.Due to the limitation of storage and computational capabilities under data stream circumstances, it is difficult to store every tuple of the data stream. Top-k query and Skyline query are the most populate queries of the database reseach field since they can help people retrieve the most valuable information from a huge data set. Top-k query chooses the k tuples with the most highest scores according to a preference function f, while Skyline query selects the those tuples that are not dominated by the others. In this paper,we call top-k query and Skyline query rank-aware queries Since both of their result sets contain rank information.We focus on rank-aware queries, i.e. top-k and Skyline queries. The main contributions are concluded as follows:1. Constrained Skyline query over data stream. A skyline query which is attached with constraints on specific dimensions is called a Constrained Skyline query. Constrained Skyline query is a variant of regular skyline query that retrieves skyline tuples meeting a user-defined constraints set. Constrained Skyline query defines constraints on each dimension according to the interests of users, and the skyline set is computed from the hyperrectangle defined by the constraints. The meaning of the Constrained Skyline query is that it can meet more detailed demands of people, such as hotels that only cost $70 to $90 a day. We focus on Constrained Skyline query processing over data stream and put forward two algorithms for computation and maintanence of skyline sets.2. Dynamic Skyline query over data stream. Different from Constrained Skyline, Dynamic Skyline computes the skyline tuples that closest to some data point p. We employ a grid based index to store the tuples. By making use of the advantage of grid index, we define Influence Area for every query to minimize the cells that need to be processede. We put forward a GBDS algorithm to compute and maintain the skyline sets.3. Continuous Skyline query over distributed data stream. Data stream is distributed in nature. The computation efficiency of each node and the communication cost between central node and sub-node are the challenges of process continuous skyline query over distributed data stream. We put forward an effective skyline processing technique over distributed data stream. By sending k most influential Skyline tuples to the sub-node, the communication overhead is heavily reduced.4. Resource sharing technique among multiple top-k queries over data stream. We put forward a resource sharing strategy RS-Tpk, which makes use of the features of top-k query. By sending the minimum tuples in current slice to the next slice, unnecessary computing is avoided and the CPU time is decreased.5. Continuous top-k processing over data stream. We propose an effective pruning technique, which minimizes the number of tuples that need to be stored and manipulated. Based on it, a cost-efficient method for continuous top-k processing over single data stream is proposed, and the computation complex and memory requirements are greatly decreased. The data structure we use is able to support preference function whether it is or not monotonic and the running time is hardly effected by dimensions.
Keywords/Search Tags:data stream, rank-aware, top-k, Skyline, Dynamic Skyline
PDF Full Text Request
Related items