
Query Processing And Optimization Over Various Types Of Streaming Data

Posted on: 2009-08-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yan
GTID: 1118360272958837
Subject: Computer software and theory
Abstract/Summary:
As a new data model, data streams play an important role in many applications, such as network traffic management, financial monitoring, e-business, traffic control, information publish/subscribe, copyright protection, environmental monitoring, and industrial flow management. Query processing and optimization techniques over data streams have been widely studied. The unbounded, high-speed nature of data streams and the need for fast online responses in these applications break many assumptions made by traditional databases, so many basic query processing techniques need to be re-examined.

Because streaming data are dynamic, a large number of continuous queries are registered over the streams in advance, and only the elements related to those queries need to be processed and stored. Analyzing the features of these continuous queries and building index structures over them is therefore very important for query optimization. The stream applications above involve various data types, yet most work on query processing and optimization targets structured and semi-structured data. In this dissertation, we propose algorithms for query processing and optimization over different types of streaming data. We implement the systems and compare them with related techniques. Extensive experimental results confirm the efficiency and effectiveness of the proposed techniques. To summarize, our contributions are as follows:

1. Efficient κ-NN processing over centralized structured data: Efficiently processing continuous κ-nearest neighbor queries on data streams is important in many application domains. Usually not all valid data objects from the stream can be kept in main memory. Most existing solutions therefore immediately discard some of the objects and store only representative objects in an index, which makes them approximate. We propose an efficient method for exact κ-NN monitoring that indexes the queries rather than the streaming objects, stores the objects related to the queries in a skyline data structure, and applies a delayed-processing technique.

2. A novel partition-based scheme, PMJoin, proposed to further optimize the cost of joins over distributed structured data: In emerging data stream applications, data sources are typically distributed. Evaluating multijoin queries over streams from different sources may incur a large communication cost. For continuous queries, precious bandwidth is consumed aggressively unless operator ordering and placement are carefully optimized. We focus on the optimization of continuous multijoin queries over distributed streams and propose a heuristic algorithm that generates a query plan to minimize the communication cost.

3. Continuous copy detection over streaming videos based on streaming algorithms: Digital videos are increasingly adopted in multimedia applications, where they are usually broadcast or transmitted as video streams. Continuously monitoring copies on fast, long video streams is gaining attention because of its importance in content and rights management. Efficient data stream algorithms are therefore essential for processing a large number of continuous queries on video streams. We first define a video sequence similarity that is robust to changes in the videos, together with a hash-based video sketch for efficient computation of sequence similarity. We then present a novel bit vector signature of the sketch that targets two optimization objectives: CPU cost and memory requirement. Finally, to handle multiple continuous queries simultaneously, we design an index structure for the query sequences.

4. Querying static and streaming RDF graph data: Efficiently querying RDF [92] data is an important factor in applying Semantic Web technologies to real-world applications. We propose a new scheme to store, index, and query RDF data in triple stores, consisting of two parts: static and streaming RDF query processing. The graph features of RDF data are taken into consideration, which may help reduce join costs on the vertical database structure. Building on our static optimization algorithm, we further propose strategies for optimizing continuous RDF queries over streaming RDF triples: (1) Group the continuous queries according to the characteristics of the triples. (2) Each query maintains a list of related triple record IDs. (3) Each query is periodically evaluated only over its related triple set. As a result, only the triples related to the queries need to be stored, which saves a large amount of space and improves the efficiency of query processing.

This dissertation combines streaming techniques with the characteristics of different data types in different applications and proposes query optimization algorithms and continuous query index structures. These greatly improve the efficiency of query processing on various types of data and make it possible to process queries that previously could not be answered exactly. In some extreme cases, the improvement reaches 3 to 4 orders of magnitude. These techniques are not only important in the applications above but can also be extended to processing continuous queries over more types of data.
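The three strategies listed for continuous RDF queries (grouping queries, per-query related triple ID lists, and periodic evaluation over related triples only) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the class name `ContinuousRdfQueries`, the single-triple-pattern query form with `None` as a wildcard, and the choice of the predicate as the grouping key are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
# Hypothetical query form: one triple pattern, None acts as a wildcard.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class ContinuousRdfQueries:
    """Illustrative registry of continuous queries over a triple stream."""

    def __init__(self) -> None:
        self.patterns: Dict[str, Pattern] = {}
        # (1) Queries grouped by predicate (assumed grouping key).
        self.by_predicate: Dict[Optional[str], List[str]] = defaultdict(list)
        # (2) Each query keeps the IDs of the triples related to it.
        self.related_ids: Dict[str, List[int]] = defaultdict(list)
        # Only triples related to at least one query are stored.
        self.store: Dict[int, Triple] = {}
        self.next_id = 0

    def register(self, name: str, pattern: Pattern) -> None:
        self.patterns[name] = pattern
        self.by_predicate[pattern[1]].append(name)

    @staticmethod
    def _matches(pattern: Pattern, triple: Triple) -> bool:
        return all(p is None or p == t for p, t in zip(pattern, triple))

    def on_triple(self, triple: Triple) -> bool:
        """Route an arriving triple; return False if it was discarded."""
        # Probe only the group for this predicate plus the wildcard group.
        candidates = (self.by_predicate.get(triple[1], [])
                      + self.by_predicate.get(None, []))
        hits = [q for q in candidates
                if self._matches(self.patterns[q], triple)]
        if not hits:
            return False  # unrelated to every query, so never stored
        tid = self.next_id
        self.next_id += 1
        self.store[tid] = triple
        for q in hits:
            self.related_ids[q].append(tid)
        return True

    def evaluate(self, name: str) -> List[Triple]:
        """(3) Periodic evaluation touches only the query's related triples."""
        return [self.store[tid] for tid in self.related_ids.get(name, [])]


engine = ContinuousRdfQueries()
engine.register("authors_of_paper1", ("paper1", "hasAuthor", None))
engine.register("anything_about_alice", ("alice", None, None))
for t in [("paper1", "hasAuthor", "alice"),
          ("paper2", "hasAuthor", "bob"),
          ("alice", "worksAt", "lab1")]:
    engine.on_triple(t)
print(engine.evaluate("authors_of_paper1"))
```

In this sketch the triple about `paper2` matches no registered query and is dropped on arrival, which is the space saving the abstract describes. A real system would also need multi-pattern (join) queries and expiry of old triples, both omitted here.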
Keywords/Search Tags: data stream, query processing, query optimization, indexing, continuous query