Generation And Optimization Of The Physical Query Plan In Distributed Data Stream Management System

Posted on: 2008-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Fang
GTID: 2178360272969219
Subject: Computer software and theory
Abstract/Summary:
In recent years an important class of data-intensive applications has emerged, including network traffic monitoring, telecommunications data management, and sensor networks. These applications require continuous queries over multiple geographically distributed, high-volume data streams to be processed in real time. Such data are often real-time, ordered, and unbounded, bearing the characteristics of a "stream", and cannot be handled efficiently by traditional relational database systems, which are optimized for processing disk-resident data. Most stream data are distributed by nature and challenge the centralized processing model in terms of data exchange, high availability, and processing capacity. It is therefore advisable to design a general-purpose data stream management system as a distributed system, and the Distributed Data Stream Management System (DDSMS) was proposed to meet this demand.

Query processing is a key technology of a DDSMS. It comprises query pre-analysis, construction of the logical plan, construction of the physical plan, query optimization, and query execution. Creating the physical plan from the logical plan is an important step and one of the keys to meeting real-time requirements. It requires specific transformation rules, appropriate data structures, and implementations of the various physical operators.

Relational query optimizers have traditionally relied on table cardinalities when estimating the cost of the query plans they consider. While this approach has been and continues to be successful, the advent of the Internet and the need to execute queries over streaming sources require a different approach, because for streaming inputs the cardinality may not be known or may not even be knowable (as is the case for an unbounded stream).

A DDSMS executes a large number of continuous queries in parallel. Initializing a query plan requires integrating query optimization with operator deployment. As stream characteristics and query workload change over time, the plan initially installed for a continuous query may become inefficient. The query optimizer therefore re-optimizes the plan at runtime based on current statistics.

The topic of this thesis is the generation and optimization of the physical query plan in a distributed data stream management system. We first give implementations of several kinds of physical operators and describe the generation of the physical query plan. We then develop a query optimization framework and present a formal definition of the query optimization problem, a cost model, and several optimization strategies for such a system.
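As a rough illustration of the two ideas the abstract names, rule-based translation of a logical plan into physical operators and a cost model that does not depend on cardinality, the following is a minimal sketch only. It is not the thesis's implementation. The class names, the rule table, the operator names (StreamScan, Filter, WindowJoin), and the use of tuples per second as the cost unit are all assumptions made for this example, and the join output rate is a deliberate simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LogicalNode:
    """Hypothetical logical plan node: 'source', 'select', or 'join'."""
    op: str
    children: List["LogicalNode"] = field(default_factory=list)
    rate: float = 0.0          # arrival rate in tuples/sec (sources only)
    selectivity: float = 1.0   # fraction of input that survives the operator


@dataclass
class PhysicalOp:
    """Hypothetical physical operator annotated with rate-based estimates."""
    name: str
    children: List["PhysicalOp"]
    out_rate: float   # estimated output rate in tuples/sec
    cost: float       # tuples/sec the whole subtree must process


def _source_rule(node: LogicalNode, kids: List[PhysicalOp]) -> PhysicalOp:
    # A stream scan only forwards tuples, so it is charged no processing cost.
    return PhysicalOp("StreamScan", [], node.rate, 0.0)


def _select_rule(node: LogicalNode, kids: List[PhysicalOp]) -> PhysicalOp:
    child = kids[0]
    # The filter inspects every tuple its child emits.
    return PhysicalOp("Filter", kids,
                      child.out_rate * node.selectivity,
                      child.cost + child.out_rate)


def _join_rule(node: LogicalNode, kids: List[PhysicalOp]) -> PhysicalOp:
    left, right = kids
    in_rate = left.out_rate + right.out_rate
    # Simplifying assumption: output rate is a fixed fraction of combined input.
    return PhysicalOp("WindowJoin", kids,
                      in_rate * node.selectivity,
                      left.cost + right.cost + in_rate)


# Transformation rules: one physical implementation per logical operator.
RULES: Dict[str, Callable[[LogicalNode, List[PhysicalOp]], PhysicalOp]] = {
    "source": _source_rule,
    "select": _select_rule,
    "join": _join_rule,
}


def to_physical(node: LogicalNode) -> PhysicalOp:
    """Translate a logical plan bottom-up by applying the matching rule."""
    kids = [to_physical(child) for child in node.children]
    return RULES[node.op](node, kids)


if __name__ == "__main__":
    packets = LogicalNode("source", rate=1000.0)
    alerts = LogicalNode("source", rate=50.0)
    filtered = LogicalNode("select", [packets], selectivity=0.25)
    plan = to_physical(LogicalNode("join", [filtered, alerts], selectivity=0.5))
    print(plan.name, plan.out_rate, plan.cost)
```

Because the estimates depend only on arrival rates and selectivities, they remain defined for unbounded streams, and they could be recomputed whenever runtime statistics change in order to decide whether an installed plan should be re-optimized.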
Keywords/Search Tags:Data Stream Management System, Physical Query Plan, Query Optimization, Cost Model, Data Stream, Operator