Study Of Non-blocking Join Algorithm Over Data Stream

Posted on:2011-06-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:G Chen

Full Text:PDF

GTID:1118360305992008

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Data stream is an important research area of database technology in recent years. The remote data passes the network to the local system, then is stored into the data integration system for further processing. Join on data stream is a query supporting technologies which is widely applied. As the data stream environment may far exceed the memory size of the data storage capacity, the whole join results are difficult to get. Another challenge is that the underlying network may incur unpredictable failures, causing delays of tuple transmission. However, the interactive and real-time systems often require accuracy results as well as non-blocking output.Constrained with memory, the system have to recur to the external storage in order to get exact results. When the network suspends, the memory-resident data plays the role of a data source. With the help of the external join, the progressive join can ensure the non-blocking character and can get accurate result set.Using a complete data partition as an external join transactional unit is a common method, but this may result in the difficulty of activating the external join, especially in the middle and latter data stream process. With the goal of solving the above problem, two classic non-blocking join ways are revised. To make the better use of memory and interval time, a novel algorithm based on hash-merge for improving the query response time is proposed, separating one external join transaction into multi-subtasks. The reduced data structure of in-memory tuples helps to improve memory utility. A replacement selection tree is applied to adjust memory by expanding or shrinking the size of tree. In addition, a cost model to estimate task output rate is proposed to select the in-disk portion that promises to produce the fastest results in the external join stage.Spatial data join is different from the equi-join mainly because of the character of the spatial cross-join. We propose a spatial non-blocking join which designs theory models based on the statistical results:an efficient flush policy applied into unreliable network as well as a cost model to determine the external join between memory and disk.Aperiodic data streams often need to determine the expected distribution of character-istics. But how to determine the distribution of the data partition, and how to partition the dynamic data are the key issues. A join algorithm over data stream of the transformation Gaussian distribution is proposed. Statistical sampling is applied to determine the current Gaussian center point and to partition the data blocks. The way to fix the Gaussian distribution block is proposed to determine the predict equation, which results in the efficient flush policy.

Keywords/Search Tags:

data stream, non-blocking, join, flush policy

PDF Full Text Request

Related items

1	Non-blocking Join Algorithm Research On Datastream Environment
2	Research On Stream Join Algorithm And Parallelization Based On Big Data Platform
3	Result Completeness Guarantee Strategy Studies In Distributed Stream Join Systems
4	The Distinct Element Problem In Equi-join For Multiple Data Streams
5	Big Data Stream Query Framework And Research On Arithmetic Operator
6	Efficient Similarity Join Over Probabilistic Data Streams Based On Earth Mover's Distance
7	Data Stream Query Operator Algorithm
8	Research And Implementation Of Data Stream Management System
9	Distributed Stream Join System Load Balance Strategy Studies
10	Multi-Join Query Algorithm Research Over Data Streams