Font Size: a A A

Study Of Non-blocking Join Algorithm Over Data Stream

Posted on:2011-06-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:G ChenFull Text:PDF
GTID:1118360305992008Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Data stream is an important research area of database technology in recent years. The remote data passes the network to the local system, then is stored into the data integration system for further processing. Join on data stream is a query supporting technologies which is widely applied. As the data stream environment may far exceed the memory size of the data storage capacity, the whole join results are difficult to get. Another challenge is that the underlying network may incur unpredictable failures, causing delays of tuple transmission. However, the interactive and real-time systems often require accuracy results as well as non-blocking output.Constrained with memory, the system have to recur to the external storage in order to get exact results. When the network suspends, the memory-resident data plays the role of a data source. With the help of the external join, the progressive join can ensure the non-blocking character and can get accurate result set.Using a complete data partition as an external join transactional unit is a common method, but this may result in the difficulty of activating the external join, especially in the middle and latter data stream process. With the goal of solving the above problem, two classic non-blocking join ways are revised. To make the better use of memory and interval time, a novel algorithm based on hash-merge for improving the query response time is proposed, separating one external join transaction into multi-subtasks. The reduced data structure of in-memory tuples helps to improve memory utility. A replacement selection tree is applied to adjust memory by expanding or shrinking the size of tree. In addition, a cost model to estimate task output rate is proposed to select the in-disk portion that promises to produce the fastest results in the external join stage.Spatial data join is different from the equi-join mainly because of the character of the spatial cross-join. We propose a spatial non-blocking join which designs theory models based on the statistical results:an efficient flush policy applied into unreliable network as well as a cost model to determine the external join between memory and disk.Aperiodic data streams often need to determine the expected distribution of character-istics. But how to determine the distribution of the data partition, and how to partition the dynamic data are the key issues. A join algorithm over data stream of the transformation Gaussian distribution is proposed. Statistical sampling is applied to determine the current Gaussian center point and to partition the data blocks. The way to fix the Gaussian distribution block is proposed to determine the predict equation, which results in the efficient flush policy.
Keywords/Search Tags:data stream, non-blocking, join, flush policy
PDF Full Text Request
Related items