
Research On Several Key Technical Problems Of Transaction Data Stream Processing

Posted on: 2013-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X X Zou
GTID: 1118330374487358
Subject: Computer application technology
Abstract/Summary:
Modern data analysis applications driven by the network effect are pushing traditional database and data warehousing technologies beyond their limits because of massively increasing data volumes and the demand for low latency. Data stream continuous query and complex event processing replace the store-first-query-later approach to analyzing timely information: they process data in memory before the data are stored on disk. Processing data in memory improves real-time performance and reduces the computing resources required. However, existing systems have three shortcomings. First, data stream systems and complex event processing convert a relational data source into a stream table without regard for the transactional characteristics of the relational data. Second, the join semantics for multiple data streams are those of tuple-based and time-based sliding windows, which do not consider the complex semantics and synchronization of multiple data streams. Third, the algorithms for joining a data stream with a disk-based relation do not take join semantics and memory overhead into account. Because relational databases are the main data source for data stream and complex event processing, it is necessary to enhance the processing of transaction streams in these systems.

This thesis extends data stream continuous query and complex event processing to transaction data streams. The main research contributions are as follows:

(1) We propose a monotonicity theory for transaction data streams, which ensures that the order of tuples sharing the same timestamp has no influence on the result of a continuous query over a transaction data stream, so that the ACID properties of the relational data can be maintained. A delayed-computation strategy and a query execution design based on an in-memory database are proposed to achieve monotonicity of the transaction data stream.
The experiments show that the query execution design based on an in-memory database supports batch-driven sliding windows and lets multiple queries share sliding-window data. The monotonicity theory of transaction data streams is thereby verified.

(2) We present a join semantic model based on matching window identifiers for joining multiple data streams. Unwindowed stream joins eventually outgrow memory, and the join semantics of tuple-based and time-based sliding windows cannot handle joins between different kinds of windows, for example windows of different sizes or different slides. We therefore use window identifiers to hide the differences between kinds of windows, which extends the semantics of the window join. A sliding window is divided into a number of sub-windows; when the newest sub-window fills up, it is appended to the sliding window and the oldest sub-window in the sliding window is removed. We propose corresponding algorithms for the window join and for maintaining the window. The experiments show that the window-identifier join model can synchronize multiple data streams.

(3) We focus on algorithms for joining a fast stream with a disk-based relation under the constraint of limited memory. The novel algorithm MESHJOIN has been proposed for joining a continuous stream with a disk-based relation. The crux of MESHJOIN is that in every iteration the whole memory block of the disk-based relation is replaced. We propose dividing the memory block into a number of logical partitions, so that each iteration replaces only one logical partition of the memory block. The experiments show that the service rate of the join increases because the I/O cost of one loop iteration decreases.

(4) When the stream joins with multiple disk-based relations, we create a materialized view over those relations. We consider maintenance algorithms for both the materialized outer-join view and the materialized inner-join view.
We propose rewriting the materialized outer-join view in join disjunctive normal form, so that the secondary delta of the materialized outer-join view is computed independently from the terms derived from each normal-form term. Experimental results show that the algorithm reduces the computational cost effectively and places no restrictions on SPOJ views. For the materialized inner-join view, we propose a delayed partial compensation algorithm at the data source. Experimental results show that the algorithm avoids the problems of global time and incorrect maintenance.

Data stream continuous query and complex event processing are widely used in the Internet of Things and cloud computing. The research in this thesis extends transaction data stream processing and has both theoretical and practical value.
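As a rough illustration of the sub-window maintenance described in contribution (2), the following Python sketch keeps a sliding window as a bounded queue of sub-windows. It is not the thesis's algorithm: the class name, the count-based fill rule, and the `window_id` counter are assumptions introduced here only to show the append-newest, evict-oldest behaviour.

```python
from collections import deque


class SubWindowSlidingWindow:
    """Sliding window stored as a bounded queue of sub-windows (illustrative only)."""

    def __init__(self, sub_window_size, max_sub_windows):
        # Both parameters are hypothetical; the abstract does not fix them.
        self.sub_window_size = sub_window_size
        self.max_sub_windows = max_sub_windows
        self.sub_windows = deque()  # completed sub-windows, oldest first
        self.filling = []           # newest sub-window, still being filled
        self.window_id = 0          # assumed identifier, incremented on every slide

    def add(self, item):
        """Add one tuple; return True if the window slid as a result."""
        self.filling.append(item)
        if len(self.filling) < self.sub_window_size:
            return False
        # The newest sub-window is full: append it to the sliding window.
        self.sub_windows.append(self.filling)
        self.filling = []
        # Remove the oldest sub-window once the window exceeds its capacity.
        if len(self.sub_windows) > self.max_sub_windows:
            self.sub_windows.popleft()
        self.window_id += 1
        return True

    def contents(self):
        """Return all tuples currently inside the sliding window, oldest first."""
        return [item for sub in self.sub_windows for item in sub]
```

Because the window only changes when a sub-window completes, downstream work such as a join can be triggered once per slide rather than once per tuple. Two streams with different window shapes could then, in principle, be matched on the identifier rather than on their sizes; how the thesis actually assigns and matches identifiers is not stated in the abstract.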
Keywords/Search Tags: Transaction data stream, monotonic query, sliding windows, join computing, incremental computing