Research On Key Issues Of Data Stream Multi-dimensional Modeling And Querying

Posted on: 2011-10-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D F Hou
GTID: 1118330341951630
Subject: Management Science and Engineering
Abstract/Summary:
In recent years, as data stream applications have spread across a wide range of fields, users increasingly want to discover trends and unusual or interesting patterns across diverse combinations of dimensions and at different granularities, in order to support real-time decision making. Research on multidimensional modeling and querying of data streams is conducted to meet this requirement. Compared with traditional data, a data stream is variable, unbounded, and bursty; compared with traditional analysis methods, multidimensional querying is highly sophisticated. Together these properties pose major challenges for data modeling, storage, and querying.

In response to these challenges, this dissertation addresses several key problems: the multi-level time window model, the multidimensional model of data streams, stream cube computation, and multidimensional continuous queries. The multi-level time window model bounds the infinite data stream and also describes its multiple time granularities and variability. The aggregated values of the multi-level time window are maintained in an adaptive hierarchy aggregate tree for aggregation computing. A multidimensional model of data streams is defined for organizing the data. Its time dimension follows the multi-level time window model, which represents the dynamic property of the multidimensional model, and a basic algebra, an analysis algebra, and a maintenance algebra are defined to lay the theoretical foundation for multidimensional organization and querying. A stream cubing method based on an interesting view subset is put forward for the multidimensional organization of data streams; in this method, a multi-way aggregation tree is built to maintain the cells of the interesting views, and the tree structure can be updated dynamically to serve multidimensional queries. A framework based on query state maintenance is designed for computing multidimensional continuous queries, and an index structure is built to improve the efficiency of query execution.
Finally, the application of multidimensional queries is illustrated with a weblog analysis case study.

The main contributions of the dissertation are as follows:

(1) A multi-level time window model is put forward to map the infinite data stream onto sliding windows. The relations between levels are described by a system of time granularities, which meets the multiple time granularities required by queries and provides foundational support for data stream processing. An adaptive method for computing aggregates within the time window is also studied. The method uses the adaptive hierarchy aggregate tree as its basic structure, in which only the high-level values are kept for the sparse parts of the stream. Experiments show that the method outperforms other methods on bursty data streams.

(2) A multidimensional model of data streams is proposed. The time dimension is described by the multi-level time window model, which restricts the infinity of the time dimension and expresses its multiple granularities, while the dynamics of the model are depicted by the evolution of stream facts. Algebras are described for defining multidimensional queries, including the basic algebra, the analysis algebra, and the maintenance algebra. Finally, the scope and restrictions of the model are analyzed with respect to the infinity of the time dimension, the dynamics of stream facts, and the sophistication of aggregate functions. The definitions of the multidimensional model and its algebras depict the dynamics of multidimensional computing and lay the theoretical foundation for organizing and querying data streams.

(3) A stream cube computing method based on the Interesting View Subset is proposed for the dynamic multidimensional organization of data streams. The interesting view set reflects the requirements of queries and covers only a small part of all views. Materializing only the cells of the interesting views reduces memory consumption while still meeting the needs of most users.
In this method, a multi-way aggregate tree is adopted to maintain the cells and their relations; it can be used to update the cube quickly and to answer ad hoc queries. In the running phase, the storage space of the structure is reduced by the multi-level time window and an adaptive partition strategy. Experiments show that the method satisfies users' requirements and is efficient in both time and space.

(4) A query computing framework based on query state maintenance is proposed for multidimensional continuous querying. The state of a continuous query holds the cells that may contribute to any future query result, and supports the dynamic computation of multidimensional continuous queries through update, remove, and result generation operators. An index tree based on the selection predicates of the continuous queries is also constructed to improve the efficiency of state updates. Experiments show that the method is effective for implementing multidimensional continuous queries and is efficient in both time and space.

In conclusion, this dissertation focuses on several key issues in the multidimensional modeling and querying of data streams and studies a series of related algorithms and theories. The work is significant in theory and practice for the development of data stream applications.
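
To make the idea behind contribution (1) more concrete, the following is a minimal sketch of a window that keeps aggregates at several time granularities while bounding memory. It is not the dissertation's adaptive hierarchy aggregate tree. The class name MultiLevelTimeWindow, the (name, width, slots kept) level tuples, the choice of sum as the aggregate, and the assumption that timestamps arrive in non-decreasing order are all hypothetical choices made for illustration.

```python
from collections import deque


class MultiLevelTimeWindow:
    """Sum aggregates kept at several time granularities (illustrative sketch)."""

    def __init__(self, levels):
        # levels: list of (name, width_in_ticks, slots_kept), finest level first.
        # These tuples are an assumed encoding, not taken from the dissertation.
        self.levels = levels
        self.slots = {name: deque() for name, _, _ in levels}

    def add(self, timestamp, value):
        """Fold one stream tuple into every level; timestamps must not decrease."""
        for name, width, keep in self.levels:
            window = self.slots[name]
            slot = timestamp // width
            if window and window[-1][0] == slot:
                window[-1][1] += value        # same slot: update the running sum
            else:
                window.append([slot, value])  # first value seen in a new slot
            # Expire slots that have slid out of this level's window.
            while window[0][0] <= slot - keep:
                window.popleft()

    def query(self, name):
        """Return the (slot_index, sum) pairs currently held at one level."""
        return [(slot, total) for slot, total in self.slots[name]]


if __name__ == "__main__":
    w = MultiLevelTimeWindow([("second", 1, 3), ("minute", 60, 2)])
    for t, v in [(0, 1), (1, 2), (1, 3), (59, 4), (61, 5), (125, 6)]:
        w.add(t, v)
    print(w.query("second"))
    print(w.query("minute"))
```

Each level stores at most a fixed number of slots, so memory stays bounded however long the stream runs, and empty slots are never stored, which is one simple way to avoid spending space on sparse periods. A distributive aggregate such as sum is used here because it can be updated per slot without revisiting raw tuples.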
Keywords/Search Tags: data stream, aggregation computing, multidimensional data stream model, stream cube, interesting view, multidimensional continuous query, weblog analysis