Research On Aggregation Query And Mining Frequent Patterns In Data Streams

Posted on: 2007-01-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Liu
GTID: 1118360212465589
Subject: Computer application technology
Abstract/Summary:
The emergence of the data stream model poses great challenges to traditional data management technology. Because data streams are continuous and unbounded, existing database management techniques can hardly process them effectively. New data stream management techniques are therefore needed, and they have attracted the attention of many database researchers, making data stream processing a hot research topic. The study of data streams has not only scientific value but also broad application prospects, in areas such as sensor networks, weather monitoring and analysis, mobile object tracking, stock analysis, mail filtering, and network monitoring and security. This dissertation explores in depth several principal problems in data stream management systems and data stream mining. The main contents are as follows:
(1) System architecture of data stream management: to handle high-rate data streams, a new data stream management system architecture based on hardware preprocessing is proposed. Existing data stream prototype systems rely on query optimization and system scheduling to improve the processing rate, but these methods are insufficient in high-rate data stream environments. This dissertation therefore approaches the design of a new-generation data stream management system from a different angle, adopting hardware/software co-design and hardware preprocessing techniques so that data can be processed at high rates.
(2) Aggregation query over high-rate data streams: most existing aggregation algorithms rely on approximation, gaining speed by sacrificing accuracy. With the rapid development and falling cost of hardware, hardware/software co-design has recently attracted increasing attention.
This dissertation proposes a novel aggregate query algorithm based on hardware/software co-design, which combines the processing speed of hardware with the flexibility of software.
(3) Incremental aggregation query over distributed data streams: distributed processing is an inevitable trend in the development of data stream management systems. In a distributed system, however, network communication is usually the bottleneck resource. A new incremental aggregation method for distributed data streams is therefore studied and proposed, which significantly reduces network traffic.
(4) Frequent closed patterns over data streams: the set of frequent closed patterns exactly determines the complete set of all frequent patterns and is usually orders of magnitude smaller than it. Moreover, frequent closed patterns are much easier to understand and to apply. Mining frequent closed patterns over data streams, however, is a major challenge, and few studies on it have been reported so far. The sliding window and the landmark window are the two most important window models for data streams. Two novel algorithms for mining frequent patterns over data streams, one based on the sliding window and the other on the landmark window, are proposed and studied. These algorithms have good adaptability and extensibility. Furthermore, the balance between precision and efficiency can be tuned to users' needs by adjusting the permissible error.
(5) Detection of change over data streams: changes in the underlying data model often carry more information than the stable model itself; consequently, change detection is one of the core issues in data stream mining.
This dissertation presents a novel method for detecting and estimating change based on the information entropy of maximal frequent itemsets. The method effectively reflects not only changes in the data stream model used for mining association rules but also changes in the data sets themselves.
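The statement in item (4) that the frequent closed patterns determine the complete set of frequent patterns can be illustrated on a small static data set. The sketch below is not one of the dissertation's streaming algorithms; it is a brute-force illustration only, and the function names, the toy transactions, and the support threshold are all assumptions made for this example. It treats an itemset as closed when no proper superset has the same support, and recovers the support of any frequent itemset as the largest support among its closed supersets.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute force: map every frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == other_sup
                   for other, other_sup in frequent.items())
    }

def support_from_closed(itemset, closed):
    """Support of a frequent itemset = max support of its closed supersets."""
    supports = [sup for c, sup in closed.items() if itemset <= c]
    return max(supports) if supports else None  # None: itemset is not frequent

# Hypothetical toy data, chosen only for illustration.
transactions = [frozenset(t) for t in (["a", "b", "c"], ["a", "b", "c"],
                                       ["a", "b"], ["a", "d"])]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

On this toy input, seven itemsets are frequent but only three are closed, and every frequent itemset's support can be recovered from the closed set alone. A streaming algorithm has to maintain this information incrementally within a window, which brute-force enumeration does not attempt.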
Keywords: data stream, system architecture, aggregation query, hardware/software co-design, distributed system, frequent closed itemsets, detection of change