Research On Stream Prediction Based On Episode Rule Matching

Posted on:2012-06-18

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H S Zhu

Full Text:PDF

GTID:1488303356971259

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

With the fast development of Internet, database and embedding techniques, together with the accelerated demands of real-world applications, a new kind of data type named as data stream has been broadly applied in many fields such as sensor data processing, network security monitoring, finance & securities managing, and transaction log analyzing. Differnet from the static data stored in a traditional database, data stream which consists of a series of value pairs (event type, timestamp) possesses many characteristics such as rapidness, boundlessness, continuity and variety, which makes it difficult to directly apply the data mining algorithms for traditional database to data stream analysis. As one of the important tasks of data stream analysis, stream prediction has become a research hotspot in the academy and industrial fields because it simultaneously faces tremendous challenges and unprecedented opportunities of applications. Based on summarizing the related works at home and abroad, this dissertation profoundly carries out discussing four key problems involved in stream prediction:frequent episode mining, frequent closed episode mining, non-redundant episode rule extracting and episode rule matching, which forms a systematic research on stream prediction. The main contributions are concluded as follows.1. The algorithm MANEPI is proposed to mine frequent episodes from an event sequence. Frequent episodes character the behavior of users or systems about the real-life applications. Most of existing mining algorithms calculate the support of an episode based on its minimal occurrences or non-overlapping occurrences, which are prone to the issue of over-counting the occurrences of an episode or poorly characterizing the followed-by-closely relationship over event types within an episode. In addition, due to utilizing the same breadth-first search strategy as algorithm Apriori, these algorithms need to scan the given event sequence more than once and generate a large number of candidate episodes. However, our algorithm MANEPI computes the support of an episode based on its minimal and nonoverlapping occurrences. By the depth-first search strategy, MANEPI scans the given sequence only once and does not generate any candidate episode. Moreover, MANEPI uses the Apriori Property of an episode to avoid unnecessary episode growth, which further narrows down the search space of frequent episodes. Both theretical study and experimental evaluation demonstrate that MANEPI has higher mining efficiency and mining quality.2. FCEMiner is proposed to mine frequent closed episodes from an event sequence. The set of frequent closed episodes is an information lossless compression of all frequent episodes. To the best of our knowledge, Clo_episode is currently the only algorithm for mining frequent closed episodes. Although Clo_episode scans the given event sequence only once, it takes the breadth-first search strategy and suffers from a huge number of candidate episodes during the mining process. Additionally, due to using the minimal occurrences to compute the support an episode, Clo_episode also has the issue of over-counting the occurrences of an episode. However, our algorithm FCEMiner takes the same search strategy and support definition as MANEPI to obtain a compact yet complete set of frequent episodes. Moreover, it utilizes the non-closed unanimity of special forward extension to skip redundant closure checking, which further narrows down the search space of frequent closed episodes and speeds up the mining process. Both theretical study and experimental evaluation confirm that FCEMiner is able to efficiently discover frequent closed episodes from an event sequence.3. Extractor is proposed to extract non-redundant episode rules from an event sequence. An episode rule describes a causal relationship between frequent episodes. Existing episode rule extraction algorithms have three main problems:First, the number of sliding windows or minimal occurrences are used to compute the support of an episode, which leads to the lower quality of mined frequent episodes. Second, the number of episode rules generated directly from all frequent episodes is too large and many of which are redundant. Third, some pruning techniques are utilized to filter the reduntant episode rules, which increases the time cost of algorithms. However, our algorithm Extractor discovers frequent closed episodes and their generators by employing the support measure of both minimal and non-overlapping occurrences and the depth-first search strategy, which assures the quality and efficiency of mining frequent closed episodes and their generators. Moreover, Extractor avoids redundant generator checking by utilizing the Apriori Property of non-generators. In addition, Extractor generates non-redundant episode rules directly from frequent closed episodes and their generators, which improves the quality and efficiency of generating episode rules. Theoretical analysis and experimental evaluation demonstrate Extractor can efficiently find all non-redundant episode rules from the given event sequence. 4. Predictor is proposed to predict streaming data based on episdoe rule matching. Studying on the potential regularities of historical streaming data and applying them to predicting future streaming data can provide important decision support for many real-life applications. Most of existing stream prediction algorithms are regression analysis-based or rule match ing-based methods. The regression analysis-based one can predict streaming data with high speed, but it is only suitable for predicting linear data. On the other hand, although the rule matching-based one can predict both linear and nonlinear data, it has problems such as a strict form of a rule, a restricted or obsolete prediction interval, over-matching rules and etc. However, our algorithm Predictor takes non-redundant episode rules as matched rules, which assures the representative meaning and general form of rules to be matched. During the process of prediction, Predictor uses an automaton per episode rule. Aiming at finding the latest minimal and non-overlapping occurrence of all antecedents, Predictor simultaneously tracks the state transtion of each automaton by a single scanning of data stream, which can not only map the boundless streaming data into the finite state space but also avoid over-matching episode rules. In addition, the results of Predictor contain the occurring intervals and occurring probabilities of future episodes. Theoretical analysis and experimental evaluation demonstrate Predictor has higher prediction efficiency and prediction precision.In summary, after discussing the four key problems involved in stream prediction, i.e., frequent episode mining, frequent closed episode mining, non-redundant episode rule extracting and episode rule matching, this dissertation presents effective solutions to them. Theoretical analysis and experimental results show that our proposed algorithms are promotion of both theoretical study and practical applications about stream prediction.

Keywords/Search Tags:

data mining, data stream, frequent episode, frequent closed episode, episode rule, data stream prediction

PDF Full Text Request

Related items

1	Research On Frequent Episode Mining Over Event Streams With Intervals
2	Efficient main memory algorithms for significant interval and frequent episode discovery
3	Research On Mining Frequent Episodes In Event Sequences
4	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
5	Web Anomaly Intrusion Detection Based On Frequent Closed Episode Rules
6	Research On Frequent Pattern Mining Algorithm Oriented To Data Stream
7	Mining Frequent Closed Patterns In Data Streams
8	A Frequent Serial Episode Mining Algorithm With Time Constraints Based On Spark Platform
9	Research Of An Algorithm For Frequent Closed Itemset Mining On Data Stream
10	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application