Font Size: a A A

Research On Mining Frequent Episodes In Event Sequences

Posted on:2013-01-20Degree:MasterType:Thesis
Country:ChinaCandidate:P HuangFull Text:PDF
GTID:2248330395451105Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of Internet, databases and embedded technology in real-world applications, there are a lot of temporal data data exsiting in business, engineering and scientific fields. Temporal data mining is concerned with finding useful patterns in sequential (and often symbolic) data streams. Frequent episode discovery is apopular framework for mining useful/interesting temporal patterns from sequential data. The framework has been successfully used in many application domains, e.g., network monitoring, intrusion detection, analysis of alarm sequences in telecommunication networks, user-behavior prediction from web interaction logs, relating financial events and stock trends, text mining etc.Most state-of-art frequent episode mining algorithms are window-based and non-overlapped occurrences based. There is no efficient algorithm for mining episodes that is distinct occurrences based. Based on summarizing the related works at home and abroad, this dissertation profoundly carries out discussing the problem of minning frequent episodes that are distinct occurrences based. The main contributions are concluded as follows.1. The algorithm DOFC is proposed to counting distinct occurrences of frequent episodes from an event sequence. Frequent episodes character the behavior of users or systems about the real-life applications. Most of existing mining algorithms calculate the support of an episode based on sliding window or its non-overlapping occurrences, which use finite state automata to calculate the frequency by recognizing occurrences of episodes in the data sequence. However, these algorithms are not efficient, or even incapable, when dealing with the distinct occurrences based frequency counting of an episode which is essentially more effective. This reason is that we may need to maintain multiple automata for an episode to recognizing its all distinct occurrences so that the temporary storage can go unbounded. To solve the problem, we introduce a noval model SCDFA (State Counting Deterministic Finite Automaton), which can merge all automata in the same state by adding state-count for each state. Consequently proposes an efficient algorithm DOFC based on this model. It also gives theoretical analysis about the cost and effectiveness of this algorithm. Experimental results also verify the efficiency and effectiveness of the proposed algorithm.2. The algorithm SCTree is proposed to mine frequent episodes from an event sequence which is distinct occurrences based. Most of existing mining algorithms use an Apriori-style level-wise procedure. Each level involves two steps:a candidate generation step and a frequency counting step. Which results in that these algorithms need to scan the given event sequence more than once and generate a large number of candidate episodes. In addition, Apriori-style level-wise algorithms introduce much repeated calculations:on the one hand, the frequency counting step introduces much repeated calculations due to the correlation among episodes; on the other hand, in the frequency counting step of the (N+1)th level, some frequency counting in the Nth level are repeated. To solve the problem, we introduce a model SCTree (State-Counting prefix Tree) which extends the SCDFA model. A SCTree stores all episodes in a candidate set by their prefix. With SCTree we can reduce much repeated calculations among candidate episodes. To reduce the scanning of database, an eager extension technology of SCTree is proposed. Experimental results show the efficiency and effectiveness of the proposed algorithm.In summary, after discussing the problems of frequent episode mining based on distinct occurrences, this dissertation presents effective solutions to them. Theoretical analysis and experimental results show that our proposed algorithms are promotion of both theoretical study and practical applications.
Keywords/Search Tags:data mining, data stream, event sequences, frequent episode, distinctoccurrences
PDF Full Text Request
Related items