Font Size: a A A

Research On Mining Closed Episodes In Multiple Sequences

Posted on:2015-11-13Degree:MasterType:Thesis
Country:ChinaCandidate:X T YangFull Text:PDF
GTID:2298330467959929Subject:Computer science and technology
Abstract/Summary:PDF Full Text Request
Episode mining is an important branch of sequential pattern mining, which goal is to find the subsequences with internal relation meeting certain time constraints for more rational analysis, and provide valuable information for practical applications. In some real applications, the data often comes from multiple sequences which are produced by multiple users or multiple devices. Mining frequent episodes and common episodes in multiple sequences set has important application value.Firstly, we analyze related researches about sequence pattern mining, episode mining and string matching algorithms, and figure out that the existing episode mining methods mainly aim at the single sequences. Most of the methods depend on window techniques excessively, and have higher iteration overhead. Moreover, it is still lack of an effective parallel processing framework for proceesing multiple sequences.Aiming at the multiple sequences consisting of simple events, we design a closed frequent serial episodes mining method based on the maximal duration continuous episodes-HH-KMP, which is a non-redundant algorithm. HH-KMP first extracts the2-neighborhood frequent episodes, and creates the maximal duration continuous episodes based on them. In order to match the maximal duration continuous episodes from the sequence, HH-KMP is extended based on KMP algorithm. On the one hand, when the longest candidate episode doesn’t match with the main sequence, it doesn’t backtrack the matching pointer of the maximal duration continuous episodes; on the other hand, when the matching pointer of main sequence slides, it needs to determine whether the occurrence time of the event subsequence meets the given time constraints. If it doesn’t hold, the matching pointer of the main sequence stops sliding. By counting the matching number of episodes and subepisodes, we can find the closed frequent episode set according to the closed rule. The HH-KMP algorithm uses the maximal duration continuous episodes to match the event subsequences. Once the maximal duration continuous episodes are identified as the closed frequent episodes, it is not necessary to consider their internal closed subepisodes, to avoid a lot of overhead of the extension of traditional subepisode iteration, which can improve greatly the efficiency of the algorithm and reduces storage space. Experiments show that the HH-KMP method has good performance in episode numbers and running time to mine the serial episodes.Aiming at multiple sequences composing of composite events, we put forward a kind of common closed episode parallel mining strategy based on dominant points. Firstly, considering the features of the continuity and parallelism of the composite episodes, the method stores all public parallel events into the common parallel itemsets, to transform composite episode mining into serial episode mining. Then, for processing more complex event sequences synchronously, we regard the same common event in multiple sequences as a dominant point for sequences denoising and refining. Thirdly, according to the multilayer dominate points, a list of the common episode dominant points is formed by the successively relationship. Finally, we traverse the dominant points list in order with the constraint of the time threshold to dig out the common frequent episodes set, then obtain the common closed frequent episodes by closure operations. Through the experimental analysis both on simulated data and real data, the results verify that the suggested method has better performance.
Keywords/Search Tags:episode mining, multiple sequences, closed frequent episodes, commonclosed frequent episodes
PDF Full Text Request
Related items