
Research Of Sequential Pattern Algorithm Over Data Streams Based On Prefix Sequence Tree

Posted on: 2014-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: G W Han
Full Text: PDF
GTID: 2268330422466895
Subject: Computer application technology
Abstract/Summary:
Sequential pattern mining over data streams aims to find frequent subsequences in stream data. Because data streams are continuous, fast, and unbounded, sequential pattern algorithms designed for static databases cannot be used on them directly: their memory usage and time cost are too high. Moreover, many existing sequential pattern algorithms over data streams have drawbacks in how they handle pattern pruning and pattern interest.
Firstly, a sequential pattern algorithm named CSPMDS-PrefixSpan, based on sliding windows over data streams, is proposed. PFPS-Tree is defined as a summary data structure that stores the frequent sequential patterns effectively, and a table named PatternTable is designed. PatternTable is used to update the potential frequent patterns in the PFPS-Tree. The PFPS-Tree is then pruned by the proposed closed detection pruning and window support threshold pruning to obtain an FCPS-Tree. The algorithm extends PrefixSpan, a sequential pattern algorithm for static databases.
Secondly, an incremental dual-weighted sequential pattern algorithm over data streams named DWCSPMDS is proposed. An item weight formula and a time weight formula are designed: the former calculates the item weight of a sequence from the items it contains, and the latter specifies the time weight of a sequence. A weighted prefix sequence tree named WFPS-Tree and a weighted pattern table named WPatternTable are designed based on the CSPMDS-PrefixSpan algorithm. The mining results are updated incrementally. DWCSPMDS takes into account the differing importance of items and the times at which they occur.
Thirdly, an incremental dual-weighted sequential pattern algorithm over data streams based on pattern decay, named WCSPMPD-Stream, is proposed. It improves on the DWCSPMDS algorithm in the internal structure of the prefix sequence tree and in the method for updating the time weight. Two methods named TimeWeightDecay and NoUpdatingDecay are defined to decay the patterns and update the mining results.
Last, the algorithms proposed in this dissertation are applied to the analysis of software vulnerabilities based on sequential pattern matching. This application judges whether the target software has a vulnerability by matching the detected pattern features against the vulnerability knowledge base.
The experiments are conducted on the NetBeans platform using the Java language. Execution time, memory usage, scalability, and the number of patterns are evaluated to show where the proposed algorithms perform better.
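As a rough illustration of the prefix sequence tree idea summarized above, the following Python sketch stores each candidate pattern as a path from the root and prunes nodes whose support in the current window falls below a threshold. It is not the thesis's implementation: the names `Node`, `add_sequence`, `prune`, and `patterns` are hypothetical, every sequence element is assumed to be a single item, subsequences are enumerated by brute force rather than by PrefixSpan projection, and PatternTable and closed detection pruning are left out. The `weight` argument only hints at how a per-sequence weight could enter the support count.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


@dataclass
class Node:
    """One tree node; the path from the root to this node spells a pattern."""
    support: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict)


def subsequences(seq: List[str], max_len: int) -> Set[Pattern]:
    """Enumerate the distinct subsequences of seq with at most max_len items."""
    found: Set[Pattern] = set()

    def extend(start: int, prefix: Pattern) -> None:
        for i in range(start, len(seq)):
            pattern = prefix + (seq[i],)
            found.add(pattern)
            if len(pattern) < max_len:
                extend(i + 1, pattern)

    extend(0, ())
    return found


def add_sequence(root: Node, seq: List[str], weight: float = 1.0,
                 max_len: int = 3) -> None:
    """Add one window sequence: each distinct subsequence gains `weight` support."""
    for pattern in subsequences(seq, max_len):
        node = root
        for item in pattern:
            node = node.children.setdefault(item, Node())
        node.support += weight


def prune(node: Node, min_support: float) -> None:
    """Drop subtrees below the threshold; an extension never has more support
    than its prefix, so the whole subtree can be removed at once."""
    for item in list(node.children):
        child = node.children[item]
        if child.support < min_support:
            del node.children[item]
        else:
            prune(child, min_support)


def patterns(node: Node, prefix: Pattern = ()) -> Dict[Pattern, float]:
    """Collect every pattern still stored in the tree with its support."""
    out: Dict[Pattern, float] = {}
    for item, child in node.children.items():
        pattern = prefix + (item,)
        out[pattern] = child.support
        out.update(patterns(child, pattern))
    return out


# Hypothetical window of four sequences, support threshold of 2.
window = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
root = Node()
for sequence in window:
    add_sequence(root, sequence)
prune(root, min_support=2)
frequent = patterns(root)
```

In this toy window the pattern `a, b, c` occurs in only one sequence, so it is pruned, while the single items and the three two-item patterns remain.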
Keywords/Search Tags: data stream, sequence pattern, prefix sequence tree, dual weight, feature match