Research Of Pattern Discovery Algorithm Over Data Streams Based On Directed Graph

Posted on: 2015-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: W J Yang
GTID: 2298330422470761
Subject: Computer technology
Abstract/Summary:
Sequential pattern mining over data streams is an important branch of data mining; it aims to find subsequences across multiple data streams. Sequential pattern algorithms designed for static databases have high memory usage and low execution efficiency. They are not adapted to data streams, which are continuous, fast, and unbounded, so they cannot be used directly. In many sequential pattern algorithms over data streams, the main problems are shortcomings in how pattern pruning and pattern interest are handled.

Firstly, an incremental mining algorithm named IMIGM is proposed, which mines frequent itemsets over online multiple streams based on a directed graph. The algorithm mines the streams incrementally with only one scan. A graph named DS-graph is the only data structure defined; it stores the arriving itemsets, whose items may appear in arbitrary order. When the arrived transactions reach the capacity of the window, the top-k frequent itemsets are output by traversing the DS-graph. The DS-graph is then updated incrementally by deleting one old transaction and adding one new transaction. After this, the mining process repeats continuously.

Secondly, an incremental mining algorithm named IMSPGM is proposed, which mines frequent sequence patterns over online multiple streams based on a directed graph. To study sequence patterns over multiple streams, the stream data are divided into fragments according to arrival time. A window can contain multiple fragments, and it slides one fragment at a time. The algorithm mines the streams incrementally with only one scan, and it sets a fragment support threshold min_sup and a window support threshold MIN_SUP. A graph named DS-graph is the only data structure defined; it stores the frequent sequence patterns. When the arrived transactions reach the capacity of the window, the frequent sequences are output by traversing the DS-graph. The DS-graph is then updated incrementally by deleting one old fragment and adding one new fragment.

Finally, the algorithm is applied to a real-world instance in software security detection. The experiments are conducted on the MyEclipse platform using the Java language. Execution time, memory usage, scalability, and the number of patterns are measured to demonstrate the advantages of the proposed algorithms.
Keywords/Search Tags: data stream, sequence pattern, DS-graph, sliding window, itemset
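
As a rough illustration of the window-update cycle the abstract describes for IMIGM (fill the window, output the top-k itemsets, then delete one old transaction and add one new one), the following Python sketch may help. It is not the thesis's implementation, which was written in Java, and the abstract does not specify the internals of the DS-graph. The class WindowedPairGraph and its method names are hypothetical. As a simplifying assumption, the graph tracks only single items (nodes) and item pairs (directed edges from the lexicographically smaller item to the larger), so it reports 1- and 2-itemsets only.

```python
from collections import Counter, deque
from itertools import combinations


class WindowedPairGraph:
    """Hypothetical stand-in for a DS-graph (assumption, not the thesis's design).

    Nodes are items; a directed edge (a, b) with a < b counts the
    transactions in the current window that contain both a and b.
    """

    def __init__(self, capacity):
        self.capacity = capacity      # window size, in transactions
        self.window = deque()         # transactions currently in the window
        self.node_count = Counter()   # item -> support within the window
        self.edge_count = Counter()   # (a, b) -> support of the pair {a, b}

    def _apply(self, transaction, delta):
        # Sorting gives a canonical direction to each edge, so the
        # arrival order of items inside a transaction does not matter.
        items = sorted(set(transaction))
        for item in items:
            self.node_count[item] += delta
        for edge in combinations(items, 2):
            self.edge_count[edge] += delta

    def add(self, transaction):
        """Insert one transaction, evicting the oldest if the window is full."""
        if len(self.window) == self.capacity:
            self._apply(self.window.popleft(), -1)
        self.window.append(transaction)
        self._apply(transaction, +1)

    def is_full(self):
        return len(self.window) == self.capacity

    def top_k(self, k):
        """Return the k most frequent 1- and 2-itemsets as (itemset, support)."""
        counts = {(item,): c for item, c in self.node_count.items() if c > 0}
        counts.update({e: c for e, c in self.edge_count.items() if c > 0})
        # Highest support first; ties broken by itemset for determinism.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    graph = WindowedPairGraph(capacity=3)
    stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
    for transaction in stream:
        graph.add(transaction)
        if graph.is_full():
            print(graph.top_k(3))
```

Because each arrival changes only the counts touched by the evicted and the inserted transaction, the stream is read once and the window is never recounted from scratch, which is the property the abstract attributes to the incremental method.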