
Research On Itemset Sequential Patterns Mining Algorithm Based On Memory Indexing & Bitmap

Posted on: 2014-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Chen
Full Text: PDF
GTID: 2268330422966887
Subject: Computer system architecture
Abstract/Summary:
Sequential pattern mining helps users extract knowledge from large volumes of information, such as customer transaction sequences, stock quotations and biological sequences. Mining frequent sequential patterns is the key step of sequential pattern mining, so improving its efficiency and memory utilization is a central concern in data mining. Sequential pattern mining in both static databases and dynamic data stream environments is currently an active topic, and this thesis studies it in these two application settings.

Firstly, we present MEMIGCSP, an algorithm based on memory indexing. The length of the interval (gap) between items in the original sequences is limited so that only sequences of interest are found. The gap-constrained prefix index set (P-Gidx), built on memory indexing, provides the position information of sequence patterns. Pattern growth then extends the prefix pattern after the local extension items have been searched in the forward and backward spaces, so that the complete set of sequential patterns can be found. Throughout the algorithm, a divide-and-conquer strategy handles the sequences that are in P-Gidx and the remaining sequences separately, which avoids omitting frequent items.

Secondly, we present IMASP, an algorithm that mines across-stream sequential patterns from multiple itemset streams. The algorithm uses a multi-stream bitmap temporary table to store the sequence information of the current sliding window. Partial substitution and real-time counting of the items in the old and new windows improve the execution efficiency of the algorithm. A multi-stream Lexicographic Sequence Tree stores the items and extended sequences together with their support. When transactions arrive, the algorithm outputs across-stream sequential patterns quickly and accurately.

Finally, both algorithms are implemented in Java. The experimental data and environment are introduced, the key techniques of the algorithms are analyzed in detail, and the results and their analysis are given. Experimental results show that the algorithms presented in this thesis improve mining time and scalability on their respective problems: their time cost is lower than that of current comparable algorithms, and the anticipated results are achieved.
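
The gap constraint described for MEMIGCSP can be illustrated with a small support-counting sketch. This is not the thesis's implementation and does not reproduce P-Gidx or pattern growth. It assumes simplified sequences of single integer items rather than itemsets, and it interprets the gap as the number of items allowed between two consecutively matched items. The class name `GapConstrainedSupport`, the methods `occurs` and `support`, and the parameter `maxGap` are hypothetical names chosen for illustration.

```java
import java.util.List;

// Illustrative sketch only: counts how many sequences contain a pattern
// under a maximum-gap constraint. All names here are hypothetical.
class GapConstrainedSupport {

    // Returns true if `pattern` occurs in `seq` as a subsequence such that
    // at most `maxGap` items lie between two consecutively matched items.
    static boolean occurs(int[] seq, int[] pattern, int maxGap) {
        if (pattern.length == 0) {
            return true;
        }
        // reachable[i] is true if the pattern prefix handled so far
        // can be matched with its last item at position i of seq.
        boolean[] reachable = new boolean[seq.length];
        for (int i = 0; i < seq.length; i++) {
            reachable[i] = (seq[i] == pattern[0]);
        }
        for (int k = 1; k < pattern.length; k++) {
            boolean[] next = new boolean[seq.length];
            for (int i = 0; i < seq.length; i++) {
                if (seq[i] != pattern[k]) {
                    continue;
                }
                // The previous matched item must lie within maxGap + 1
                // positions to the left of position i.
                int lowest = Math.max(0, i - maxGap - 1);
                for (int j = lowest; j < i; j++) {
                    if (reachable[j]) {
                        next[i] = true;
                        break;
                    }
                }
            }
            reachable = next;
        }
        for (boolean r : reachable) {
            if (r) {
                return true;
            }
        }
        return false;
    }

    // Support = number of sequences in the database that contain the pattern
    // under the gap constraint.
    static int support(List<int[]> database, int[] pattern, int maxGap) {
        int count = 0;
        for (int[] seq : database) {
            if (occurs(seq, pattern, maxGap)) {
                count++;
            }
        }
        return count;
    }
}
```

For the three sequences (1, 2, 3), (1, 4, 4, 2) and (2, 1), the pattern (1, 2) has support 1 when `maxGap` is 0 and support 2 when `maxGap` is 2, because the second sequence has two items between 1 and 2. Tracking every reachable end position, instead of greedily taking the first match, matters under a gap constraint: an early match can violate the gap while a later one satisfies it.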
Keywords/Search Tags: Sequential pattern mining, multiple data streams, memory indexing, bitmap, sliding window, incremental mining