
Research On Array-Based Algorithm Of Mining Frequent Patterns

Posted on: 2013-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: J D Yuan
Full Text: PDF
GTID: 2248330371477797
Subject: Computer Science and Technology
Abstract/Summary:
Frequent patterns are patterns that appear frequently in a data set. Frequent pattern mining has been a focused theme in data mining research for over a decade. In general, we feel that, as a young research field in data mining, frequent pattern mining has achieved tremendous progress and claimed a good set of applications. According to the type of pattern, frequent pattern mining can be classified into frequent itemset mining, sequential pattern mining, and structural pattern mining. Frequent itemset mining finds frequent itemsets in transactional or relational data sets. Sequential pattern mining searches for frequent sub-sequences in sequence data sets, where each sequence records an ordered series of events. Structural pattern mining searches for frequent sub-structures in structured data and can be seen as the most general form of frequent pattern mining. This paper mainly studies frequent sequence mining algorithms and focuses on pattern-growth-based web access pattern mining; a novel array-based technique is proposed to improve the efficiency of the sequential pattern mining algorithm.
Firstly, the basic concepts of frequent pattern mining are described in detail. According to the type of pattern, we summarize the main algorithms for frequent itemset mining, sequential pattern mining, and frequent substructure mining, with emphasis on the Apriori technique and the pattern-growth technique on which frequent itemset mining and frequent sequence mining depend. We also introduce the applications of frequent pattern mining in classification and Web mining.
Secondly, we propose WAP-mine*, a sequential pattern mining algorithm that combines arrays, projected databases, and effective use of the prefix tree. It uses a novel data structure named W-matrix to store the sequence numbers. The W-matrix has two advantages. On the one hand, it reduces the number of scans of the WAP-tree, which improves the efficiency of the algorithm. On the other hand, it can be used to prune conditional sequences, which reduces both the number of conditional sequence bases in the WAP-mine algorithm and the memory used while the algorithm runs.
Finally, the installation and use of the IBM Quest Synthetic Data Generator, an artificial data synthesis tool commonly needed in experiments on frequent itemset and frequent sequence mining, are illustrated in detail, and the data format of its synthetic data sets is explained. Our performance studies on artificial and real data sets show that the WAP-mine* algorithm outperforms the WAP-mine algorithm when the data is sparse. In terms of memory usage, WAP-mine* performs better than WAP-mine whether the data set is sparse or dense, and its advantage is larger when the data is sparse.
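As a rough illustration of what "frequent sub-sequence" means in the abstract above, the following Python sketch counts the support of a candidate pattern by brute force. It is not WAP-mine or WAP-mine*, and it builds no WAP-tree or W-matrix, whose details the abstract does not give. The function names, the sample sessions, and the threshold are assumptions made only for this example.

```python
from typing import Iterable, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `event in it` consumes the iterator up to the match, so each later
    # event must be found strictly after the previous one.
    return all(event in it for event in pattern)


def support(pattern: Sequence[str], database: Iterable[Sequence[str]]) -> int:
    """Count the sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


# Hypothetical web access sessions: one list of page labels per session.
sessions = [
    ["a", "b", "d", "a", "c"],
    ["e", "a", "e", "b", "c", "a", "c"],
    ["b", "a", "b", "f", "a", "e", "c"],
    ["a", "f", "b", "a", "c", "f", "c"],
]

# Assumed absolute support threshold: a pattern is "frequent" if it
# appears in at least this many sessions.
min_support = 3

if __name__ == "__main__":
    for candidate in (["a", "b", "c"], ["b", "a", "c"], ["d", "c"]):
        count = support(candidate, sessions)
        print(candidate, count, "frequent" if count >= min_support else "infrequent")
```

Checking every candidate against every session like this costs one full pass over the data per candidate; tree-based and pattern-growth methods such as those studied in the thesis aim to avoid exactly that repeated scanning.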
Keywords/Search Tags: Frequent Pattern, Sequential Pattern, Array, WAP-tree, W-matrix