An Algorithm For Mining Frequent Itemsets From Data Streams

Posted on: 2013-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: X L Li
Full Text: PDF
GTID: 2268330395979884
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer technology, information storage technology, and the internet, more and more enterprises are paying attention to improving their degree of informatization. Data mining, a new field at the intersection of several applied disciplines, has been widely adopted in enterprise applications and plays an increasingly important role in decision making in every walk of life. Data mining, also known as knowledge discovery in databases (KDD), is a nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large volumes of data, and it is one of the hot topics in database research. This thesis introduces the basic concepts of mining frequent itemsets from data streams, reviews the classical frequent itemset mining algorithms, and analyzes their strengths and shortcomings.

The main research work covers two aspects. On the one hand, we propose TWEM, a meta itemset mining algorithm based on a weighted sliding window and the WCF-tree. First, because data in different time windows differ in importance, the algorithm lets the user specify the number of windows to mine and the weight of each window. Second, it uses the WCF-tree to mine closed itemsets. Finally, it mines the meta itemsets by exploiting the distinct, estimable supports of the itemsets in each equivalence class and of the corresponding meta itemsets. The experimental results show that TWEM reduces the search space and improves the running efficiency of the program.

On the other hand, we propose a new method, MFP, for predicting frequent patterns over data streams. To meet users' needs, MFP predicts the itemsets that have a high potential to become frequent in subsequent time windows. First, the algorithm converts the data into a 0-1 matrix. Then it updates the matrix by trimming it and applying bit operations, and it mines the frequent itemsets from the updated matrix. Finally, it uses the current data to predict the frequent itemsets that may appear in the next time window. Experimental results show that MFP can predict the frequent itemsets under different experimental conditions, which indicates that the algorithm is feasible.

The continuous growth of information and the increasingly frequent use of data mining pose new challenges for frequent itemset mining. Further study should focus on improving the efficiency of the algorithms, in terms of storage space and execution time, by taking the characteristics of the data itself into account.
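Two ideas mentioned in the abstract, the 0-1 matrix with bit operations and the user-specified window weights, can be illustrated with a small Python sketch. This is not the thesis's TWEM or MFP algorithm, whose details the abstract does not give. All function names (`build_bit_rows`, `support`, `slide`, `weighted_support`) are hypothetical, and the sketch assumes that each item's matrix row is stored as an integer bitmask, that a full window slides by one transaction at a time, and that the window weights sum to 1.

```python
def build_bit_rows(transactions):
    """Encode one window as a 0-1 matrix: one integer bitmask per item,
    with bit t set when transaction t contains the item."""
    rows = {}
    for t, items in enumerate(transactions):
        for item in items:
            rows[item] = rows.get(item, 0) | (1 << t)
    return rows


def support(rows, itemset):
    """Count the transactions that contain every item of a non-empty itemset
    by AND-ing the item rows and counting the remaining 1 bits."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    mask = rows.get(items[0], 0)
    for item in items[1:]:
        mask &= rows.get(item, 0)
    return bin(mask).count("1")


def slide(rows, window_size, new_items):
    """Slide a full window by one transaction (an assumed update rule):
    shift out the oldest transaction (bit 0), drop rows that become empty,
    and record the new transaction at the highest bit."""
    shifted = {}
    for item, mask in rows.items():
        if mask >> 1:
            shifted[item] = mask >> 1
    for item in new_items:
        shifted[item] = shifted.get(item, 0) | (1 << (window_size - 1))
    return shifted


def weighted_support(windows, weights, itemset):
    """Weighted relative support across windows: the sum over windows of
    weight * (support in that window / number of transactions in it)."""
    total = 0.0
    for transactions, weight in zip(windows, weights):
        rows = build_bit_rows(transactions)
        total += weight * support(rows, itemset) / len(transactions)
    return total


if __name__ == "__main__":
    older = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
    newer = [{"a", "b"}, {"a", "b"}, {"c"}, {"a"}]
    rows = build_bit_rows(older)
    print(support(rows, {"a", "b"}))
    print(support(slide(rows, 4, {"a", "c"}), {"a", "c"}))
    print(weighted_support([older, newer], [0.4, 0.6], {"a", "c"}))
```

Storing each row as a single integer keeps the support computation to one AND per item plus a bit count, which is the kind of saving a bit-matrix representation is usually chosen for. An itemset would then be reported as frequent when its (weighted) support reaches a user-chosen minimum threshold.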
Keywords/Search Tags: Frequent itemset, Closed itemset, Weighted sliding window