Research On Mining Frequent Itemsets Over Data Stream

Posted on: 2014-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Hu
GTID: 2268330425967350
Subject: Curriculum and pedagogy
Abstract/Summary:
With the rapid development of computer technology, information storage technology and the Internet, more and more enterprises have turned their attention to raising their level of informatization. Data mining, an emerging interdisciplinary field that draws on multiple applied subjects, is now widely used in enterprise applications and plays an increasingly important role in decision-making across all walks of life. Data mining, also known as knowledge discovery in databases (KDD), is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in large volumes of data, and it is one of the active research topics in the database field.

This thesis introduces the basic concepts of mining frequent itemsets over data streams, reviews the classical frequent itemset mining algorithms, and analyzes their shortcomings. The main research work covers two aspects.

On the one hand, we propose TWEM, an algorithm that mines meta itemsets over a weighted sliding window using the WCF-tree. First, because data in different time windows differ in importance, the algorithm lets the user specify the number of windows to mine and a weight for each window. Second, it uses the WCF-tree to mine closed itemsets. Finally, it mines the meta itemsets based on the distinct and estimable supports of the itemsets in each equivalence class and of the corresponding meta itemsets. Experimental results show that TWEM reduces the search space and improves the program's running efficiency.

On the other hand, we propose MFP, a new method for predicting frequent patterns over data streams. To meet users' needs, MFP predicts itemsets that have a high potential to become frequent in subsequent time windows. First, the algorithm converts the data into a 0-1 matrix.
It then updates the matrix by pruning ("tailoring") it and applying bit operations, and mines the frequent itemsets from the result. Finally, it uses the current data to predict the itemsets that may become frequent in the next time window. Experimental results show that MFP can predict frequent itemsets under different experimental conditions, which indicates that the algorithm is feasible.

As the volume of information keeps growing and data mining applications multiply, frequent itemset mining faces new challenges. Future work will aim to reduce the storage space and execution time of the algorithms and improve their running efficiency by exploiting the characteristics of data streams.
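The abstract does not state how TWEM combines the per-window weights. A minimal sketch, assuming a hypothetical weighted support defined as sum(w_i * count_i) / sum(w_i * |W_i|) over the user-chosen windows, shows how putting larger weights on recent windows shifts an itemset's support. The function name weighted_support, the formula and the example data are illustrative assumptions, not the thesis's definitions.

```python
from typing import FrozenSet, Iterable, Sequence


def weighted_support(itemset: Iterable[str],
                     windows: Sequence[Sequence[FrozenSet[str]]],
                     weights: Sequence[float]) -> float:
    """Weighted relative support of `itemset` over a user-chosen list of windows.

    Assumed (hypothetical) definition, not taken from the thesis:
        sum_i w_i * count_i / sum_i w_i * |W_i|
    where count_i is the number of transactions in window W_i containing the itemset.
    """
    if len(windows) != len(weights):
        raise ValueError("expected exactly one weight per window")
    target = frozenset(itemset)
    weighted_count = 0.0
    weighted_size = 0.0
    for window, weight in zip(windows, weights):
        # Transactions in this window that contain every item of the itemset.
        count = sum(1 for transaction in window if target <= transaction)
        weighted_count += weight * count
        weighted_size += weight * len(window)
    return weighted_count / weighted_size if weighted_size else 0.0


# Three windows, oldest first; the newest window gets the largest weight.
windows = [
    [frozenset("ab"), frozenset("ac"), frozenset("bc")],
    [frozenset("ab"), frozenset("abc"), frozenset("c")],
    [frozenset("ab"), frozenset("ab"), frozenset("b")],
]
print(weighted_support({"a", "b"}, windows, [0.2, 0.3, 0.5]))  # about 0.6
print(weighted_support({"a", "b"}, windows, [1, 1, 1]))        # about 0.556 (5/9, unweighted)
```

Because {a, b} occurs more often in the newer windows, weighting them more heavily raises its support from 5/9 to about 0.6; with equal weights the formula reduces to ordinary support over all windows.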
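This abstract does not describe the WCF-tree or define meta itemsets precisely. As background for the closed itemsets and equivalence classes that TWEM relies on, the brute-force sketch below uses the standard closure definition: the closure of an itemset is the intersection of all transactions that contain it, an itemset is closed when it equals its closure, and itemsets sharing a closure form one equivalence class with identical support. It is not the WCF-tree algorithm, and the names closure, is_closed and equivalence_classes are hypothetical.

```python
from functools import reduce
from typing import Dict, FrozenSet, Iterable, List, Sequence, Set


def closure(itemset: FrozenSet[str], transactions: Sequence[Set[str]]) -> FrozenSet[str]:
    """Intersection of all transactions containing `itemset`, i.e. the largest
    itemset supported by exactly the same transactions."""
    covering = [frozenset(t) for t in transactions if itemset <= t]
    if not covering:
        raise ValueError("itemset has zero support")
    return reduce(frozenset.intersection, covering)


def is_closed(itemset: FrozenSet[str], transactions: Sequence[Set[str]]) -> bool:
    # Closed: no proper superset has the same support, i.e. closure == itemset.
    return closure(itemset, transactions) == itemset


def equivalence_classes(itemsets: Iterable[FrozenSet[str]],
                        transactions: Sequence[Set[str]]
                        ) -> Dict[FrozenSet[str], List[FrozenSet[str]]]:
    """Group itemsets by closure; members of a class share the same supporting
    transactions, and the key is the class's unique closed itemset."""
    classes: Dict[FrozenSet[str], List[FrozenSet[str]]] = {}
    for itemset in itemsets:
        classes.setdefault(closure(itemset, transactions), []).append(itemset)
    return classes


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
frequent = [frozenset(s) for s in ("a", "b", "c", "ab", "ac")]
for closed, members in equivalence_classes(frequent, transactions).items():
    print(sorted(closed), [sorted(m) for m in members])
```

In the example, {c} is not closed because every transaction containing c also contains a, so {c} and {a, c} fall into the same equivalence class and only {a, c} needs to be kept as its closed representative.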
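The matrix step of MFP can be pictured with a common bitset encoding, which is an assumption here rather than the thesis's exact data layout: each item's column of the 0-1 matrix is stored as an integer whose bit t marks transaction t, the support of an itemset is the number of 1 bits in the AND of its columns, and sliding a full window drops the oldest row with a right shift. Reading "tailoring" as pruning infrequent columns is likewise an interpretation, and the names build_bit_columns, slide and mine_frequent are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Sequence, Set


def build_bit_columns(transactions: Sequence[Set[str]]) -> Dict[str, int]:
    """Store the 0-1 matrix column-wise: bit t of columns[item] is 1
    iff transaction t (row t) contains item."""
    columns: Dict[str, int] = {}
    for row, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns


def popcount(bits: int) -> int:
    return bin(bits).count("1")


def slide(columns: Dict[str, int], window_size: int,
          new_transaction: Iterable[str]) -> Dict[str, int]:
    """Slide a full window by one row: drop the oldest row (bit 0) with a
    right shift and write the new transaction into the top row."""
    top = 1 << (window_size - 1)
    updated = {item: bits >> 1 for item, bits in columns.items()}
    for item in new_transaction:
        updated[item] = updated.get(item, 0) | top
    # Remove columns that no longer contain any 1s.
    return {item: bits for item, bits in updated.items() if bits}


def mine_frequent(columns: Dict[str, int], n_rows: int,
                  min_count: int) -> Dict[FrozenSet[str], int]:
    """Depth-first mining; the support of an itemset is the popcount of the
    bitwise AND of its item columns."""
    # Pruning step (one reading of "tailoring"): drop infrequent columns.
    kept = {i: b for i, b in columns.items() if popcount(b) >= min_count}
    items = sorted(kept)
    result: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], prefix_bits: int, start: int) -> None:
        for k in range(start, len(items)):
            bits = prefix_bits & kept[items[k]]  # rows containing prefix + item
            count = popcount(bits)
            if count >= min_count:
                itemset = prefix | {items[k]}
                result[itemset] = count
                extend(itemset, bits, k + 1)

    extend(frozenset(), (1 << n_rows) - 1, 0)
    return result


window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
columns = build_bit_columns(window)
print(mine_frequent(columns, len(window), min_count=2))
columns = slide(columns, len(window), {"c", "d"})  # oldest row out, {c, d} in
print(mine_frequent(columns, len(window), min_count=2))
```

Pruning columns first is safe because support can only decrease as items are added to an itemset, so an item that is infrequent on its own can never appear in a frequent itemset; the same property lets the depth-first search stop extending a prefix as soon as its AND falls below the threshold.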
Keywords: Frequent itemset, Closed itemset, Matrix, Data stream, Weighted sliding window