Font Size: a A A

Research Of Frequent Itemsets Mining Algorithms Based On Bit Vectors In Data Streams

Posted on:2013-01-23Degree:MasterType:Thesis
Country:ChinaCandidate:Q WangFull Text:PDF
GTID:2218330362463062Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
Frequent itemsets mining in data streams has become an important topic in thefield of data mining and has a wide range of applications, such as in networkmonitoring, Web servers, telecommunications call records and other fields. Mass datastorage and fast mining efficiency caused by data stream led the traditional staticdatabase mining techniques to be invalid. Therefore, how to make highly compressedstorage and efficient mining is a research hotspot. According to different datarequirements, the scalability of the algorithm should also be considered, if the existingmining results can be made full used, mining efficiency can be great improved. Thispaper introduces bit vector technology to frequent itemsets mining in data streams tosolve the problems.First, an algorithm based on equal-length bit table for mining frequent itemsets indata stream, it converts all the transactions into bit vectors, and bit vectors with thesame tail item are put into the same groups as a bit table, bit vectors has the samelength in the same bit table and has different length in the different tables. Thisstructure can directly mine the frequent1-itemsets and frequent2-itemsets, also itgreatly reduces the number of candidate itemsets, thus improvs the mining efficiency.Second, this paper proposes an algorithm based on prime-block encoding formining frequent itemsets in data streams. It uses bit vector to compress the database,also introduces prime-block encoding to further compress the storage structure, andmines the frequent itemsets according to uniqueness of prime encoding. TheAnd-operation between the bits is converted to the division operation between integers.The algorithm also has some scalability, it can be excavated to mine maximal frequentitemsets and closed frequent itemsets based on the mined frequent itemsets.Finally, an algorithm based on directed bit graph for mining frequent itemsets indata stream is proposed. It uses bit vector to vertically express the transaction database.The bit vector makes up the nodes of the directed bit graph, and on the edges betweenthe adjacent two nodes, the cooccurrence times are recorded for the two items represented by the two nodes. The algorithm can directly mine the frequent2-itemsets,and put them into different groups according to thieir start item. It does incrementalitemset mining in each group without candidate itemsets generation. The algorithmcan also support the frequent itemset mining based on time constraints.The algorithm used in this article is written in C++, experiment and theoryrespectively indicate that this algorithm reduces the mining time and storage space,and the algorithms also has good scalability.
Keywords/Search Tags:Data stream, Frequent itemsets, Incremental mining, Bit vector, Primeblock coding, directed bit graph
PDF Full Text Request
Related items