Frequent Itemsets Mining Algorithm

Posted on: 2012-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Song
Full Text: PDF
GTID: 2218330335475766
Subject: Computer software and theory
Abstract/Summary:
Data mining is the discovery of potentially useful, previously unknown information in large volumes of data, and it is currently an active research area in the database field. Although data mining is a relatively new discipline, advances in information technology (for example e-commerce, sensor networks, and remote sensing data analysis) have given rise to a particular form of data, the data stream. Mining data streams is very challenging, but it also has high research value.

This thesis introduces the basic concepts of frequent itemset mining, reviews the classical algorithms for mining frequent itemsets, and analyzes their advantages and disadvantages. It then studies frequent itemset mining in depth for both static data and dynamic data streams. The main research work falls into two parts.

First, many frequent itemset mining algorithms are based on Apriori. These algorithms share two problems: the entire database must be loaded, which occupies a large amount of memory, and generating candidate itemsets and computing their support takes a lot of time. To improve efficiency, the thesis proposes Hash-BFI, a BitTable-based algorithm for mining frequent itemsets. The database is compressed into a BitTable in both the horizontal and vertical directions, which saves considerable space. A hash function is used to compute the frequent 2-itemsets, and AND and OR operations alone are used to generate candidate itemsets and compute their support, with pruning applied along the way. Together these measures improve the efficiency of the algorithm.

Second, data streams are flowing and continuous, and their items are unevenly distributed. The thesis presents Bala_Tree, a frequent itemset mining algorithm for data streams that balances space and time. The algorithm scans the data stream only once, performs rapid cluster updates, reconstructs its tree at regular intervals, and builds on a classical frequent itemset mining algorithm. Experiments show that the algorithm scans and updates data quickly, uses memory sensibly, and obtains frequent itemsets accurately.

Data stream mining has practical application value. Current mining algorithms are of two types. The first mines the entire data stream but cannot guarantee complete results, so some error remains. The second mines the stream in stages, or mines only the most recently arrived data, but cannot guarantee that the patterns it finds represent the stream as a whole. Data streams themselves are expected to change greatly, becoming more complex and higher in volume, so a data stream algorithm that is both global and accurate remains a goal for researchers.
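The abstract does not give the details of Hash-BFI, so the following is only a minimal sketch of the general bit-table idea it builds on: each item is stored as a bit vector over transactions, and the support of a candidate itemset is obtained by ANDing bit vectors and counting the set bits. The function names (`build_bit_table`, `support`, `frequent_itemsets`) and the use of Python integers as bit vectors are assumptions made for illustration. The hash-based computation of frequent 2-itemsets and the horizontal compression mentioned in the abstract are not reproduced here.

```python
from itertools import combinations


def build_bit_table(transactions):
    """Vertical bit table: bit t of table[item] is set when transaction t contains item."""
    table = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            table[item] = table.get(item, 0) | (1 << tid)
    return table


def support(bits):
    """Number of transactions represented by a bit vector."""
    return bin(bits).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search that counts support with bitwise AND.

    Illustrative only; this is not the thesis's Hash-BFI algorithm.
    """
    table = build_bit_table(transactions)
    # Level 1: keep single items that meet the support threshold.
    current = {
        frozenset([item]): bits
        for item, bits in table.items()
        if support(bits) >= min_support
    }
    result = {}
    while current:
        for itemset, bits in current.items():
            result[itemset] = support(bits)
        next_level = {}
        for (a, bits_a), (b, bits_b) in combinations(current.items(), 2):
            candidate = a | b
            # Only join pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # ANDing the two bit vectors gives the transactions containing the union.
            bits = bits_a & bits_b
            if support(bits) >= min_support:
                next_level[candidate] = bits
        current = next_level
    return result


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in frequent_itemsets(demo, min_support=3).items():
        print(sorted(itemset), count)
```

Because support is computed from the bit vectors, the original transactions are read only once, when the table is built. This is the kind of saving in memory and counting time that the abstract attributes to the BitTable representation.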
Keywords/Search Tags: Apriori, Frequent Itemsets, Hash-BFI, Data Stream, Bala_Tree