Frequent Itemsets Mining Algorithm

Posted on:2012-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:K Y Song

GTID:2218330335475766

Subject:Computer software and theory

Abstract/Summary:

Data mining is the discovery of potentially useful, previously unknown, and valuable information in large volumes of data, and it is currently a hot research field in databases. Although data mining is a relatively new subject, advances in information technology (for example e-commerce, sensor networks, and remote-sensing data analysis) have given rise to a particular form of data: the data stream. Mining data streams is very challenging, but it also has high research value.

This thesis introduces the basic concepts of frequent itemset mining, presents the classical algorithms for mining frequent itemsets, and analyzes their advantages and disadvantages. It studies frequent itemset mining in depth for both static data and dynamic data streams. The main research work falls into the following two aspects.

First, many frequent itemset mining algorithms are based on Apriori. These algorithms share two problems: the entire database must be loaded, which occupies a lot of memory; and generating candidate itemsets and computing their support take a lot of time. To improve efficiency, this thesis proposes a BitTable-based frequent itemset mining algorithm, Hash-BFI. The database is compressed into a BitTable in both the horizontal and vertical directions, saving a great deal of space; a hash function is used to compute the frequent 2-itemsets; AND and OR operations are used throughout to generate candidate itemsets and compute their support; and a pruning step is applied. Together, these measures improve the efficiency of the algorithm.

Second, data streams are characterized by continuous flow and an unbalanced distribution of items. This thesis presents Bala_Tree, a data stream frequent itemset mining algorithm that balances space and time. The algorithm scans the data stream only once, updates clusters rapidly, reconstructs the tree periodically, and builds on the classical frequent itemset mining algorithms.
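The Hash-BFI procedure described above (bitmap storage, hash-based counting of 2-itemsets, AND-based support computation, pruning) can be illustrated with a minimal Python sketch. It assumes a purely vertical BitTable (one Python int bitmask per frequent item, with bit t set when transaction t contains the item), uses a dict keyed by item pairs as the hash table for the frequent 2-itemsets, and grows larger candidates Apriori-style, computing support by ANDing bitmasks and counting set bits. The names `build_bittable`, `support`, `frequent_itemsets`, and the absolute threshold `min_sup` are hypothetical; the horizontal compression and OR-based steps the thesis mentions are not reproduced, so this is not the author's implementation.

```python
from collections import Counter
from itertools import combinations


def build_bittable(transactions, items):
    """Vertical BitTable: one int bitmask per item, bit t set if transaction t has the item."""
    table = {item: 0 for item in items}
    for tid, txn in enumerate(transactions):
        for item in txn:
            if item in table:
                table[item] |= 1 << tid
    return table


def support(bitmask):
    """Support = number of set bits (transactions containing every item of the itemset)."""
    return bin(bitmask).count("1")


def frequent_itemsets(transactions, min_sup):
    """Return {sorted itemset tuple: support} for all itemsets with support >= min_sup."""
    counts = Counter(item for txn in transactions for item in set(txn))
    f1 = sorted(item for item, c in counts.items() if c >= min_sup)
    table = build_bittable(transactions, f1)
    result = {(item,): counts[item] for item in f1}

    # Frequent 2-itemsets: count pairs in one pass, using a dict as the hash table.
    f1_set = set(f1)
    pair_counts = Counter()
    for txn in transactions:
        pair_counts.update(combinations(sorted(set(txn) & f1_set), 2))
    level = {p: table[p[0]] & table[p[1]] for p, c in pair_counts.items() if c >= min_sup}
    result.update({p: support(bits) for p, bits in level.items()})

    k = 2
    while level:
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue  # join only itemsets sharing a (k-1)-prefix
            cand = a + (b[-1],)
            # Apriori pruning: every k-subset of the candidate must already be frequent.
            if any(sub not in level for sub in combinations(cand, k)):
                continue
            bits = level[a] & table[b[-1]]  # AND the bitmasks to get the candidate's tidset
            if support(bits) >= min_sup:
                next_level[cand] = bits
        result.update({s: support(bits) for s, bits in next_level.items()})
        level = next_level
        k += 1
    return result
```

For example, with the transactions `[a, b, c], [a, b], [a, c], [b, c], [a, b, c]` and `min_sup=3`, every single item has support 4 and every pair has support 3, while `{a, b, c}` occurs only twice and is dropped. Storing each item as a bitmask means a candidate's support needs no rescan of the database, only one AND and one bit count.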
Experiments show that Bala_Tree scans and updates data quickly, uses memory efficiently, and finds frequent itemsets accurately. Data stream mining has practical application value. There are currently two types of stream mining algorithms: the first mines the entire data stream but cannot guarantee complete results, so some error is introduced; the second mines the stream in stages or mines only the most recently arrived data, so the overall pattern information is not guaranteed. Data streams will change greatly in the future, becoming more complex and carrying more traffic; therefore, developing a global and accurate data stream mining algorithm remains a goal for researchers.
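The second category of stream mining described above (mining only the most recently arrived data) can be illustrated with a minimal Python sketch. It assumes a count-based sliding window over the last `window` transactions and counts only itemsets of up to `max_len` items; the class `SlidingWindowCounter` and both parameters are hypothetical and are not Bala_Tree. Supports are exact inside the window, but expired transactions are forgotten entirely, which is the loss of overall pattern information the paragraph describes.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Exact itemset supports over the most recent `window` transactions (illustrative only)."""

    def __init__(self, window, max_len=2):
        self.window = window        # number of most recent transactions kept
        self.max_len = max_len      # largest itemset size counted
        self.buffer = deque()       # transactions currently inside the window
        self.counts = Counter()     # itemset tuple -> support within the window

    def _itemsets(self, txn):
        items = sorted(set(txn))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, txn):
        # Each arriving transaction is processed once, then buffered until it expires.
        self.buffer.append(txn)
        self.counts.update(self._itemsets(txn))
        if len(self.buffer) > self.window:
            expired = self.buffer.popleft()
            for itemset in self._itemsets(expired):
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]

    def frequent(self, min_sup):
        return {s: c for s, c in self.counts.items() if c >= min_sup}
```

For example, with `window=3` and the stream `[a, b], [a, c], [a, b], [b, c]`, the first transaction expires when the fourth arrives, so `frequent(2)` returns `{('a',): 2, ('b',): 2, ('c',): 2}`. A whole-stream variant would simply never expire transactions; because exact counting of every subset grows quickly with transaction length, practical stream algorithms typically keep compact summaries instead.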

Keywords/Search Tags:

Apriori, Frequent Itemsets, Hash-BFI, Data Stream, Bala_Tree