Paralleled Frequent Itemset Mining And Research On Data Stream

Posted on:2011-10-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y S He

Full Text:PDF

GTID:2178360305964858

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Frequent pattern mining is one of significant tasks in data mining. Through mining frequent pattern, we can find out interested correlations and associations hiding in data. As a kind of frequent pattern mining, mining frequent itemset can be used in many other data mining tasks, such as association rules mining, cluster analysis, classification and prediction, intrusion detection, relevance analysis, etc.Due to data mining is presented for the vast amount of data at the very beginning, the performance of many data mining algorithms are unfavorable for the enormous searching space. High performance paralleled environment opens up a new road for data mining, the data mining research based on parallel environment becomes the hot point in data mining field. Frequent itemset mining is no exception, through couple years study, many paralleled frequent itemset mining algorithms have already emerged.In this paper, HPFP-Miner, a frequent itemset mining algorithm based on distributed parallel environment is presented. HPFP-Miner is a kind of FP-Growth-like algorithm which compresses database into a data structure and mine this data structure. The proposed algorithm builds HPFP-tree and HPFP-forest on paralleled processors by twice scan on database. Each processor only needs to mine its local HPFP-tree, and output the result into a shared file. The communication is centralized to tree building phase, while the whole mining phase does not need synchronization among processors, which greatly reduce the communication.Owing to the widely used of data stream, frequent itemset mining on data stream have received more attention. Data stream is fast changing, massive, and potentially infinite, we have to establish new data structure and algorithm to mine it. On the base of the previous work, we propose a new paralleled frequent itemset mining algorithm for data stream based on sliding window, which is called PFIMSD. The algorithm compresses whole data in current window into PSD-tree on paralleled processor only by one-scan. We apply increment method to append or delete related branch on PSD-tree when window is sliding. The experiment shows PFIMSD algorithm has good performance on efficiency and expansibility.

Keywords/Search Tags:

Frequent Pattern, Frequent Itemset Mining, Paralleled, High Performance, Data Stream

PDF Full Text Request

Related items

1	A Method Based On The Vertical Division Of Data Stream Frequent Itemset Mining Algorithm
2	Study On Probabilistic Frequent Pattern Mining Over Uncertain Data Stream
3	Research On Frequent Itemsets Mining Algorithm In Data Stream
4	Research On The Algorithm Of Data Stream Frequent Itemsets Mining
5	Research And Implement Of A Matrix Based Paralleled Frequent Itemset Mining Algorithm
6	Multi-Relational Frequent Pattern Mining Algorithm And Its Application Research
7	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
8	Research On Mining Frequent Itemsets Over Data Stream
9	The Research On The Algorithm About Online Mining Closed Frequent Itemsets Over Data Stream
10	Research Of An Algorithm For Frequent Closed Itemset Mining On Data Stream