Font Size: a A A

Paralleled Frequent Itemset Mining And Research On Data Stream

Posted on:2011-10-19Degree:MasterType:Thesis
Country:ChinaCandidate:Y S HeFull Text:PDF
GTID:2178360305964858Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Frequent pattern mining is one of significant tasks in data mining. Through mining frequent pattern, we can find out interested correlations and associations hiding in data. As a kind of frequent pattern mining, mining frequent itemset can be used in many other data mining tasks, such as association rules mining, cluster analysis, classification and prediction, intrusion detection, relevance analysis, etc.Due to data mining is presented for the vast amount of data at the very beginning, the performance of many data mining algorithms are unfavorable for the enormous searching space. High performance paralleled environment opens up a new road for data mining, the data mining research based on parallel environment becomes the hot point in data mining field. Frequent itemset mining is no exception, through couple years study, many paralleled frequent itemset mining algorithms have already emerged.In this paper, HPFP-Miner, a frequent itemset mining algorithm based on distributed parallel environment is presented. HPFP-Miner is a kind of FP-Growth-like algorithm which compresses database into a data structure and mine this data structure. The proposed algorithm builds HPFP-tree and HPFP-forest on paralleled processors by twice scan on database. Each processor only needs to mine its local HPFP-tree, and output the result into a shared file. The communication is centralized to tree building phase, while the whole mining phase does not need synchronization among processors, which greatly reduce the communication.Owing to the widely used of data stream, frequent itemset mining on data stream have received more attention. Data stream is fast changing, massive, and potentially infinite, we have to establish new data structure and algorithm to mine it. On the base of the previous work, we propose a new paralleled frequent itemset mining algorithm for data stream based on sliding window, which is called PFIMSD. The algorithm compresses whole data in current window into PSD-tree on paralleled processor only by one-scan. We apply increment method to append or delete related branch on PSD-tree when window is sliding. The experiment shows PFIMSD algorithm has good performance on efficiency and expansibility.
Keywords/Search Tags:Frequent Pattern, Frequent Itemset Mining, Paralleled, High Performance, Data Stream
PDF Full Text Request
Related items