
Study On Parallel Mining Frequent Itemsets Over Uncertain Database Based On Hadoop

Posted on: 2014-06-01  Degree: Master  Type: Thesis
Country: China  Candidate: S Q Wang  Full Text: PDF
GTID: 2308330482950331  Subject: Computer technology
Abstract/Summary:
In recent years, because of the wide application of uncertain data, mining frequent itemsets over uncertain databases has attracted much attention. In an uncertain database, the support of an itemset is a random variable rather than a fixed occurrence count. As a result, two definitions of a frequent itemset under uncertainty have been proposed so far. The first, referred to as the expected support-based frequent itemset, uses the expectation of an itemset's support to decide whether the itemset is frequent. The second, referred to as the probabilistic frequent itemset, uses the probability of an itemset's support to decide whether the itemset is frequent. However, in the era of "Big Data", existing methods for mining frequent itemsets over uncertain databases cannot handle data at this scale. How to mine frequent itemsets from uncertain databases efficiently is therefore a challenging problem of both theoretical and practical value. The MapReduce programming model, as implemented on the Hadoop platform, provides a new approach to frequent itemset mining over uncertain databases.

This thesis studies frequent itemset mining over uncertain databases from three aspects. First, it implements parallel mining of expected support-based frequent itemsets over uncertain databases. Experiments on several datasets verify its efficiency and effectiveness. Second, it implements parallel mining of probabilistic frequent itemsets. To calculate the frequentness probability of an itemset, it introduces the concept of a generating function. Finally, it implements frequent itemset mining over uncertain databases by approximating the frequentness probability of an itemset, using the Poisson distribution and the Normal distribution. Results on several datasets indicate that both parallelized algorithms not only maintain high accuracy but also improve mining efficiency.
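
The quantities named in the abstract can be illustrated on a single machine. The following is a minimal Python sketch, not the thesis's Hadoop implementation: the toy database, the function names, and the assumption that items occur independently within a transaction are all illustrative. It computes the expected support of an itemset, its exact frequentness probability from the coefficients of the generating function prod_t (1 - p_t + p_t x), and the Poisson and Normal approximations of that probability.

```python
import math

# Hypothetical toy uncertain database: each transaction maps an item to its
# existence probability (illustrative data, not taken from the thesis).
UNCERTAIN_DB = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "b": 0.6, "c": 0.7},
    {"a": 1.0, "c": 0.4},
    {"b": 0.3},
]


def containment_probs(db, itemset):
    """Per-transaction probability of containing every item in `itemset`.

    Assumes items are independent within a transaction."""
    return [math.prod(t.get(item, 0.0) for item in itemset) for t in db]


def expected_support(probs):
    """Expected support: the sum of the per-transaction probabilities."""
    return sum(probs)


def support_distribution(probs):
    """Coefficients of prod_t (1 - p_t + p_t * x); coeffs[k] = P(support == k)."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # transaction does not contain the itemset
            nxt[k + 1] += c * p       # transaction contains the itemset
        coeffs = nxt
    return coeffs


def frequentness_exact(probs, min_sup):
    """Exact P(support >= min_sup) from the generating-function coefficients."""
    return sum(support_distribution(probs)[min_sup:])


def frequentness_poisson(probs, min_sup):
    """Poisson approximation with rate equal to the expected support."""
    lam = sum(probs)
    below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                for k in range(min_sup))
    return 1.0 - below


def frequentness_normal(probs, min_sup):
    """Normal approximation with a continuity correction of 0.5."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        return 1.0 if mu >= min_sup else 0.0
    z = (min_sup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


probs_a = containment_probs(UNCERTAIN_DB, ["a"])
print(expected_support(probs_a))
print(frequentness_exact(probs_a, 2))
print(frequentness_poisson(probs_a, 2), frequentness_normal(probs_a, 2))
```

For the single item a in this toy database, the expected support is 2.4 and the exact probability of reaching a support of 2 is 0.95. The expected support, and the mean and variance used by the two approximations, are plain sums over transactions, so they can be accumulated per data split and then added together, which is one reason these quantities fit a map-and-reduce style of computation.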
Keywords/Search Tags:frequent itemset mining, Hadoop, MapReduce, uncertain database