Study On Parallel Mining Frequent Itemsets Over Uncertain Database Based On Hadoop

Posted on:2014-06-01

Degree:Master

Type:Thesis

Country:China

Candidate:S Q Wang

Full Text:PDF

GTID:2308330482950331

Subject:Computer technology

Abstract/Summary:

In recent years, because of the wide application of uncertain data, mining frequent itemsets over uncertain databases has attracted much attention. In an uncertain database, the support of an itemset is a random variable rather than a fixed occurrence count. As a result, two definitions of a frequent itemset in uncertain environments have been proposed so far. The first, referred to as the expected support-based frequent itemset, uses the expectation of the support of an itemset to decide whether the itemset is frequent. The second, referred to as the probabilistic frequent itemset, uses the probability of the support of an itemset to measure whether it is frequent. However, in the era of Big Data, existing work on mining frequent itemsets over uncertain databases cannot cope with the data involved. How to mine frequent itemsets from uncertain databases efficiently is therefore a challenging problem of both theoretical and practical value. The MapReduce programming model, as implemented on the Hadoop platform, provides a new method for frequent itemset mining over uncertain databases.

This thesis studies frequent itemset mining over uncertain databases from three aspects. First, it implements parallel frequent itemset mining over uncertain databases based on expected support. Experiments on several datasets verified the efficiency and effectiveness of this approach. Second, it implements parallel probabilistic frequent itemset mining. To calculate the frequentness probability of an itemset, the thesis introduces the concept of the generating function. Finally, it implements frequent itemset mining over uncertain databases by approximating the frequentness probability of an itemset, using the Poisson distribution and the Normal distribution as approximations. The results on several datasets indicate that both parallelized algorithms not only guarantee high accuracy but also improve mining efficiency.
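The three measures named in the abstract can be illustrated with a small single-machine sketch. It is not the thesis's MapReduce implementation. It assumes an attribute-level uncertainty model in which each transaction maps items to independent existence probabilities, so the support of an itemset is a sum of independent Bernoulli variables. All function names and the continuity correction in the Normal approximation are choices made for this illustration.

```python
import math

def containment_probs(database, itemset):
    """Probability that each transaction contains every item of the itemset.

    Assumes items within a transaction exist independently (illustrative model).
    """
    probs = []
    for transaction in database:
        p = 1.0
        for item in itemset:
            p *= transaction.get(item, 0.0)  # a missing item has probability 0
        probs.append(p)
    return probs

def expected_support(probs):
    """Expected support: the sum of the per-transaction probabilities."""
    return sum(probs)

def support_distribution(probs):
    """Coefficients of the generating function prod(1 - p_i + p_i * x).

    The coefficient of x^k is the probability that the support equals k.
    """
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # itemset absent from this transaction
            nxt[k + 1] += c * p       # itemset present in this transaction
        coeffs = nxt
    return coeffs

def exact_frequentness(probs, minsup):
    """Exact P(support >= minsup), read off the generating function."""
    return sum(support_distribution(probs)[minsup:])

def poisson_frequentness(probs, minsup):
    """Poisson approximation with rate equal to the expected support."""
    lam = sum(probs)
    below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                for k in range(minsup))
    return 1.0 - below

def normal_frequentness(probs, minsup):
    """Normal approximation with a continuity correction of 0.5 (assumed here)."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail 1 - Phi(z)

# Hypothetical toy database: item -> existence probability per transaction.
database = [{"a": 0.5, "b": 1.0}, {"a": 0.5}, {"b": 0.8}]
probs = containment_probs(database, ("a",))
print(expected_support(probs))          # expected support of {a}
print(exact_frequentness(probs, 1))     # P(support of {a} >= 1)
```

Because the expected support and the variance are plain sums over transactions, a parallel version could plausibly compute partial sums per data split and combine them afterwards; how the thesis actually organizes its map and reduce phases is not stated in the abstract.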

Keywords/Search Tags:

frequent itemset mining, Hadoop, MapReduce, uncertain database

Related items

1	Study On The Key Methods Over Uncertain Database
2	Study On Mining Closed Frequent Itemset Based On Hadoop
3	Research On Weighted Frequent Itemset Mining In Uncertain Databases
4	Parallel Frequent Itemset Mining Based On MapReduce
5	New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases
6	Study Of Fast Algorithms For Frequent Itemset Mining From Uncertain Data
7	Research On Parallel Frequent Itemset Mining Algorithm Based On MapReduce
8	Approximation Of Probabilistic Maximal Frequent Itemset Mining Over Uncertain Database
9	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
10	Study On Probabilistic Frequent Pattern Mining Over Uncertain Data Stream