Study On The Key Methods Over Uncertain Database

Posted on:2016-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:J Xu

Full Text:PDF

GTID:2348330461956857

Subject:Computer technology

Abstract/Summary:

In recent years, uncertain data mining has attracted much attention because uncertain data arise in a wide range of applications. The uncertainty is usually caused by artificial noise, network latency, unreliable data collection and transmission, or other factors. How to manage uncertain data effectively and mine valuable information from it has become a serious problem. Uncertain data mining has made considerable progress, but because uncertain data are complex and their volume keeps growing, mining them effectively remains a challenge. Within uncertain data mining, frequent itemset mining and classification are the most important fields, and this thesis studies both. Moreover, to improve the efficiency of mining massive uncertain data, we run the mining in parallel using the MapReduce framework on the Hadoop platform.

In uncertain data, each item exists in each transaction with a certain probability. Thus, frequent itemsets in uncertain environments have two definitions: the probabilistic frequent itemset and the expected support-based frequent itemset. Since the probabilistic frequent itemset may better reflect the probability distribution of each itemset, this thesis focuses on probabilistic frequent itemset mining and association rule classification over uncertain data, covering the following three aspects.

First, we analyze probabilistic frequent itemsets and the methods for approximating their probabilities, and propose a parallelized probabilistic frequent itemset mining method based on the normal approximation. Experiments and comparisons are carried out to validate and analyze the efficiency of our approach.

Second, to improve the efficiency of probabilistic frequent itemset mining in sparse uncertain data, we propose an approximate computation method based on the FP-Growth algorithm that uses the Poisson approximation to mine probabilistic frequent itemsets effectively. Because the uncertain information is stored in a tree structure, our method avoids multiple time-consuming data scans.

Finally, we introduce the concept of probabilistic frequency into the associative classification problem for uncertain data and propose an association rule classification method for uncertain data based on removing redundant and conflicting rules. Our method first uses probabilistic frequency as the measure of an association rule to improve the discrimination of each rule and the classification accuracy.
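
The two definitions of a frequent itemset, and the normal and Poisson approximations named in the abstract, can be illustrated with a small single-machine sketch. It is not the thesis's MapReduce or FP-Growth implementation. It assumes the common formulation in which an itemset appears in transaction i with probability p_i, independently across transactions, and is called probabilistic frequent when P(support >= min_sup) >= tau. All function names and the parameters min_sup and tau are hypothetical and chosen only for illustration.

```python
import math


def expected_support(probs):
    """Expected support: sum of the per-transaction existence probabilities."""
    return sum(probs)


def exact_frequent_probability(probs, min_sup):
    """Exact P(support >= min_sup) by dynamic programming.

    dist[k] holds P(support == k) over the transactions seen so far
    (a Poisson-binomial distribution, assuming independent transactions).
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # itemset absent from this transaction
            nxt[k + 1] += mass * p       # itemset present in this transaction
        dist = nxt
    return sum(dist[min_sup:])


def normal_frequent_probability(probs, min_sup):
    """Normal approximation of P(support >= min_sup) with continuity correction."""
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        # Degenerate case: the support is deterministic.
        return 1.0 if mean >= min_sup else 0.0
    z = (min_sup - 0.5 - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


def poisson_frequent_probability(probs, min_sup):
    """Poisson approximation of P(support >= min_sup), with rate = expected support."""
    lam = sum(probs)
    term = math.exp(-lam)  # P(Poisson == 0)
    cdf = 0.0
    for k in range(min_sup):
        cdf += term
        term *= lam / (k + 1)  # advance to P(Poisson == k + 1)
    return max(0.0, 1.0 - cdf)


def is_probabilistic_frequent(probs, min_sup, tau, estimator=exact_frequent_probability):
    """True when the (exact or approximated) frequent probability reaches tau."""
    return estimator(probs, min_sup) >= tau
```

For example, with probs = [0.5, 0.5] and min_sup = 1 the expected support is 1.0, while the exact frequent probability is 0.75, so the two definitions can disagree once a probability threshold tau is applied. The exact computation costs quadratic time in the number of transactions, which is the usual motivation for the cheaper approximations; the Poisson form tends to suit sparse data with small probabilities, and the normal form tends to suit large numbers of transactions.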

Keywords/Search Tags:

probabilistic frequent itemset mining, uncertain data classification, Hadoop, MapReduce

Related items

1	Study On Parallel Mining Frequent Itemsets Over Uncertain Database Based On Hadoop
2	Study On Mining Closed Frequent Itemset Based On Hadoop
3	Study On Probabilistic Frequent Pattern Mining Over Uncertain Data Stream
4	Study Of Fast Algorithms For Frequent Itemset Mining From Uncertain Data
5	Parallel Frequent Itemset Mining Based On MapReduce
6	Approximation Of Probabilistic Maximal Frequent Itemset Mining Over Uncertain Database
7	New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases
8	Research On Weighted Frequent Itemset Mining In Uncertain Databases
9	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
10	Algorithms Of Probabilistic Frequent Itemsets From Uncertain Data