Study On The Key Methods Over Uncertain Database

Posted on: 2016-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2348330461956857
Subject: Computer technology
Abstract/Summary:
In recent years, uncertain data mining has attracted much attention because uncertain data arises in a wide range of applications. The uncertainty is usually caused by artificial noise, network latency, unreliable data collection and transmission, or other factors. How to manage and use uncertain data effectively, and how to mine valuable information from it, has become a serious problem. Uncertain data mining has made considerable progress, but because uncertain data is complex and its volume keeps growing, mining it effectively remains challenging. Within uncertain data mining, frequent itemset mining and classification are the most important fields, and this thesis studies both. Moreover, to improve the efficiency of mining massive uncertain data, we use the MapReduce parallel framework on the Hadoop platform to mine uncertain data in parallel.

In uncertain data, each item exists in each transaction with a certain probability. Thus, a frequent itemset in an uncertain environment has two definitions: the probabilistic frequent itemset and the expected support-based frequent itemset. Since the probabilistic frequent itemset may better reflect the probability distribution of each itemset, this thesis focuses on probabilistic frequent itemset mining and association rule classification over uncertain data, covering the following three aspects.

First, we analyze the probabilistic frequent itemset and the methods for approximating its probability, and propose a parallelized probabilistic frequent itemset mining method based on the normal approximation. Experiments and comparisons are carried out to validate and analyze the efficiency of our approach.

Second, to improve the efficiency of probabilistic frequent itemset mining in sparse uncertain data, we propose an approximate computation method based on the FP-Growth algorithm that uses the Poisson approximation to mine the probabilistic frequent itemsets efficiently. Because the uncertain information is stored in a tree structure, our method avoids multiple time-consuming data scans.

Finally, we introduce the concept of probabilistic frequency into the associative classification problem for uncertain data and propose an association rule classification method for uncertain data based on removing redundant and conflicting rules. Our method first uses the probabilistic frequency as the measure of an association rule to improve the discrimination of each rule and the classification accuracy.
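The two definitions of a frequent itemset and the two approximations named in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: it assumes transactions are independent, it takes as given the per-transaction probabilities that an itemset is present, and it uses the common definition that an itemset is probabilistic frequent when P(support >= minsup) reaches a user-chosen threshold. All function names are hypothetical, and the continuity correction in the normal approximation is a choice made here, not something the abstract states.

```python
import math


def support_pmf(probs):
    """Exact distribution of the support count (a Poisson binomial),
    built one transaction at a time by dynamic programming."""
    pmf = [1.0]  # before any transaction, support is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # itemset absent from this transaction
            nxt[k + 1] += mass * p       # itemset present in this transaction
        pmf = nxt
    return pmf


def expected_support(probs):
    """Expected support: the sum of the per-transaction probabilities."""
    return sum(probs)


def frequent_prob_exact(probs, minsup):
    """Exact P(support >= minsup)."""
    return sum(support_pmf(probs)[minsup:])


def frequent_prob_normal(probs, minsup):
    """Normal approximation of P(support >= minsup),
    with a continuity correction (an assumption of this sketch)."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal


def frequent_prob_poisson(probs, minsup):
    """Poisson approximation of P(support >= minsup) with rate = expected support.
    It is most accurate when the individual probabilities are small (sparse data)."""
    lam = sum(probs)
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(minsup))
    return 1.0 - cdf


if __name__ == "__main__":
    # Hypothetical probabilities that one itemset occurs in each of four transactions.
    probs = [0.9, 0.8, 0.5, 0.4]
    minsup = 2
    print("expected support:", expected_support(probs))
    print("exact           :", frequent_prob_exact(probs, minsup))
    print("normal approx   :", frequent_prob_normal(probs, minsup))
    print("Poisson approx  :", frequent_prob_poisson(probs, minsup))
```

The exact computation takes quadratic time in the number of transactions, which is one reason approximations are attractive on large data. With only four transactions and large probabilities, the Poisson estimate is noticeably off; it becomes close when many transactions each contain the itemset with a small probability, which matches the sparse setting the abstract describes.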
Keywords/Search Tags: probabilistic frequent itemset mining, uncertain data classification, Hadoop, MapReduce