
Study Of Fast Algorithms For Frequent Itemset Mining From Uncertain Data

Posted on: 2018-12-06  Degree: Master  Type: Thesis
Country: China  Candidate: Z Wen
Abstract/Summary:
Over the past decades, numerous classical algorithms have been proposed for mining frequent itemsets from precise data. In recent years, owing to the wide application of uncertain data, data mining over uncertain databases has attracted considerable attention. Traditional algorithms for mining frequent itemsets from uncertain data are obtained by adapting algorithms designed for precise data. One of them, the TubeS-growth algorithm, sometimes achieves very good compression performance. However, when mining massive uncertain data it has the following problems: (1) when the existence probabilities of items spread over a broad or loose range, the algorithm produces many false frequent itemsets; (2) when it is used to mine a sparse uncertain dataset with a very large number of items, or a dense uncertain dataset whose average transaction length is long, its running time is long.

To solve these two problems, this thesis proposes a new mining algorithm based on the idea of divide and conquer, named PtubeS-growth. When main memory is insufficient or the database is massive, the algorithm takes advantage of database partitioning. First, the database is divided into several sub-databases. Second, the algorithm mines the locally potential frequent itemsets in each partition using a constructed tree structure, and merges all local results into globally potential frequent itemsets. Finally, it scans the database once more to filter out all false frequent itemsets, thereby guaranteeing the accuracy of the mining results.

To guarantee the soundness of the improved algorithm, the thesis proposes and proves theorems during the algorithm design that answer two questions: (1) how to set a reasonable minsup for each partition after the database is partitioned; (2) how to merge the locally potential itemsets of all partitions into globally potential itemsets. To ensure the efficiency of the improved algorithm, the thesis applies optimizations such as pruning and reducing the amount of computation, which address problems caused by mining after partitioning and by merging the locally potential frequent itemsets, such as long running time. Experiments show that the proposed PtubeS-growth algorithm is highly efficient on both sparse and dense uncertain data, and that it solves the problems TubeS-growth exhibits when mining both kinds of data.
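The partition, merge and verify scheme described above can be illustrated with a minimal sketch. It is not PtubeS-growth itself, and several parts are assumptions: the tree-based local miner is replaced by a simple level-wise (Apriori-style) search, items are assumed to occur independently so that an itemset's expected support is the sum over transactions of the product of its items' probabilities, and the names `expected_support`, `local_frequent` and `partitioned_mining` are hypothetical. Each partition here uses the same relative threshold as the whole database (local minsup = min_ratio × partition size). Because expected support is additive over transactions, an itemset that reaches min_ratio × |D| globally must reach min_ratio × |D_i| in at least one partition, so the union of local results misses no globally frequent itemset, and the final scan removes the false positives. The thesis's own theorems for setting per-partition minsup may differ from this classic Partition/SON argument.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# An uncertain transaction maps each item to its existence probability.
# Assumption: items occur independently within a transaction.
Transaction = Dict[str, float]


def expected_support(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Sum, over transactions, of the product of the itemset's probabilities."""
    total = 0.0
    for t in db:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)  # a missing item has probability 0
            if p == 0.0:
                break
        total += p
    return total


def local_frequent(db: List[Transaction], min_ratio: float) -> Set[FrozenSet[str]]:
    """Hypothetical stand-in for the tree-based local miner: a level-wise
    search for itemsets with expected support >= min_ratio * len(db)."""
    threshold = min_ratio * len(db)
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    frequent: Set[FrozenSet[str]] = set()
    while level:
        current = [s for s in level if expected_support(s, db) >= threshold]
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-itemsets; keep a candidate only if
        # all its k-subsets are frequent (expected support is anti-monotone).
        nxt = set()
        for a, b in combinations(current, 2):
            u = a | b
            if len(u) == len(a) + 1 and all(u - {x} in frequent for x in u):
                nxt.add(u)
        level = list(nxt)
    return frequent


def partitioned_mining(db: List[Transaction], min_ratio: float,
                       n_parts: int) -> Dict[FrozenSet[str], float]:
    """Hypothetical partition-then-verify driver (SON/Partition style).
    Assumes a non-empty database and n_parts >= 1."""
    size = -(-len(db) // n_parts)  # ceiling division
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # Phase 1: every partition uses the same relative threshold.
    candidates: Set[FrozenSet[str]] = set()
    for part in parts:
        candidates |= local_frequent(part, min_ratio)
    # Phase 2: one more scan of the full database removes false positives.
    threshold = min_ratio * len(db)
    result = {}
    for s in candidates:
        es = expected_support(s, db)
        if es >= threshold:
            result[s] = es
    return result


if __name__ == "__main__":
    db = [
        {"a": 0.9, "b": 0.8, "c": 0.5},
        {"a": 0.7, "b": 0.6},
        {"a": 0.4, "c": 0.9},
        {"b": 0.5, "c": 0.8},
    ]
    result = partitioned_mining(db, min_ratio=0.3, n_parts=2)
    print(sorted("".join(sorted(s)) for s in result))  # ['a', 'b', 'c']
```

In this toy run, {a, b} is frequent in the first partition (expected support 0.72 + 0.42 = 1.14 against a local threshold of 0.6) but falls below the global threshold of 0.3 × 4 = 1.2, so the verification scan discards it as a false frequent itemset.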
Keywords/Search Tags: Uncertain data, Tube S-growth algorithm, Frequent itemset, Tree-based structure, Expected support