Approximation Of Probabilistic Maximal Frequent Itemset Mining Over Uncertain Database

Posted on:2019-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:L H Nie

Full Text:PDF

GTID:2348330542489024

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Probabilistic or uncertain transaction databases are widely used in many fields such as underground coal mine monitoring and moving object search.For uncertain transaction databases,many researchers focus on discovering frequent itemsets in an efficient way.To interpret the existential uncertainties of transactions better,the Possible World Semantics is often adopted.Because a probabilistic database can generate an exponential number of possible worlds,it is technically challenging to mine probabilistic maximal frequent items(PMFIs)efficiently in large databases.Several solutions such as DP and DC are proposed to settle the issue that time complexity of Possible World Semantics is very high.The results from those methods are the same as the results obtained by the possible worlds.But DP,DC and TODIS can obtain result in a much faster way than computing possible worlds.The reason is that they only compute a probability distribution of itemset's support rather than getting the corresponding possible worlds.It is clear that computing this kind of distribution,which only needs O(nlogn)time,is much faster than getting exponential number of possible worlds.However,with the size of database becoming too large,DP,DC and TODIS cannot work well any longer because O(nlogn)time is not affordable when n is too large.To address this issue,we proposed an approximate method to mine PMFIs over uncertain database,which can accelerate mining efficiency greatly with a little accuracy loss.Two steps,candidates generation and PMFIs confirmation,are included in this methods.In the first step,support expectation bound can be obtained with utilizing Chernoff bound and this expectation bound can help improve the accuracy of obtained candidates.In the second step,estimated frequency can be gotten in a fast way with using Central Limit Theorem.Moreover,we give the proofs for the theorems appear in this paper,which make the paper more convincing.To further illustrate the efficiency and performance of the proposed algorithm,we design and conduct the experiment on six datasets,two synthetic datasets and real datasets are in concluded.When candidates are generated,we compare the proposed method with traditional Apriori and analysis the impacts of data size,support threshold,and frequent probability threshold.When confirm the PFMIs,we compare our proposed algorithm with TODIS-MAX,which is the state-of-art method.Those two methods are evaluated in running time and result accuracy.According to experiment,we can conclude that the proposed algorithm can reduce the candidate size effectively and improve the efficiency of mining PFMIs greatly with a little accuracy loss.

Keywords/Search Tags:

Uncertain Database, Probabilistic Maximal Frequent Itemsets, Approximate Methods

PDF Full Text Request

Related items

1	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
2	Research On Algorithms For Mining Maximal Frequent Itemsets
3	Research On Mining Algorithms Of Maximal Frequent Itemsets And Opened Frequent Itemsets
4	Algorithms Of Probabilistic Frequent Itemsets From Uncertain Data
5	Research On Algorithm Of Mining Frequent Itemsets From Uncertaln Data
6	The Research And Implementation Of Mining Frequent Itemsets Algorithm Over Streaming Data
7	Research On Algorithm For Mining Frequent Itemsets Of Uncertain Data
8	FP-Tree Based Mining Frequent Itemsets Over Data Streams
9	Research On Mining Algorithm Of Maximal Frequent Itemsets
10	An Algorithm Of Maximal Frequent Itemsets Mining Based On Dynamic Reordering