Font Size: a A A

Approximation Of Probabilistic Maximal Frequent Itemset Mining Over Uncertain Database

Posted on:2019-03-09Degree:MasterType:Thesis
Country:ChinaCandidate:L H NieFull Text:PDF
GTID:2348330542489024Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Probabilistic or uncertain transaction databases are widely used in many fields such as underground coal mine monitoring and moving object search.For uncertain transaction databases,many researchers focus on discovering frequent itemsets in an efficient way.To interpret the existential uncertainties of transactions better,the Possible World Semantics is often adopted.Because a probabilistic database can generate an exponential number of possible worlds,it is technically challenging to mine probabilistic maximal frequent items(PMFIs)efficiently in large databases.Several solutions such as DP and DC are proposed to settle the issue that time complexity of Possible World Semantics is very high.The results from those methods are the same as the results obtained by the possible worlds.But DP,DC and TODIS can obtain result in a much faster way than computing possible worlds.The reason is that they only compute a probability distribution of itemset's support rather than getting the corresponding possible worlds.It is clear that computing this kind of distribution,which only needs O(nlogn)time,is much faster than getting exponential number of possible worlds.However,with the size of database becoming too large,DP,DC and TODIS cannot work well any longer because O(nlogn)time is not affordable when n is too large.To address this issue,we proposed an approximate method to mine PMFIs over uncertain database,which can accelerate mining efficiency greatly with a little accuracy loss.Two steps,candidates generation and PMFIs confirmation,are included in this methods.In the first step,support expectation bound can be obtained with utilizing Chernoff bound and this expectation bound can help improve the accuracy of obtained candidates.In the second step,estimated frequency can be gotten in a fast way with using Central Limit Theorem.Moreover,we give the proofs for the theorems appear in this paper,which make the paper more convincing.To further illustrate the efficiency and performance of the proposed algorithm,we design and conduct the experiment on six datasets,two synthetic datasets and real datasets are in concluded.When candidates are generated,we compare the proposed method with traditional Apriori and analysis the impacts of data size,support threshold,and frequent probability threshold.When confirm the PFMIs,we compare our proposed algorithm with TODIS-MAX,which is the state-of-art method.Those two methods are evaluated in running time and result accuracy.According to experiment,we can conclude that the proposed algorithm can reduce the candidate size effectively and improve the efficiency of mining PFMIs greatly with a little accuracy loss.
Keywords/Search Tags:Uncertain Database, Probabilistic Maximal Frequent Itemsets, Approximate Methods
PDF Full Text Request
Related items