Research On Weighted Frequent Itemset Mining In Uncertain Databases

Posted on:2020-06-17

Degree:Master

Type:Thesis

Country:China

Candidate:W Wang

Full Text:PDF

GTID:2428330602960809

Subject:Computer technology

Abstract/Summary:

With the rapid development of the Internet,the Internet of Things and the mobile Internet,human beings have entered into the era of big data.How to obtain the most valuable information from big data has gradually attracted much attention from the research community to the industry.Frequent itemsets in the database are considered to be the most common valuable information and are widely used in many research fields,such as cluster classification,association rule extraction,and moving object detection.However,when there is uncertainty in the itemset of the database or the attribute set of the database has weights,it is usually technique challenging to mine its frequent itemsets.For the uncertain database,the researchers propose to use the possible worlds model to instantiate each possible world of the uncertain database associated with its existence probability,and transform the processing on the uncertain database into the operations on the deterministic database.Furthermore,two definitions of frequent itemsets in uncertain database are proposed:the support-expected based frequent itemsets and the probabilistic frequent itemsets.At the same time,in uncertain database,its attribute values are sometimes not equally important,that is,it is more practical to mine the weighted frequent itemsets.However,the introduction of weights leads to the anti-monotonicity of classical frequent itemsets unavailable,which drastically reduces the efficiency of the mining algorithm.In recent years,researchers have proposed some anti-monotonicity for weighted frequent itemsets mining.However,the existing algorithms only focus on the support-expected based frequent itemsets,and there is only limited research on weighted probabilistic frequent item mining.Based on the above analysis,this paper proposes a novel algorithm for weighted probabilistic frequent items(w-PFI)mining,which is based on the classical Apriori generation-verification framework and employ the breadth-first frequent itemset search method.In order to improve the efficiency of this algorithm,we further give three pruning techniques to reduce the size of the candidate set.Specially,we extend the weight judgment closure technique and defined anovel anti-monotonicity property for w-PFI.Based on this anti-monotonicity,we present a pruning technique for frequent item candidate sets,whose pruning ability is higher than the pruning method of the weight judgment closure method itself.Moreover,we use the Poisson binomial distribution to define a probability model of w-PFI support,with which we propose two more efficient pruning techniques from the perspective of probability.Finally,we conduct a large number of experiments on both real data sets and synthetic data sets,and evaluate the performance of the exact w-PFI and approximate w-PFI algorithms proposed in the paper in terms of running time,accuracy and scalability.The results show that our methods achieve higher performance than the existing algorithms.

Keywords/Search Tags:

Uncertain Database, Weighted Frequent Itemset, Probability Frequent Itemset, Probability Model, Pruning Technique

Related items

1	The Research On IDS Based On Mining Max Frequent Itemset Using Big Step Pruning Strategy
2	Research On Mining Frequent Itemsets Algorithm Based On Bittable
3	An Algorithm For Mining Frequent Itemsets From Data Streams
4	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
5	Study Of Fast Algorithms For Frequent Itemset Mining From Uncertain Data
6	Recent Weighted Frequent Itemset Mining For Data Streams And Applications
7	Research And Realization Of Parallel Algorithm For Mining Frequent Closed Itemsets
8	New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases
9	Algorithms Of Maximal Frequent Itemset Mining And Their Applications
10	Research On Mining Frequent Itemsets Over Data Stream