
Research On Frequent Pattern Mining Algorithm Of Uncertain Data Set Based On Spark

Posted on: 2020-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
Full Text: PDF
GTID: 2438330596997548
Subject: Computer technology
Abstract/Summary:
The purpose of frequent pattern mining is to mine valuable patterns from datasets and provide a basis for decision makers; frequent pattern mining in uncertain datasets is currently a hot research topic. In recent years, with the explosive growth of data volume, a single-machine computing environment has struggled to meet the needs of big data, so relying on a distributed computing platform has become a common way to keep data processing efficient. Traditional frequent pattern mining algorithms for uncertain datasets mostly take the expectation, probability, or weight alone as the support of a data item, which makes it difficult to mine more valuable frequent patterns. To address these problems, this thesis proposes two Spark-based algorithms, UWEFP and FPEWU, which use a cluster computing framework and consider both the probability and the weight of data items. Both algorithms are verified on UCI datasets; the experimental results show that the proposed methods are reasonable and feasible, and that efficiency improves while the results remain guaranteed. The specific research contents are as follows:

(1) The concept of an item's maxwp(x) is proposed and applied in a novel pruning strategy. The strategy prunes 1-items using maxwp(x), so that the frequent patterns that pass the filter take both the probability and the weight of items into account.

(2) A novel UWEFP-tree with the structural features of the FP-tree is designed. It is used to construct the pattern tree and mine frequent patterns.

(3) A new algorithm, UWEFP, is proposed. It is based on Spark, targets uncertain datasets, and uses Spark's strengths to process transactions in groups. UWEFP constructs a pattern tree in each group to mine the qualified initial frequent patterns, and then compares the support of these initial frequent patterns with the minimum support to obtain the frequent patterns that account for both the probability and the weight of data items. The experimental results show that the UWEFP algorithm mines frequent patterns more quickly and effectively than traditional algorithms.

(4) To reduce the space complexity of the UWEFP-tree, a novel pattern tree, the FPEWU-tree, is designed. It is used to construct the pattern tree and mine frequent patterns.

(5) To reduce the time complexity of the UWEFP algorithm, the FPEWU algorithm is proposed. The difference is that the UWEFP algorithm groups by transactions, whereas the FPEWU algorithm groups by data items. The FPEWU algorithm constructs a pattern tree in each group and finds frequent patterns that take into account both the probability and the weight of data items. The experimental results show that the UWEFP algorithm mines frequent patterns more quickly and effectively than traditional algorithms. The UWEFP algorithm is more effective than the FPEWU algorithm for frequent pattern mining on sparse datasets, while the FPEWU algorithm is more effective than the UWEFP algorithm on dense datasets.
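
The pruning idea in point (1) can be illustrated with a minimal single-machine sketch in Python. Everything here is an assumption made for illustration: the abstract does not define maxwp(x), so the hypothetical `item_upper_bound` stands in for it as "largest item weight times the item's expected support"; the weighted expected support (average item weight times expected support) is one definition used in the weighted-uncertain-mining literature, not necessarily the thesis's; and brute-force enumeration replaces the UWEFP-tree and Spark grouping.

```python
from itertools import combinations
from math import prod

# Toy uncertain dataset (hypothetical): item -> existence probability per transaction.
transactions = [
    {"a": 0.9, "b": 0.6, "c": 0.2},
    {"a": 0.8, "c": 0.5},
    {"b": 0.7, "c": 0.1},
    {"a": 0.5, "b": 0.9},
]
# Hypothetical item weights (importance of each item).
weights = {"a": 0.6, "b": 1.0, "c": 0.3}


def expected_support(itemset, db):
    """Sum, over transactions containing the itemset, of the product of item probabilities."""
    return sum(prod(t[i] for i in itemset) for t in db if all(i in t for i in itemset))


def weighted_expected_support(itemset, db, w):
    """Assumed measure: average item weight times expected support."""
    avg_weight = sum(w[i] for i in itemset) / len(itemset)
    return avg_weight * expected_support(itemset, db)


def item_upper_bound(item, db, w):
    """Stand-in for maxwp(x): bounds the measure of any itemset containing `item`,
    because average weight <= max weight and expected support shrinks for supersets."""
    return max(w.values()) * expected_support((item,), db)


def prune_items(db, w, min_sup):
    """Keep only 1-items whose upper bound reaches the minimum support."""
    items = {i for t in db for i in t}
    return {i for i in items if item_upper_bound(i, db, w) >= min_sup}


def mine(db, w, min_sup):
    """Brute-force enumeration over surviving items (no pattern tree, no Spark)."""
    kept = sorted(prune_items(db, w, min_sup))
    frequent = {}
    for k in range(1, len(kept) + 1):
        for candidate in combinations(kept, k):
            support = weighted_expected_support(candidate, db, w)
            if support >= min_sup:
                frequent[candidate] = support
    return frequent


if __name__ == "__main__":
    print(mine(transactions, weights, min_sup=1.0))
```

Because the bound never underestimates the measure of a superset, discarding an item whose bound falls below the threshold cannot lose a frequent pattern; it only shrinks the search space before any tree is built.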
Keywords/Search Tags: uncertain data, data mining, frequent patterns, Spark