Font Size: a A A

Efficiently Mining Maximal Diverse Frequent Itemset

Posted on:2020-10-20Degree:MasterType:Thesis
Country:ChinaCandidate:D X LuoFull Text:PDF
GTID:2428330590978672Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology,the volume of data has reached an unprecedented scale.Large-scale data brings convenience to people's life and work,but also causes many problems,which are mainly manifested in the huge gap between the ability of human to collect data,organize data and process data.In some scenarios people lack effective data analysis and mining methods,and cannot make full use of the collected data,which leads to the phenomenon of "data explosion but knowledge poverty".Frequent pattern mining is usually the first step in large-scale data analysis.It has been an active research topic in the field of data mining for many years.Frequent itemsets mining,which is an important task in frequent pattern mining,is to mine the itemsets whose support exceed the predefined minimum support threshold in a data set.By mining frequent itemsets in a data set,people can extract and analyze the association rules hidden in the data.The main problem of traditional frequent itemsets mining is that the number of frequent itemsets is very large.It is a challenge to compute and store these frequent itemsets,and usually it is also unnecessary to mine such a large number of frequent itemsets.To solve this problem,many researches have proposed a lot of constraintbased frequent itemsets,such as closed frequent itemsets,maximum frequent itemsets and so on.This paper reviewed previous related work,introduced the background,the development,and the research status of frequent itemset mining in details,analyzed the hotspots in the field of frequent itemset mining.Based on the existing work,this paper proposed the maximal diverse frequent itemsets(MDFIs).Due to the maximal constraint,MDFIs can avoid a large number of frequent sub-itemsets generated in the process of mining frequent itemset.In addition,the diversity of itemsets,which is used to measure the category-based difference between items in an itemset,is integrated into the frequent itemset mining.Having the two constrains,the number of the MDFIs mined is usually not very large,and the items in the itemsets have very large category-based differences.In order to efficiently mine the MDFIs,two mining algorithms,i.e.,the basic algorithm and the bound-based algorithm,are proposed in this paper.The basic algorithm,which extends existing maximal frequent itemset mining techniques,first mines all the maximal frequent itemsets in the data set,and then evaluate the diversity of each found maximal frequent itemset.It is not scalable on large data sets.The bound based algorithm is proposed on the basis of a novel data structure FP*-tree designed in this paper.The FP*-tree,a variant of the FP-tree,is not only able to store compactly the necessary information for MFI computation,but also contains a posting list for each item,which make it possible to construct supersets for maximal frequent itemsets.When using FP*-tree to mine the MDFIs,the bound-based algorithm calculates the upper bound of the diversity of the frequent itemsets containing each item,and first mines the frequent itemsets which have large diversity.In the mining process,the algorithm calculates the diversity of the MDFIs and checks the upper bound of the potential maximal frequent itemsets in the data set,therefore,the bound-based algorithm can perceive whether the MDFIs that have been mined meet the needs of the mining task.The algorithm terminates and returns the results when it confirms that the requested MDFIs has been discovered.To evaluate the efficiency of the maximal diverse frequent itemsets mining algorithms,this paper designed several comparative experiments to test the performance of the basic algorithm and bound based algorithm under different conditions.The experimental results show that the bound based algorithm has significant advantages in mining the MDFIs.
Keywords/Search Tags:Algorithm, Frequent Itemsets, Diversification
PDF Full Text Request
Related items