Research And Application Of Frequent Itemset Mining Algorithm

Posted on: 2021-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: D J Jiang
GTID: 2428330614965939
Subject: Computer technology
Abstract/Summary:
With the rapid development and wide application of information technology, large amounts of data have accumulated in all walks of life. Traditional data processing techniques cannot fully discover and exploit the knowledge hidden in this data, which led to the emergence of data mining. As an important data mining method, association rule mining can find valuable information in massive data and help businesses make appropriate decisions. Association rule mining consists of two steps: mining frequent itemsets and generating association rules. Frequent itemset mining largely determines the efficiency of the whole process, so improving its efficiency is an active research topic in association rule mining.

To improve the efficiency of frequent itemset mining, this thesis improves on FP-Growth-related algorithms and proposes a frequent itemset mining algorithm named UFIM (Unidirectional Frequent Itemset Mining), based on the UFP-tree. The algorithm first constructs a unidirectional frequent pattern tree (UFP-tree) and then introduces constrained subtrees on it. Constrained subtrees are divided into two cases, those whose paths point to the same endpoint and those whose paths point to different endpoints, which are mined with a non-recursive method and a recursive method respectively. The non-recursive method checks whether the endpoint's support count is below the minimum support count. If it is, the constrained subtree contains no frequent itemsets; otherwise, its frequent itemsets are composed of the nodes other than the root node. Experimental results show that UFIM runs faster than similar algorithms.

To improve the efficiency of UFIM on big data, a parallelization scheme for the algorithm is designed on the Spark platform. The scheme first computes the frequent 1-itemsets in parallel and then distributes the data required by each single-item constrained subtree to multiple child nodes. Each child node independently mines the frequent itemsets that belong to its part. Finally, the local frequent itemsets mined by each node are combined to obtain the global frequent itemsets. Experimental results show that the Spark-based parallel UFIM algorithm has better time efficiency and is suitable for mining frequent itemsets in big data.

To further test the practicality of the Spark-based parallel UFIM algorithm, a simple book recommendation system is developed in this thesis. The system analyzes users' historical purchase records to obtain association rules whose antecedents and consequents are book identifiers, and recommends books that a user may purchase based on the identifier of the book being browsed. The application results show that the Spark-based parallel UFIM algorithm can be applied effectively to a book recommendation system and can produce accurate recommendations.
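
The two-step process described above (mine frequent itemsets, then derive association rules) can be illustrated with a minimal Python sketch. It is not the UFIM algorithm and builds no UFP-tree; it uses a plain level-wise enumeration only to make the terms support count, minimum support count, antecedent, and consequent concrete. The function names, the book identifiers, and the thresholds are hypothetical and do not come from the thesis.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Step 1: return {itemset: support count} for every frequent itemset.

    Simple level-wise enumeration (illustrative stand-in, NOT UFIM).
    """
    transactions = [frozenset(t) for t in transactions]
    # Frequent 1-itemsets: items whose support count reaches min_count.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_count}
    result = dict(current)
    k = 2
    while current:
        items = sorted({i for s in current for i in s})
        current = {}
        for cand in map(frozenset, combinations(items, k)):
            # Prune: every (k-1)-subset of a frequent itemset is frequent.
            if all(frozenset(s) in result for s in combinations(cand, k - 1)):
                n = sum(1 for t in transactions if cand <= t)
                if n >= min_count:
                    current[cand] = n
        result.update(current)
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """Step 2: return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, count in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = count / itemsets[a]  # support(A ∪ B) / support(A)
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical purchase records; each record is a set of book identifiers.
purchases = [
    {"b1", "b2", "b3"},
    {"b1", "b2"},
    {"b1", "b3"},
    {"b2", "b3"},
    {"b1", "b2", "b3"},
]
itemsets = frequent_itemsets(purchases, min_count=3)
rules = association_rules(itemsets, min_confidence=0.75)
```

In a recommendation setting such as the one described, a rule whose antecedent contains the book being browsed would suggest the books in its consequent.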
Keywords/Search Tags: UFP-tree, frequent itemset, parallelization, Spark platform, recommendation system