Font Size: a A A

Research On Traditional Frequent Itemsets And Top-rank-k Frequent Itemsets Mining Algorithm

Posted on:2021-03-14Degree:MasterType:Thesis
Country:ChinaCandidate:L W ZhuFull Text:PDF
GTID:2428330614958368Subject:Electronic and communication engineering
Abstract/Summary:PDF Full Text Request
Association rules analysis in data mining is a main direction of research,which is mainly used to reveal the correlation between items or attributes in the database,the most important and tedious step before mining association rules is mining frequent itemsets.In the traditional mining process of frequent itemsets,if the minimum support threshold is set too large or too small,an undesired number of frequent itemsets will be generated,so it is difficult to define a suitable value for the minimum support threshold.To solve the problem of minimum support threshold setting,the Top-rank-k frequent itemsets mining algorithm uses a dynamic threshold method to mine frequent itemsets,which reduces the dependence on the minimum support threshold.In order to improve the time and space efficiency of frequent itemset mining,this thesis studies the traditional frequent itemset mining algorithm and Top-rank-k mining algorithm,and analyzes the problems found in the mining process by using efficient data structures and pruning strategies.The main contents include:In order to solve the problems of complex tree construction and low mining efficiency in traditional frequent itemset mining algorithms,an efficient traditional frequent itemset improvement algorithm is proposed.The algorithm optimizes the construction process of PPC-tree,which improves the time efficiency of constructing PPC-tree,and the algorithm combines a new linear time complexity connection method and early pruning strategy to propose a more efficient 1-itemset connection method,which can determine the feasibility of itemset connection in the process of itemset connection.The improved algorithm uses the difference set operation to obtain the Diff Ndoeset of t(>2)-itemset,which avoids multiple complicated determination processes based on child-parent relationship.Finally,the improved algorithm uses an subsume index strategy to reduce the number of itemset connection.The Experimental results show that the improved algorithm has good performance in terms of running time and space utilization,especially in dense data sets,and the algorithm can be applied to data sets of various sizes.For the problem of setting the minimum support threshold in traditional frequent itemsets mining,this thesis applies the improved algorithm in the above research to the Top-rank-k mode,Experimental results show that this improved algorithm still has highmining efficiency in Top-rank-k mining mode.In order to further improve the time efficiency of mining Top-rank-k frequent itemsets,another improved algorithm of Top-rank-k frequent itemsets is proposed,the algorithm uses bitmap-code to represent node information,which can reduce the time complexity of itemset connection by one dimension,and then proposes efficient 1-itemset and t(>1)-itemset connection methods,which can predict the feasibility of itemset connection during the itemset connection process.Finally,the algorithm introduces the subsume index pruning strategy to further reduce the itemset search space.Experimental results show that the algorithm has good time and space efficiency,and it can meet the mining needs in most scenarios.
Keywords/Search Tags:big data, association rule, Top-rank-k itemset, dynamic threshold, pruning
PDF Full Text Request
Related items