Font Size: a A A

Research On Maximal Frequent Itemset Mining Algorithm Based On H-struct

Posted on:2021-01-17Degree:MasterType:Thesis
Country:ChinaCandidate:L MengFull Text:PDF
Abstract/Summary:PDF Full Text Request
Mining maximal frequent itemset is an important direction of data mining research.It reflects the relationship between items in the form of the simplest set of frequent itemset cluster,which has important theoretical value and application prospect.But most of the algorithms about maximal frequent itemset mining are more suitable for dense datasets.However,in practical application,there are a large number of sparse data sets with scattered items and large differences in transaction patterns.Therefore,it is significant to design a mining algorithm for sparse datasets.In this thesis,we discussed the classic maximal frequent itemset mining algorithm from the aspects of data structure,algorithm search method and optimal pruning strategy.Then we conclude that the existing algorithms for mining maximal frequent itemset based on pattern growth threshodl are inefficient for sparse datasets mining.In fact,sparsity is one of the essential characteristics reflecting the density of dataset.We can use sparsity to classify datasets,and study the adaptability of datasets with different sparsity around the maximum frequent itemset mining algorithm.Some research results are shown as follows:(1)The method SMMAM based on adjacency matrixis is proposed to measure the sparsity of transaction dataset.Our method rearranges the transaction dataset in order,and reflects the correlation between items through the two-dimensional relationship of elements in the adjacency matrix,so as to achieve the purpose of quantifying the sparsity.The experimental results show that the sparsity calculated by SMMAM can accurately reflect the density of the dense or sparse transaction dataset especially.(2)In the face of the problem that the traditional algorithm which based on pattern growth is difficult to mine the sparse datasets efficiently.This paper introduces theH-struct and proposes the algorithm HMFI for maximal frequent itemset mining.H-struct uses the method of divide and conquer to mine each block recursively,which saves the memory space.At the same time,the algorithm uses the optimal pruning strategy based on the item ordering relationship to reduce the search space of H-struct and improve the efficiency of traversal.Finally,experiments are designed to evaluate the efficiency of HMFI and Max-Miner algorithms in dense and sparse datasets.The results show that HMFI has more advantages in mining sparse datasets than Max-Miner.
Keywords/Search Tags:data mining, maximal frequent itemset, adjacency matrix, sparsity, H-struct
PDF Full Text Request
Related items