Research On Maximal Frequent Itemset Mining Algorithm Based On H-struct

Posted on:2021-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:L Meng

GTID:2428330623973235

Subject:Mathematics

Abstract/Summary:

Mining maximal frequent itemsets is an important direction of data mining research. The maximal frequent itemsets are the most compact representation of the full collection of frequent itemsets and reflect the relationships between items, so they have important theoretical value and application prospects. However, most maximal frequent itemset mining algorithms are better suited to dense datasets, whereas practical applications involve many sparse datasets with scattered items and large differences in transaction patterns. It is therefore significant to design a mining algorithm for sparse datasets.

In this thesis, we discuss the classic maximal frequent itemset mining algorithms in terms of data structure, search method, and optimal pruning strategy. We conclude that the existing algorithms for mining maximal frequent itemsets based on the pattern growth threshold are inefficient on sparse datasets. In fact, sparsity is one of the essential characteristics reflecting the density of a dataset. We can use sparsity to classify datasets and study how well maximal frequent itemset mining algorithms adapt to datasets of different sparsity. The main research results are as follows:

(1) The method SMMAM, based on an adjacency matrix, is proposed to measure the sparsity of a transaction dataset. The method reorders the transaction dataset and reflects the correlation between items through the two-dimensional relationships of elements in the adjacency matrix, thereby quantifying sparsity. The experimental results show that the sparsity calculated by SMMAM accurately reflects the density of both dense and sparse transaction datasets.

(2) Traditional algorithms based on pattern growth have difficulty mining sparse datasets efficiently. To address this problem, this thesis introduces the H-struct and proposes the HMFI algorithm for maximal frequent itemset mining. H-struct uses divide and conquer to mine each block recursively, which saves memory. At the same time, the algorithm uses the optimal pruning strategy based on the item ordering relationship to reduce the search space of H-struct and improve traversal efficiency. Finally, experiments are designed to evaluate the efficiency of the HMFI and Max-Miner algorithms on dense and sparse datasets. The results show that HMFI has a greater advantage than Max-Miner in mining sparse datasets.
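
For readers unfamiliar with the term, the following minimal sketch illustrates only the definition of a maximal frequent itemset: a frequent itemset that has no frequent proper superset. It is a brute-force enumeration, not the HMFI, H-struct, or SMMAM methods of the thesis, whose details the abstract does not give. The function names, the toy transactions, and the support threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: return {itemset: support count} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support count = number of transactions containing the candidate.
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                frequent[cset] = count
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only frequent itemsets with no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy dataset (not taken from the thesis).
transactions = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"d"},
)]
result = maximal_frequent_itemsets(transactions, min_support=2)
```

With a support threshold of 2, the single items a, b, and c are frequent but not maximal, because each is contained in a frequent pair. The three pairs are maximal because {a, b, c} occurs in only one transaction. Enumeration of this kind grows exponentially with the number of items, which is why practical algorithms rely on specialized data structures and pruning.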

Keywords/Search Tags:

data mining, maximal frequent itemset, adjacency matrix, sparsity, H-struct

Related items

1	Algorithms Of Maximal Frequent Itemset Mining And Their Applications
2	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
3	The Research On IDS Based On Mining Max Frequent Itemset Using Big Step Pruning Strategy
4	Study On Mining Maximal Frequent Itemset Based On Iceberg Concept Lattice
5	Research On Frequent Itemsets Mining Algorithm In Data Stream
6	Research On Mining Algorithms Of Maximal Frequent Itemsets And Opened Frequent Itemsets
7	The Research And Implementation Of Association Rule Data Mining Algorithm
8	Research On Mining Frequent Itemsets Over Data Stream
9	Mining Algorithm Of Frequent Items Based On Item Adjacency List And Transaction Tree
10	Research On Algorithms For Mining Maximal Frequent Itemsets