
Research and Implementation of Mining Frequent Itemsets Based on Dynamic Hashing and Transaction Compression

Posted on: 2019-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Hu
GTID: 2428330593950450
Subject: Software engineering
Abstract/Summary:
In the era of the Internet and big data, data mining has a growing impact on people's lives. Association rule mining, a key part of data mining, uses a variety of algorithms to uncover relevant information hidden in data. Finding all frequent itemsets is the most important stage of association rule mining, so an efficient frequent itemset mining algorithm greatly improves its time and space efficiency. Mining frequent itemsets consists mainly of two steps, the connection (join) step and the pruning step, and the efficiency of these two steps directly affects the time needed to generate frequent itemsets and the space they occupy. This thesis first analyzes the low time and space efficiency of the classical Apriori algorithm and surveys current research on frequent itemset mining. It then studies the principles of the connection and pruning steps in the classical Apriori algorithm, together with hash-based methods for generating candidate sets. Next, to improve the efficiency of generating frequent itemsets on large data sets, a dynamic hashing process for frequent itemset mining is designed. Based on statistical results and the characteristics of the data sets, four main factors affecting dynamic hashing are extracted, and the application scope and scenarios of dynamic hashing are refined to handle data sets with different characteristics. After analyzing the characteristics of the dynamic hash connection step and identifying these factors, the thesis proposes a transaction-compression-based pruning algorithm that applies both to dynamically hashed data and to the classical Apriori algorithm. Extensive experiments comparing the improved algorithm with the classical Apriori algorithm show that the algorithm based on transaction compression is more efficient. To address frequent itemset mining on large-scale data sets with different data characteristics, an improved algorithm combining dynamic hashing and transaction compression is designed. Finally, the improved algorithm is applied to association rule mining on a campus network, using the campus network's web logs as the data set. Based on the mining results, practical suggestions are made for decisions on the campus official website, covering structure optimization, layout, related recommendations, response speed, and abnormal operations.
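The connection step, the pruning step, and the general idea of transaction compression mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the dynamic hashing component is omitted, and the function name `apriori_compressed`, its parameters, and the compression rule used here (drop any transaction that contains no frequent k-itemset, since it cannot contain a frequent (k+1)-itemset) are assumptions chosen for illustration.

```python
from itertools import combinations


def apriori_compressed(transactions, min_support):
    """Return {frozenset: support_count} for all frequent itemsets.

    Illustrative only: textbook Apriori plus a simple transaction
    compression rule; not the algorithm proposed in the thesis.
    """
    # Work on a private list so transactions can be dropped between passes.
    db = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)

        # Connection (join) step: merge pairs of frequent (k-1)-itemsets
        # whose union has exactly k items.
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) != k:
                continue
            # Pruning step: every (k-1)-subset of a candidate must itself
            # be frequent, otherwise the candidate cannot be frequent.
            if all(frozenset(sub) in prev
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count candidate support over the (already compressed) database.
        counts = dict.fromkeys(candidates, 0)
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)

        # Transaction compression (assumed rule): keep only transactions
        # that contain at least one frequent k-itemset.
        db = [t for t in db if any(f <= t for f in frequent)]
        k += 1

    return result
```

For example, calling `apriori_compressed([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["d"]], 2)` drops the transaction `["d"]` after the second pass, so the third pass scans four transactions instead of five.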
Keywords/Search Tags: association rules, frequent itemsets, candidate sets, dynamic hashing, transaction compression