
Research On Association Rules Algorithm In Data Mining

Posted on: 2020-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Ji
Full Text: PDF
GTID: 2428330590451034
Subject: Computer Science and Technology
Abstract/Summary:
Association rule mining is one of the most active research directions in data mining. With the explosive growth of data volumes, traditional association rule mining algorithms have become too slow, so improving their time efficiency is the main research problem in the field. This thesis studies association rule mining from two angles: one based on binary attributes, the other based on multi-valued attributes.

On the one hand, the thesis studies the Apriori algorithm for Boolean association rule mining in depth and proposes an improved algorithm with hash-tree-based parallel counting to overcome two shortcomings of Apriori: the large number of candidate itemsets and the slow counting process. The algorithm is improved in three respects. First, the frequent itemsets are pruned to reduce the number of candidate itemsets generated by the join step. Second, a hash tree stores the candidate 1-itemsets to accelerate support counting; from candidate 2-itemsets onward, the counting process is improved by exploiting the fact that both transactions and itemsets are sorted in ascending lexicographic order. Third, the counting process is rewritten with multithreading to take full advantage of multi-core CPUs and count in parallel. Experiments comparing the original Apriori algorithm with the improved algorithm show that the improved algorithm runs substantially faster.

On the other hand, the thesis studies how the Apriori algorithm handles datasets with multi-valued attributes and identifies the problem in this process: too many invalid itemsets are generated. To solve this problem, an optimized algorithm is proposed that reduces the number of invalid itemsets. It eliminates the itemsets produced by joining different values of the same attribute, which reduces the number of candidate itemsets. The experimental results show that the optimized algorithm has higher time efficiency.
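The multi-valued optimization described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each item is an `(attribute, value)` pair and each transaction is a dictionary from attribute to value, and the names `generate_candidates`, `count_support`, and `apriori` are hypothetical. The hash tree and the multithreaded counting are deliberately left out; support is counted with a plain sequential loop.

```python
from itertools import combinations


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into candidate k-itemsets.

    Each itemset is a sorted tuple of (attribute, value) pairs.
    Joins that would combine two values of the same attribute are
    skipped, since no transaction can contain such an itemset.
    """
    frequent = sorted(frequent)
    frequent_set = set(frequent)
    candidates = []
    for i, a in enumerate(frequent):
        for b in frequent[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted input: no later itemset shares a's prefix
            if a[-1][0] == b[-1][0]:
                continue  # same attribute, different values: invalid itemset
            cand = a + (b[-1],)
            # Classic Apriori pruning: every (k-1)-subset must be frequent.
            if all(sub in frequent_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates


def count_support(candidates, rows):
    """Count, sequentially, how many rows contain each candidate."""
    counts = {c: 0 for c in candidates}
    for row in rows:
        for c in candidates:
            if row.issuperset(c):
                counts[c] += 1
    return counts


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    `min_support` is an absolute count (an assumption of this sketch).
    """
    rows = [frozenset(t.items()) for t in transactions]
    item_counts = {}
    for row in rows:
        for item in row:
            item_counts[item] = item_counts.get(item, 0) + 1
    frequent = {(item,): n for item, n in item_counts.items()
                if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = generate_candidates(list(frequent), k)
        counts = count_support(candidates, rows)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Made-up example data, for illustration only.
rows = [
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "young", "income": "low", "buys": "yes"},
    {"age": "old", "income": "high", "buys": "no"},
    {"age": "young", "income": "high", "buys": "yes"},
]
result = apriori(rows, min_support=2)
```

Because a transaction holds exactly one value per attribute, an itemset such as `{age=young, age=old}` can never have non-zero support, so skipping it in the join step avoids counting work without changing the set of frequent itemsets.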
Keywords/Search Tags:Association Rules, Apriori algorithm, Hash tree, Parallel counting, Quantitative