
Research On Association Rule Mining Algorithm Based On Bitmap

Posted on: 2016-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: G B Zhao
GTID: 2208330470970533
Subject: Communication and Information System
Abstract/Summary:
With the development of computer technology and the mobile Internet, and as enterprises, government agencies, and other institutions accelerate their modernization, ever more data is being collected and stored. Before data mining techniques emerged, data owners could extract only simple, surface-level information from the data they collected and could not discover the hidden, more valuable rules, knowledge, and information it contained. With the emergence and development of modern data mining techniques, these institutions have extracted a great deal of instructive information. With the rise of cloud computing and big data, however, the shortcomings of traditional data mining techniques have become increasingly prominent. Association rule analysis is a very important data mining technique, so research on it is highly meaningful.

Mining frequent itemsets quickly and accurately is the core of association rule analysis. Traditional association rule mining algorithms (such as the classic Apriori algorithm) scan the entire transaction database many times and take a long time to mine frequent itemsets. To address these shortcomings, this thesis presents BITXOR, an efficient bit-table-based algorithm for mining frequent itemsets in static Boolean databases. First, the algorithm converts the transaction database into a bit table, so that the items, itemsets, and transactions in the database can all be represented as binary sequences. In subsequent steps the algorithm only needs to scan columns of the bit table and operate on binary sequences, rather than repeatedly scanning the entire transaction database or the entire bit table.
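The bit-table representation can be illustrated with a minimal Python sketch. It assumes that each item's column of the bit table is stored as a Python integer whose bit t is 1 when transaction t contains the item, and that an itemset's support is the number of 1s in the AND of its item columns. The abstract does not spell out BITXOR's exact support calculation, so `build_bit_table` and `support_count` are hypothetical helpers, not the thesis's implementation.

```python
def build_bit_table(transactions, items):
    """Scan the transactions once and return one bitmap column per item.

    Bit t of column[item] is 1 when transaction t contains `item`.
    Every item occurring in `transactions` must be listed in `items`.
    """
    column = {item: 0 for item in items}
    for t, transaction in enumerate(transactions):
        for item in transaction:
            column[item] |= 1 << t
    return column


def support_count(itemset, column):
    """Count transactions that contain every item of a non-empty itemset.

    Assumption: support is the number of 1 bits in the AND of the item
    columns; the abstract does not state BITXOR's exact support method.
    """
    items = list(itemset)
    bits = column[items[0]]
    for item in items[1:]:
        bits &= column[item]  # keep only transactions containing all items so far
    return bin(bits).count("1")


# Hypothetical toy database with four transactions (t = 0..3).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
column = build_bit_table(transactions, ["a", "b", "c", "d"])
print(column)                             # {'a': 7, 'b': 9, 'c': 11, 'd': 4}
print(support_count({"a", "c"}, column))  # 2
```

In this sketch the transactions are read only once, inside `build_bit_table`; every later support query touches just the relevant item columns.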
Next, the algorithm decides whether two frequent itemsets can be joined into an initial higher-order candidate by counting the number of 1s in the XOR of the binary sequences that represent them, and it joins the two frequent itemsets with an OR operation to generate that initial candidate. Finally, the algorithm deletes duplicate initial candidates and deletes any initial candidate itemset that contains an infrequent subset.

This thesis compares the proposed BITXOR algorithm with the Apriori and FP-growth algorithms on the mushroom, pumsb_star, T40I10D100K, and T10I4D100K datasets in MATLAB, analyzing two metrics: running time and the number of candidate sets. The simulation results show that on static Boolean databases, compared with the other two algorithms, BITXOR needs to scan the database only once and takes significantly less time to mine frequent itemsets, and it generates slightly fewer candidate sets than Apriori. However, the algorithm's method for computing itemset support is still imperfect, and the algorithm applies only to static Boolean databases, so further research is needed.
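The XOR/OR join and the two deletion steps can likewise be sketched in Python, assuming each frequent k-itemset is encoded as an integer bitmask over item indices (bit i set when item i is in the set). Under this reading, two k-itemsets share k-1 items exactly when their XOR contains two 1 bits, and their OR is then the (k+1)-item candidate. The threshold of two 1 bits and the helper names `join_initial`, `k_subsets`, and `generate_candidates` are assumptions made for illustration, not details confirmed by the abstract.

```python
from itertools import combinations


def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def join_initial(frequent_k):
    """Join pairs of frequent k-itemsets that differ in exactly one item.

    For two k-itemsets, an XOR with exactly two 1 bits means they share
    k - 1 items; OR-ing them gives the (k + 1)-item initial candidate.
    """
    initial = []
    for a, b in combinations(frequent_k, 2):
        if popcount(a ^ b) == 2:
            initial.append(a | b)
    return initial


def k_subsets(mask):
    """Yield `mask` with each of its set bits cleared in turn."""
    rest = mask
    while rest:
        low = rest & -rest  # lowest remaining set bit
        yield mask ^ low    # the itemset without that one item
        rest ^= low


def generate_candidates(frequent_k):
    """Return sorted (k + 1)-candidates after deduplication and pruning."""
    frequent_set = set(frequent_k)
    unique = set(join_initial(frequent_k))  # drop duplicate initial candidates
    # Keep a candidate only if every k-item subset of it is frequent.
    return sorted(
        cand for cand in unique
        if all(sub in frequent_set for sub in k_subsets(cand))
    )


# Items a, b, c, d mapped to bits 0..3 (hypothetical example data).
frequent_2 = [0b0011, 0b0101, 0b0110, 0b1010]  # {a,b}, {a,c}, {b,c}, {b,d}
print(join_initial(frequent_2))         # [7, 7, 11, 7, 14]
print(generate_candidates(frequent_2))  # [7], i.e. {a, b, c}
```

Here the four frequent 2-itemsets produce five initial candidates, of which three are distinct; only {a, b, c} survives pruning, because {a, b, d} contains the infrequent subset {a, d} and {b, c, d} contains the infrequent subset {c, d}.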
Keywords: Data mining, Association rules, Bit table, Frequent itemsets, BITXOR