Association Rule Mining

Posted on: 2007-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhang
GTID: 2178360185475639
Subject: Computer application technology
Abstract/Summary:
Association rule mining is an important part of data mining. Frequent itemset mining is a key phase of discovering association rules and, to a large degree, determines the efficiency of an association rule mining algorithm. It also plays an important role in other data mining tasks, such as mining strong rules, correlations, and sequential patterns. The studies in this thesis are carried out within the support-confidence framework. Hundreds of frequent itemset mining algorithms are systematized and classified into four types. An overview of the well-known Apriori and FP-growth algorithms is given, and their advantages and weaknesses are pointed out. In association rule mining, transaction databases have two common formats: horizontal layout and vertical layout. Algorithms that use the vertical layout are often superior to those that use the horizontal layout. Two well-known vertical algorithms, Eclat and Diffset, are studied in detail. A new algorithm, Eclat+, is proposed; it improves on Eclat and shows good performance on many datasets. The two vertical formats, Tidset and Diffset, are then compared. For sparse datasets, a database in Tidset format is much smaller than the same database in Diffset format. After careful study, a variant of the Diffset algorithm is presented that saves a considerable amount of memory on sparse datasets. A new hybrid algorithm based on Diffset is then proposed, which performs well on both sparse and dense datasets. To improve the scalability of these algorithms, projected databases are introduced.
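To make the vertical layout concrete, the following is a minimal Python sketch of a textbook Tidset-based Eclat search, plus the standard Diffset identity for computing the support of a pair. It is not the thesis's Eclat+ or its hybrid algorithm, whose details the abstract does not give. The function names (`to_vertical`, `eclat`, `diffset_support`) and the use of an absolute transaction count for `min_support` are assumptions made for illustration.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert a horizontal layout (one item set per transaction)
    into a vertical layout: item -> set of transaction ids (tidset)."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose
    support (absolute count, an assumption of this sketch) is at
    least min_support."""
    vertical = to_vertical(transactions)
    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset of prefix + item) pairs, all frequent
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # tidset intersection
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    roots = sorted(
        ((item, tids) for item, tids in vertical.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), roots)
    return frequent


def diffset_support(vertical, x, y):
    """Support of {x, y} via a diffset instead of an intersection:
    d(xy) = t(x) - t(y), and support(xy) = support(x) - |d(xy)|."""
    diff = vertical[x] - vertical[y]
    return len(vertical[x]) - len(diff)


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = eclat(transactions, min_support=3)
vertical = to_vertical(transactions)
```

In this toy database every item is dense, so the diffset of a pair holds a single transaction id while the intersected tidset holds three. On sparse data the situation reverses, which is the trade-off the abstract describes between the Tidset and Diffset formats.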
Keywords/Search Tags: Data Mining, Association Rule, Frequent Itemset Mining, Eclat, Diffset