Divide And Conquer Strategy In Association Rule Mining

Posted on:2003-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:Z M Lin

Full Text:PDF

GTID:2208360092471224

Subject:Computer applications

Abstract/Summary:

PDF Full Text Request

Mining association rules is one of the most important issues in the field of data mining. Its task is to discover the relations between item sets from a large transaction database. In this way,we can study purchasing habits of customers and learn how to design catalog,sale strategy,etc.To mine association rules,we mine all frequent itemsets first. It is then straightforward to generate the rules from the discovered frequent itemsets. However,it is time-consuming to mine frequent itemsets since it is necessary to scan a generally very huge database. How to design an efficient and scalable algorithm for mining rules has become one of the hot topics. Among all the proposed techniques,algorithm Apriori is relatively efficient and effective.Algorithm Apriori employs iterative technique:initially it computes all frequent 1-itemsets and then generates iteratively frequent (k+l)-itemsets Lk+1 from frequent k-itemsets Lk until set Lk is empty. The second procedure is based on generation-and-test method:first generate candidate itemsets Ck+1 from frequent itemsets Lk and cut off some candidates which are sure to be infrequent,then scan the database to count the support for every remaining candidate. We can find that most execution time is spent in these two procedures:generation and test.On the basis of Algorithm Apriori,this dissertation presents a new algorithm DCM using divide-and-conquer technique. According to some strategy,DCM divides frequent itemsets Lk into three non-intersect subsets Thus the generation of most candidates can be localized in each individual subset and some unnecessary interactions between subsets are avoided;moreover,most of the cut-off actions are also localized in each subset and we don't need to scan the whole set Lk as before. To realize this target,new data structure is introduced for some frequent itemsets:left tag and right tag. The computation of tags is straightforward while the tags turn out to be effective.Since association rules are usually mined among a very large database,it is necessary to develop effective algorithms to manage,maintain and update the rules. This dissertation also handles the problem of updating when the minimum support varies but the whole database remains constant. We apply the divide-and-conquer strategy to this issue and develop algorithm DCIUA. It not only adopts the merits of divide-and-conquer method but also fully utilizes information of the original frequent itemsets under the old minumum support. So DCIUA is more efficient than the direct application of Apriori under the new minimum support.

Keywords/Search Tags:

association rules, frequent itemset, divide-and-conquer, incremental-update, tag

PDF Full Text Request

Related items

1	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
2	Research On Algorithm Of Mining Association Rules Based On Matrix
3	Research Of Incremental Updating Algorithm For Association Rules
4	Studies On Several Problems Of Data Minging
5	Research On Parallel Frequent Itemset Mining Algorithm Based On MapReduce
6	Research And Application Of An Improved Algorithm For Association Rules
7	The Research On Distributed Incremental Update Method Of Frequent Item Set Based On Flink
8	The Research Of Incremental Update Association Rule Mining Methods
9	Research On The Basic Technology Of Association Rules
10	Research On Distributed Treatment Of Concept Lattices And Knowledge Discovery Based On Its Framework