Mining association rules is one of the most important issues in the field of data mining. Its task is to discover the relations between item sets from a large transaction database. In this way,we can study purchasing habits of customers and learn how to design catalog,sale strategy,etc.To mine association rules,we mine all frequent itemsets first. It is then straightforward to generate the rules from the discovered frequent itemsets. However,it is time-consuming to mine frequent itemsets since it is necessary to scan a generally very huge database. How to design an efficient and scalable algorithm for mining rules has become one of the hot topics. Among all the proposed techniques,algorithm Apriori is relatively efficient and effective.Algorithm Apriori employs iterative technique:initially it computes all frequent 1-itemsets and then generates iteratively frequent (k+l)-itemsets Lk+1 from frequent k-itemsets Lk until set Lk is empty. The second procedure is based on generation-and-test method:first generate candidate itemsets Ck+1 from frequent itemsets Lk and cut off some candidates which are sure to be infrequent,then scan the database to count the support for every remaining candidate. We can find that most execution time is spent in these two procedures:generation and test.On the basis of Algorithm Apriori,this dissertation presents a new algorithm DCM using divide-and-conquer technique. According to some strategy,DCM divides frequent itemsets Lk into three non-intersect subsets Thus the generation of most candidates can be localized in each individual subset and some unnecessary interactions between subsets are avoided;moreover,most of the cut-off actions are also localized in each subset and we don't need to scan the whole set Lk as before. To realize this target,new data structure is introduced for some frequent itemsets:left tag and right tag. The computation of tags is straightforward while the tags turn out to be effective.Since association rules are usually mined among a very large database,it is necessary to develop effective algorithms to manage,maintain and update the rules. This dissertation also handles the problem of updating when the minimum support varies but the whole database remains constant. We apply the divide-and-conquer strategy to this issue and develop algorithm DCIUA. It not only adopts the merits of divide-and-conquer method but also fully utilizes information of the original frequent itemsets under the old minumum support. So DCIUA is more efficient than the direct application of Apriori under the new minimum support. |