Research On Improving Apriori Algorithm For Mining Association Rules

Posted on: 2004-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2168360125463290
Subject: Computer applications
Abstract/Summary:
The paper begins with the practical significance of association rules (AR). We discuss why research on AR is necessary and the influence AR has had on society and commerce. AR has been studied for about ten years since it was put forward by Rakesh Agrawal and Ramakrishnan Srikant, and it has become one of the important branches of data mining.

To show how these fields relate, we discuss Knowledge Discovery in Databases (KDD), data mining, and association rules in depth. They form the basis for the rest of the work.

The core of the paper is research on improving the classic frequent set algorithm. After describing the classic frequent set algorithm (the Apriori algorithm) in detail, we focus on two improvement strategies and implement them in Java using object-oriented programming.

First, we prove theoretically that reducing the candidate set (Ck) makes the algorithm more efficient. Second, we use a hash tree to store the frequent items so that they can be counted quickly. We first prove theoretically how the hash tree can be applied to this problem, and then turn the abstract theory into a concrete object-oriented implementation: from the structure of the hash tree, to the addition of leaves, to the traversal of the tree.

To test the improved algorithm, we use two databases as test beds. One is a database we built ourselves. The other is the anonymous web data from www.microsoft.com, which serves as real test data. After suitable preprocessing (for instance, deleting redundant data and adapting the interface between the test database and the algorithm program), the anonymous web data fully meets our needs.

Based on these test beds, we run many different cases against the improved algorithm. Besides the association rules themselves, we also obtain a large amount of important test data. For example, with the confidence fixed, as the support increases we obtain a series of different frequent itemsets, association rules, and run times. From the discussion of these results, we conclude that the new algorithm is stable and convergent. Based on this conclusion, we also compare the original algorithm with the new one and find that the new algorithm has more advantages.
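
The candidate-set reduction discussed in the abstract can be illustrated with a minimal Java sketch of the classic Apriori join and prune steps. This is not the thesis implementation: the class name `AprioriSketch`, its method names, and the use of an absolute support count are assumptions made for illustration, and support is counted with a plain scan over the transactions rather than with the hash tree the thesis describes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not taken from the thesis.
class AprioriSketch {

    /** Returns every frequent itemset together with its absolute support count. */
    static Map<Set<String>, Integer> frequentItemsets(
            List<Set<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> result = new LinkedHashMap<>();

        // C1: every distinct item as a singleton candidate.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                Set<String> single = new TreeSet<>();
                single.add(item);
                candidates.add(single);
            }
        }

        while (!candidates.isEmpty()) {
            // Count support by scanning all transactions
            // (assumption: a plain scan stands in for the hash tree).
            Map<Set<String>, Integer> frequent = new LinkedHashMap<>();
            for (Set<String> c : candidates) {
                int count = 0;
                for (Set<String> t : transactions) {
                    if (t.containsAll(c)) count++;
                }
                if (count >= minSupport) frequent.put(c, count);
            }
            result.putAll(frequent);
            candidates = generateCandidates(frequent.keySet());
        }
        return result;
    }

    /** Join step plus prune step: builds C(k+1) from L(k). */
    static Set<Set<String>> generateCandidates(Set<Set<String>> frequentK) {
        Set<Set<String>> next = new HashSet<>();
        List<Set<String>> list = new ArrayList<>(frequentK);
        for (int i = 0; i < list.size(); i++) {
            for (int j = i + 1; j < list.size(); j++) {
                Set<String> union = new TreeSet<>(list.get(i));
                union.addAll(list.get(j));
                // Join: keep only unions that grow the itemset by exactly one item.
                if (union.size() != list.get(i).size() + 1) continue;
                // Prune: discard the candidate unless all of its k-subsets are frequent.
                if (allSubsetsFrequent(union, frequentK)) next.add(union);
            }
        }
        return next;
    }

    /** Checks the Apriori property for one candidate. */
    static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> frequentK) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!frequentK.contains(subset)) return false;
        }
        return true;
    }
}
```

The prune step is where the candidate set shrinks before any database scan: a candidate with even one infrequent subset is never counted. A hash tree, as used in the thesis, would replace the inner `containsAll` scan to speed up counting.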
Keywords/Search Tags: Association Rules, Apriori Algorithm, Frequent Set, Candidate Set