Efficient Frequent Item Set Discovery Methods And Improved Apriori

Posted on: 2012-07-11  Degree: Master  Type: Thesis
Country: China  Candidate: S C Chang  Full Text: PDF
GTID: 2178330338494773  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the volume of data being produced and stored has reached an all-time high. Distilling useful, latent knowledge from this data is a serious challenge for traditional data processing, and data mining emerged to meet it. With the rise of data mining, improving both the usefulness of its results and the efficiency of the mining itself has become the core problem. Association rule mining is one of the principal data mining techniques, so improving its efficiency and the effectiveness of the rules it produces has become a research hotspot in recent years.
This thesis studies and analyzes the two stages of the Apriori algorithm, frequent itemset generation and association rule generation, which give two aspects in which the algorithm's efficiency can be improved. First, candidate itemset generation produces a great deal of redundancy, especially among candidate 2-itemsets, and the database must be scanned several times; these problems are the algorithm's main bottleneck in generating association rules. Second, the algorithm produces many redundant or even uninteresting rules, which can confuse or mislead users when they make judgments. To address these shortcomings, this thesis proposes a new method for producing frequent itemsets that needs to scan the database only once. The new algorithm counts all possible 2-item combinations and then filters out the infrequent 2-itemsets according to the support value, so no candidate 2-itemsets need to be generated, which improves the efficiency of finding frequent itemsets. To reduce rule redundancy, a third measure, the correlation support measure, is introduced; it eliminates redundancy to some degree, and two derived properties of association rules are used to improve efficiency.
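The single-scan counting of 2-item combinations described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual implementation: the function name `frequent_pairs_single_scan`, the use of in-memory counters, and the choice to express `min_support` as a fraction of the number of transactions are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs_single_scan(transactions, min_support):
    """Count single items and item pairs in one pass over the data.

    Assumption: min_support is a fraction of the transaction count.
    Returns (frequent_items, frequent_pairs, number_of_transactions).
    """
    item_counts = Counter()
    pair_counts = Counter()
    n = 0
    for transaction in transactions:  # the only scan of the database
        n += 1
        items = sorted(set(transaction))  # drop duplicates, fix pair order
        item_counts.update(items)
        # Count every 2-item combination that occurs in this transaction,
        # instead of first generating candidate 2-itemsets.
        pair_counts.update(combinations(items, 2))

    threshold = min_support * n
    frequent_items = {i: c for i, c in item_counts.items() if c >= threshold}
    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= threshold}
    return frequent_items, frequent_pairs, n


# Toy data, invented for illustration only.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
items, pairs, n = frequent_pairs_single_scan(transactions, min_support=0.5)
print(pairs)
```

The trade-off in this sketch is memory for passes: every pair that co-occurs in some transaction is held in a counter, so the approach suits data whose distinct pairs fit in memory.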
However, once the third measure is introduced, the redundant association rules that had already been excluded by the two properties of association rules must still be checked against the correlation support threshold. To avoid this, I derive mathematically two further properties that eliminate those redundant rules without testing them against the correlation support threshold, and in chapter 3 I compare the running time of the improved and original algorithms experimentally. Finally, I choose appropriate confidence, support, and correlation support thresholds and use the efficient association rule algorithm to mine part of the log of Guangdong Light Industry Vocational Technical College. The mining results are analyzed carefully to put forward some suggestions for improving the website.
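As a rough illustration of filtering rules with three thresholds, the sketch below generates rules from frequent 2-itemsets and keeps those that pass a confidence threshold and a third threshold. The abstract does not define the correlation support measure, so this sketch substitutes lift purely as a stand-in; the function name `rules_from_pairs`, the parameter `min_corr`, and the toy counts are hypothetical, and the pruning properties derived in the thesis are not reproduced here.

```python
def rules_from_pairs(item_counts, pair_counts, n, min_conf, min_corr):
    """Build rules a -> b from frequent pairs and filter them.

    Assumption: lift stands in for the thesis's correlation support
    measure, whose definition is not given in the abstract.
    """
    rules = []
    for (x, y), count in pair_counts.items():
        for a, b in ((x, y), (y, x)):  # try both rule directions
            confidence = count / item_counts[a]
            lift = count * n / (item_counts[a] * item_counts[b])
            if confidence >= min_conf and lift >= min_corr:
                rules.append((a, b, confidence, lift))
    return rules


# Toy counts over 4 transactions, invented for illustration only.
item_counts = {"a": 3, "b": 3, "c": 2}
pair_counts = {("a", "b"): 2, ("a", "c"): 2}
rules = rules_from_pairs(item_counts, pair_counts, n=4,
                         min_conf=0.6, min_corr=1.0)
for a, b, conf, lift in rules:
    print(f"{a} -> {b}  confidence={conf:.2f}  lift={lift:.2f}")
```

In this toy example the rules between `a` and `b` pass the confidence threshold but are dropped by the third threshold, which is the kind of extra filtering the third measure is meant to provide.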
Keywords/Search Tags: k-frequent itemset, candidate itemset, correlation support measure, association rule