Efficient Frequent Item Set Discovery Methods And Improved Apriori

Posted on:2012-07-11

Degree:Master

Type:Thesis

Country:China

Candidate:S C Chang

Full Text:PDF

GTID:2178330338494773

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of information technology, the production and storage of data have reached an unprecedented scale. Extracting useful, latent knowledge from this data is a serious challenge for traditional data processing, and data mining emerged in response to it. Since then, improving both the usefulness of the results and the efficiency of the mining process has become the core problem. Association rule mining is one of the principal data mining techniques, and improving its efficiency and the effectiveness of the rules it produces has become a research hotspot in recent years.

This thesis studies and analyzes the two stages of the Apriori algorithm: frequent itemset generation and association rule generation. Each stage offers a way to improve the algorithm's efficiency. First, candidate itemset generation produces a great deal of redundancy, especially for candidate 2-itemsets, and requires several scans of the database; these are the algorithm's main bottlenecks in generating association rules. Second, the algorithm produces many redundant or even uninteresting rules, which can confuse or mislead users when they make decisions.

To address these shortcomings, this thesis proposes a new method of generating frequent itemsets that scans the database only once. The new algorithm counts all possible 2-item combinations and then filters out the infrequent 2-itemsets according to the support threshold, so candidate 2-itemsets never need to be generated and frequent itemsets are found more efficiently. To reduce rule redundancy, the thesis introduces a third measure, the correlation support measure, which eliminates redundancy to some degree, and uses two derived properties of association rules to improve efficiency.
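The single-scan counting of 2-item combinations described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `frequent_pairs`, the representation of transactions as lists of item labels, and the use of a fractional support threshold are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Count every 2-item combination in one pass, then filter by support.

    `transactions` is an iterable of item collections and `min_support`
    is a fraction in [0, 1]. Both conventions are assumptions made for
    this sketch; the abstract does not specify them.
    """
    transactions = list(transactions)
    n = len(transactions)
    if n == 0:
        return {}

    pair_counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each pair has one canonical ordering.
        items = sorted(set(transaction))
        # Count the pairs that actually occur; no candidate set is built first.
        pair_counts.update(combinations(items, 2))

    # Keep only the pairs whose relative support meets the threshold.
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


if __name__ == "__main__":
    example = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "d"]]
    print(frequent_pairs(example, min_support=0.5))
```

Only the pairs that occur in some transaction are ever counted, which is the sense in which the separate candidate 2-itemset step is skipped. The trade-off is memory: the counter holds every distinct pair seen in the data.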
However, once the third measure is introduced, rules that the two properties of association rules would have excluded as redundant must still have their correlation support measure checked against its threshold. To avoid this, I derive mathematically two further properties that eliminate such redundant rules without evaluating their correlation support measure. Chapter 3 reports an experimental comparison of the running time of the improved and original algorithms.

Finally, I choose suitable thresholds for confidence, support, and the correlation support measure, and apply the improved association rule algorithm to part of the web log of Guangdong Light Industry Vocational Technical College. The mining results are analyzed carefully and used to suggest improvements to the website.
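Filtering rules with three thresholds, as the abstract describes, might look like the sketch below. The abstract does not define the correlation support measure, so this example substitutes lift, a standard interestingness measure, as a clearly labeled stand-in. The names `pair_rules`, `lift`, and `min_third` are hypothetical, and only rules between single items are generated to keep the example short.

```python
from collections import Counter
from itertools import combinations


def lift(support_xy, support_x, support_y):
    """Stand-in third measure (NOT the thesis's correlation support measure)."""
    return support_xy / (support_x * support_y)


def pair_rules(transactions, min_support, min_confidence, min_third,
               third_measure=lift):
    """Return rules x -> y that pass support, confidence, and a third measure."""
    transactions = list(transactions)
    n = len(transactions)
    if n == 0:
        return []

    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support_ab = count / n
        if support_ab < min_support:
            continue  # infrequent pair: no rule can come from it
        # Each frequent pair yields two directed rules.
        for x, y in ((a, b), (b, a)):
            support_x = item_counts[x] / n
            support_y = item_counts[y] / n
            confidence = support_ab / support_x
            score = third_measure(support_ab, support_x, support_y)
            if confidence >= min_confidence and score >= min_third:
                rules.append((x, y, support_ab, confidence, score))
    return rules


if __name__ == "__main__":
    example = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "d"]]
    for rule in pair_rules(example, 0.5, 0.8, 1.0):
        print(rule)
```

Passing the third measure in as a function keeps the filtering logic independent of how that measure is defined, so the thesis's own measure could be plugged in where lift is used here.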

Keywords/Search Tags:

frequent itemset, candidate itemset, correlation support measure, association rule


Related items

1	A Study Of The Association Rule Mining EARM Algorithm And Its Application
2	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
3	Research On Expanded Models Of Association Rules
4	Data Mining Technique Application Study On Logistics System
5	Research And Application On Association Rule Mining
6	Itemset Distribution Mining And Its Applications In Pattern Analysis
7	Fp-tree-based Association Rule Mining Algorithm Design And Implementation
8	Research On Distributed Treatment Of Concept Lattices And Knowledge Discovery Based On Its Framework
9	Research And Realization Of Parallel Algorithm For Mining Frequent Closed Itemsets
10	Mining Algorithm Based On Association Rules Of Logic And Computing