
The Improvement And Research For Array-based Association Rule Mining Algorithm

Posted on: 2009-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Zhang
GTID: 2178360245465509
Subject: Computer application technology
Abstract/Summary:
Data mining has been application-driven from the beginning. Data mining tools analyze data and can uncover important patterns that contribute greatly to business strategy. As an important data mining technique, association rule mining has been widely applied in many fields, especially in business. As data sets grow in size and complexity, it is crucial to study association rule mining algorithms so that systems remain efficient and applicable to a wide range of data sets.

Association rule mining finds relationships among the items or attributes in a data set. It is a two-step process: first find all frequent itemsets, then generate strong association rules from them. The best-known algorithm for finding frequent itemsets is Apriori. Many variations of it have since been proposed, each aimed at improving particular aspects. The array-based association rule mining algorithm exploits the properties of arrays to improve mining efficiency.

To address weaknesses in association rule mining such as the high cost of pattern counting and inefficient I/O, this thesis analyzes the Apriori algorithm in detail and studies the array-based association rule mining algorithm in depth.

For the problems of that algorithm, namely the large number of useless elements in the array and the large number of candidate itemsets, the author proposes an improved algorithm. It uses data constraints and builds only the frequent itemsets the user is interested in, which reduces the pattern-counting cost and improves mining quality. Specifically, the author reduces the number of transactions in the array and improves the join step so that candidate itemsets of different lengths are generated in each array scan. As a result, all frequent itemsets are mined in fewer scans.
Improving the efficiency and quality of association rule mining is of practical significance.

Building on this research, the author implements a system for the array-based association rule mining algorithm and the improved algorithm, using Delphi 7.0 and SQL Server 2000 as development tools. The system is tested on 5,000 test data records produced by the IBM data generator. The thesis gives the flow chart and describes the system's running process in detail. The experimental results show that the improved algorithm is feasible and valuable. Finally, the author analyzes the problems that remain for further research and the future direction of development in this field.
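
The two-step process described in the abstract (find frequent itemsets, then derive strong rules) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm or its improved variant, whose details are not given here. It is a plain Apriori-style level-wise search that counts support against a boolean transaction-by-item array. The function names `frequent_itemsets` and `strong_rules`, the thresholds, and the toy grocery data are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search; support is counted on a boolean array (illustrative only)."""
    items = sorted({i for t in transactions for i in t})
    # rows[r][c] is True when transaction r contains items[c]
    rows = [[item in t for item in items] for t in transactions]
    col = {item: c for c, item in enumerate(items)}
    n = len(transactions)

    def support(itemset):
        cols = [col[i] for i in itemset]
        return sum(all(row[c] for c in cols) for row in rows) / n

    frequent = {}
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    k = 1
    while level:
        current = {s: sup for s in level if (sup := support(s)) >= min_support}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent, min_confidence):
    """Generate rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


# Hypothetical toy data: four shopping baskets.
tx = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
fi = frequent_itemsets(tx, min_support=0.5)
for lhs, rhs, sup, conf in strong_rules(fi, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

In this sketch every level rescans the whole array, which is the cost the abstract says the improved algorithm reduces by shrinking the array and generating candidates of several lengths per scan.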
Keywords/Search Tags:association rule, data mining, Apriori algorithm, frequent itemset, array