
The Research Of Association Rules Mining Algorithm

Posted on: 2013-03-03 | Degree: Master | Type: Thesis
Country: China | Candidate: J Zhao | Full Text: PDF
GTID: 2248330371490210 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the times and the gradual maturing of database technology, the volume of data being produced has grown exponentially. There is a strong need to extract rules and other valuable knowledge, also called information, from this mass of data to guide future practice. Data mining meets this need. It is a relatively new information-analysis technology that uses computers to extract hidden, implicit, valid, novel and interesting spatial or non-spatial patterns and rules from large, incomplete, noisy, fuzzy, random, real-world data. At present, data mining mainly covers association rules, clustering, classification, time-series pattern discovery and related tasks.

Association rules mining, one of the most important branches of data mining, is widely applied in many areas. By analyzing a database, it discovers relationships between itemsets. Association rules mining was originally proposed by Agrawal, who later put forward the Apriori algorithm as an improvement on the AIS algorithm. Apriori is the most classical association rules mining algorithm. It searches iteratively, level by level, and finds the rules that satisfy both the minimum support threshold and the minimum confidence threshold. The Apriori algorithm has two steps: joining and pruning. However, Apriori has several drawbacks:
1. The candidate set Ck produced at each stage is too large; in particular, when k=2 the number of candidate itemsets is huge.
2. The database is scanned every time a candidate set is generated, which causes heavy I/O overhead and increases the time complexity of the algorithm.

After comparing existing algorithms, this thesis proposes an improved association rules mining algorithm. Its main ideas are as follows:
1. The algorithm adopts a matrix data structure. The original transaction database is transformed into a 0-1 matrix, which not only saves storage space but also allows mining to proceed without the original database. Because mining works on the matrix instead of repeatedly scanning the database, I/O overhead is reduced.
2. The 2-itemsets are obtained by matrix calculation using the original matrix and its transpose. The matrix calculation is simple and speeds up the search for itemsets.
3. By analyzing the upper triangular part of the resulting matrix, the number of candidate itemsets is reduced, frequent itemsets are identified more rapidly, and association rules are obtained quickly.

Experimental comparison and analysis show that the algorithm has lower time complexity and higher running efficiency. Because the matrix requires relatively little memory, the proposed algorithm is applied to association rules mining on large supermarket datasets to find relationships between products, which can provide a scientific basis for future marketing decisions. Therefore, the proposed algorithm has good practicality and applicability.
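The matrix-based steps described above can be illustrated with a minimal Python sketch. It assumes that the 0-1 matrix M has one row per transaction and one column per item, and that the "matrix calculation" with the transpose is the product MᵀM, whose entry (i, j) counts the transactions containing both item i and item j. Only the upper triangle (j > i) is read to obtain frequent 2-itemsets, and rule confidence is computed as count(X, Y) / count(X). The function names, thresholds and toy basket data are hypothetical; this is not the thesis's implementation, and it stops at 2-itemsets rather than showing how larger itemsets are derived.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical helper names; an illustrative sketch, not the thesis's code.


def to_binary_matrix(transactions: Sequence[Set[str]], items: List[str]) -> List[List[int]]:
    """One row per transaction, one column per item: 1 if the item occurs in it."""
    col = {item: j for j, item in enumerate(items)}
    matrix = []
    for t in transactions:
        row = [0] * len(items)
        for item in t:
            row[col[item]] = 1
        matrix.append(row)
    return matrix


def cooccurrence(matrix: List[List[int]]) -> List[List[int]]:
    """Return C = M^T M. C[i][j] counts transactions containing both item i and
    item j; the diagonal C[i][i] is the support count of item i on its own."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]


def frequent_pairs(c: List[List[int]], items: List[str], min_count: int) -> Dict[Tuple[str, str], int]:
    """Read frequent 2-itemsets from the upper triangle (j > i) of C.
    Items whose own count is below min_count are skipped: by the Apriori
    property, no pair containing an infrequent item can be frequent."""
    pairs = {}
    for i in range(len(items)):
        if c[i][i] < min_count:
            continue
        for j in range(i + 1, len(items)):
            if c[i][j] >= min_count:
                pairs[(items[i], items[j])] = c[i][j]
    return pairs


def pair_rules(pairs: Dict[Tuple[str, str], int], c: List[List[int]],
               items: List[str], min_conf: float) -> List[Tuple[str, str, float]]:
    """Emit rules X -> Y whose confidence count(X, Y) / count(X) is >= min_conf."""
    col = {item: k for k, item in enumerate(items)}
    rules = []
    for (a, b), n_ab in pairs.items():
        for x, y in ((a, b), (b, a)):
            conf = n_ab / c[col[x]][col[x]]
            if conf >= min_conf:
                rules.append((x, y, conf))
    return rules


# Toy basket data, invented for illustration (not taken from the thesis).
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
items = sorted(set().union(*transactions))
m = to_binary_matrix(transactions, items)
c = cooccurrence(m)
pairs = frequent_pairs(c, items, min_count=3)
rules = pair_rules(pairs, c, items, min_conf=0.8)
print(pairs)  # {('beer', 'diaper'): 3, ('bread', 'diaper'): 3, ('bread', 'milk'): 3, ('diaper', 'milk'): 3}
print(rules)  # [('beer', 'diaper', 1.0)]
```

In this naive pure-Python form, building C costs O(n·m²) for n transactions and m items, but the transactions are read only once; after that, the support count of every item and every pair comes directly from C, with no further database scans.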
Keywords/Search Tags: data mining, association rules mining, Apriori algorithm, matrix, matrix operations