The Improvement And Research For Association Rule Mining Algorithm Based On Compressed Matrix

Posted on: 2014-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: D Luo
Full Text: PDF
GTID: 2268330401985830
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, the volume of data and information is growing explosively. To extract useful information from big data, data mining has become one of the most active fields in current database research, and association rule mining is an important research direction within it. Because this technology is widely used in many areas, it has extremely significant application value. As data sets grow in size and complexity, improving the mining efficiency of association rule mining algorithms on large-scale data sets has become the key research problem.

The classical association rule mining algorithm is the Apriori algorithm, which remains a hot topic in current research on association rule mining algorithms. Compared with the original Apriori algorithm, the matrix-based Apriori algorithm reduces the number of database scans and computes support counts more efficiently. However, it still generates too many candidate itemsets, and the matrix takes up too much memory. After studying the existing matrix-based Apriori algorithms, this paper proposes an improved Apriori algorithm based on a compressed matrix, called NCMA. The main work includes the following aspects:

Firstly, the background and development status of association rule mining algorithms are studied. The advantages and disadvantages of the Apriori algorithm and its improved variants are then discussed, followed by an analysis of the problems that remain in the existing improved Apriori algorithms.

Secondly, the matrix-based Apriori algorithm and its improved variants are analyzed in detail. The analysis finds that the improved algorithms still have these problems: the matrix is scanned too many times; reducing the number of candidate itemsets adds extra computing time; the matrix is not compressed thoroughly enough; the accuracy of the mining result is low; and the algorithm design is too complex.

Thirdly, to address these deficiencies, an improved Apriori algorithm based on a compressed matrix, called NCMA, is proposed. The algorithm is improved in five aspects: matrix storage, itemset sorting, matrix compression, support count computation, and the stopping condition of the algorithm. An example is then used to analyze and demonstrate the correctness of the algorithm.

Finally, the NCMA algorithm is compared with the Apriori algorithm and the CM Apriori1 algorithm, both theoretically and experimentally. The theoretical and experimental results both show that NCMA reduces the number of matrix scans, greatly compresses the matrix, reduces the number of candidate itemsets, and improves the efficiency of mining frequent itemsets. When mining dense databases, the algorithm has better operating efficiency and scalability than existing Apriori algorithms based on a compressed matrix.
Keywords/Search Tags: Data mining, Association rules, Apriori algorithm, Compressed matrix
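
The abstract does not give the details of NCMA, so the following Python sketch only illustrates the general idea of a matrix-based Apriori algorithm with matrix compression. It is not the author's algorithm. All function names (`build_matrix`, `support`, `compress`, `frequent_itemsets`) and the specific compression rules are assumptions chosen for illustration: the database is scanned once to build a Boolean item-by-transaction matrix, support counts are obtained by AND-ing item rows, and before each pass the matrix is shrunk by dropping item rows whose count is below the minimum support and transaction columns that contain fewer than k items.

```python
from itertools import combinations


def build_matrix(transactions, items):
    """Boolean matrix: one row per item, one column per transaction."""
    return {item: [item in t for t in transactions] for item in items}


def support(matrix, itemset):
    """Count the columns in which every item row of the itemset is True."""
    rows = [matrix[item] for item in itemset]
    return sum(all(column) for column in zip(*rows))


def compress(matrix, min_support, k):
    """Illustrative compression step (an assumption, not the NCMA rules).

    1. Drop item rows whose count is below min_support.
    2. Drop transaction columns holding fewer than k remaining items,
       since such a transaction cannot contain any k-itemset.
    """
    kept = {i: row for i, row in matrix.items() if sum(row) >= min_support}
    if not kept:
        return kept
    width = len(next(iter(kept.values())))
    columns = [j for j in range(width)
               if sum(row[j] for row in kept.values()) >= k]
    return {i: [row[j] for j in columns] for i, row in kept.items()}


def frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    matrix = build_matrix(transactions, items)  # the only database scan
    result = {}
    k = 1
    candidates = [(item,) for item in items]
    while candidates:
        matrix = compress(matrix, min_support, k)
        frequent = []
        for candidate in candidates:
            # Skip candidates containing an item whose row was dropped.
            if all(item in matrix for item in candidate):
                count = support(matrix, candidate)
                if count >= min_support:
                    frequent.append(candidate)
                    result[candidate] = count
        # Generate (k+1)-candidates whose k-subsets are all frequent.
        k += 1
        previous = set(frequent)
        remaining = sorted({item for c in frequent for item in c})
        candidates = [c for c in combinations(remaining, k)
                      if all(s in previous for s in combinations(c, k - 1))]
    return result
```

Calling `frequent_itemsets` with a list of transaction sets and a minimum support count returns every frequent itemset with its support. In this sketch the matrix gets smaller on each pass, which is the kind of saving in scanning and memory that the abstract attributes to compressed-matrix methods.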