The Research Of Association Rules Mining Algorithm

Posted on: 2014-04-30 | Degree: Master | Type: Thesis
Country: China | Candidate: X M Guo | Full Text: PDF
GTID: 2268330401977717 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, enterprises must digitize their operations in order to improve operating efficiency and market competitiveness. Data mining technology emerged to find useful information in databases and to extract meaningful information from scattered data. The field of data mining includes association rule mining, clustering, classification, and other branches. Among them, association rule mining is the foundation of the other branches and is the most widely used.

Association rule mining consists of two main steps: first, mine the frequent itemsets in a database with a frequent itemset mining algorithm; then generate association rules from those frequent itemsets. The core part is mining the frequent itemsets. Frequent itemset mining is one of the research hotspots in association rule mining, and its most classical algorithm is the Apriori algorithm. The advantages of this algorithm are that it is easy to understand and able to find all the frequent itemsets, but it also has many shortcomings, mainly: (1) the database must be scanned many times, which causes large I/O overhead; (2) the number of candidate 2-itemsets generated is too large; (3) the join and pruning process is too complicated.

Based on research into the Apriori algorithm and its improved variants, we propose a load-balanced distributed parallel Apriori algorithm (DPApriori). The main work of this paper is as follows. First, it introduces the basic concepts of data mining and association rules, and then describes in detail the Apriori algorithm and its improved algorithms for mining frequent itemsets.
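To make the level-wise procedure and its costs concrete, here is a minimal sketch of the classical Apriori algorithm in Python. It is an illustration only, not the thesis's implementation; the function name `apriori`, the absolute-count `min_support` parameter, and the in-memory list of transactions are assumptions made for the example. Note that the database is rescanned once per level, which is the I/O cost listed as shortcoming (1).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    `min_support` is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: one scan to count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Counting step: one more full scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in apriori(db, min_support=3).items():
        print(sorted(itemset), support)
```

With n frequent single items, the join step at level 2 produces n(n-1)/2 candidate pairs, which is why the number of candidate 2-itemsets grows so quickly.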
Building on the Apriori algorithm and its improved algorithms, we then put forward the load-balanced distributed parallel Apriori algorithm (DPApriori). Its main contents are as follows. First, the correspondence between transactions and items in the transaction database is changed. Second, the join and pruning process is optimized by applying several properties and theorems. Finally, during distributed processing, items are assigned to different processors according to their weights, which are given in advance. Through these steps, the algorithm achieves good load balancing and improves operating efficiency. Experiments under many different conditions show that the algorithm is more efficient and achieves a well-balanced load. The results show that the proposed DPApriori algorithm has greater efficiency and applicability.
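The abstract does not give the details of these steps, so the following Python sketch shows only one plausible reading, under stated assumptions. It assumes that changing the transaction-item correspondence means building an inverted layout (each item mapped to the IDs of the transactions that contain it), and that weighted assignment can be approximated by a greedy heaviest-first heuristic. The names `to_vertical` and `assign_by_weight`, and the choice of transaction-ID-set size as the weight, are hypothetical and are not taken from the thesis.

```python
import heapq


def to_vertical(transactions):
    """Map each item to the set of transaction IDs that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def assign_by_weight(weights, n_processors):
    """Greedy balancing: heaviest item first, onto the least-loaded processor.

    `weights` maps item -> weight known in advance (an assumption of this sketch).
    Returns {processor id: [items assigned to it]}.
    """
    heap = [(0, p) for p in range(n_processors)]  # entries are (load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_processors)}
    # Sort by descending weight; ties are broken by item name for determinism.
    for item, w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        assignment[p].append(item)
        heapq.heappush(heap, (load + w, p))
    return assignment


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"]]
    tidsets = to_vertical(db)
    # In this layout, the support of {a, b} is an intersection size, with no rescan.
    print(len(tidsets["a"] & tidsets["b"]))
    # Hypothetical weight: the number of transactions containing each item.
    weights = {item: len(tids) for item, tids in tidsets.items()}
    print(assign_by_weight(weights, n_processors=2))
```

In an inverted layout, the support of an itemset is the size of the intersection of its items' ID sets, so it can be counted without rescanning the database. The greedy heuristic is a common stand-in for balancing pre-assigned weights; the thesis may use a different assignment rule.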
Keywords/Search Tags: Data mining, Association rule mining, Distributed and parallel computing, Apriori algorithm