
Research Of Association Rules Mining Algorithm Based On Tree

Posted on: 2014-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2268330401974932
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and database technology, the volume of data stored in information systems is growing geometrically. Yet the knowledge people extract from such massive data remains scarce, so it is no longer enough merely to manage data; there is an urgent need to uncover the knowledge hidden in it. This knowledge can guide future decisions, and the process of extracting it is known as data mining. Data mining draws on a variety of disciplines, such as databases, artificial intelligence, pattern recognition, and mathematical statistics, and it has been widely used in the pharmaceutical industry, telecommunications, manufacturing, and retail. Data mining techniques include clustering, classification, association rule mining, and so on; association rule mining is one of the most widely used.

Association rule mining discovers relationships between itemsets in a database. Since the problem was first raised, scholars have studied it extensively and put forward many algorithms, the most famous of which is Apriori. Apriori is a classical association rule mining algorithm that iterates level by level to find the association rules that meet the minimum support and minimum confidence in the database. The algorithm has two parts: generating frequent itemsets and generating association rules. Frequent itemset generation can in turn be divided into two parts: generating candidate itemsets and confirming the frequent itemsets. The classical algorithm generates a huge number of candidate itemsets during mining, the join step requires many connection checks, and every confirmation of frequent itemsets requires a scan of the entire database, so the algorithm's efficiency needs to be improved.
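
The two parts of the classical algorithm described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `apriori` and `association_rules` are hypothetical, and the minimum support is assumed to be an absolute transaction count rather than a ratio. Note the full database scan at every level, which is the cost the abstract refers to.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Assumption for this sketch: min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one database scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Join step: merge frequent (k-1)-itemsets whose union has k items.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Confirm step: one scan of the entire database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so the
                # antecedent's count is always present in `frequent`.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

For example, calling `apriori` on a handful of small transactions and passing the result to `association_rules` yields the rules whose confidence meets the chosen threshold.
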
We therefore propose an improved algorithm based on the classical association rule algorithm. Its main measures are:
1. Use a tree structure to store the data. Each node of the new storage structure contains the data, the length of the transaction, the position of the first transaction whose length is less than this transaction's, the position of the first transaction whose length is equal to or greater than this transaction's, and the position of the node's parent. Using the transaction length reduces the scan time needed to generate frequent itemsets at each pass.
2. Add a Boolean field. By counting the single items that cannot appear in any frequent K-itemset, the algorithm makes a further judgment about which transaction records need to be scanned.
3. When generating frequent K-itemsets, the scan of the transaction database would otherwise start from the root node and continue until it finds the first transaction whose length is greater than or equal to K. This search reduces the efficiency of the algorithm; if the first node matching the search criterion can be found directly, the scan time can be reduced. We therefore build a table when the data are stored that records the position of the first transaction of each length, sorted by transaction length in ascending order.

In addition, the algorithm uses a different method to generate candidate itemsets. Experiments comparing the running time of the new algorithm with that of the classical algorithm show that the proposed algorithm is more efficient. Finally, we summarize the work that has been done and propose directions for future research.
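
The length-based measures above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the thesis's data structure: a list sorted by transaction length stands in for the tree, the class `LengthIndexedDB` and its method names are hypothetical, and the rule used to clear the Boolean flag (a transaction with fewer than K items that can still appear in a frequent K-itemset need not be scanned) is one plausible reading of measure 2.

```python
from bisect import bisect_left


class LengthIndexedDB:
    """Transactions sorted by length, with a first-position table per length."""

    def __init__(self, transactions):
        # Sorted list used here in place of the tree (an assumption).
        self.rows = sorted((frozenset(t) for t in transactions), key=len)
        self.lengths = [len(r) for r in self.rows]
        # first_pos[n] = index of the first transaction of length n;
        # entries are added in ascending order of length.
        self.first_pos = {}
        for i, n in enumerate(self.lengths):
            self.first_pos.setdefault(n, i)
        # Boolean field: False once a transaction no longer needs scanning.
        self.active = [True] * len(self.rows)

    def start_for(self, k):
        """Index of the first transaction whose length is >= k."""
        if k in self.first_pos:
            return self.first_pos[k]
        # No transaction has exactly length k: fall back to binary search.
        return bisect_left(self.lengths, k)

    def deactivate_without(self, useful_items, k):
        """Clear the flag on transactions with fewer than k useful items.

        useful_items: items that may still appear in a frequent k-itemset.
        """
        for i in range(self.start_for(k), len(self.rows)):
            if len(self.rows[i] & useful_items) < k:
                self.active[i] = False

    def count(self, candidate):
        """Support count of a candidate, scanning only long enough, active rows."""
        k = len(candidate)
        return sum(
            1
            for i in range(self.start_for(k), len(self.rows))
            if self.active[i] and candidate <= self.rows[i]
        )
```

With this layout, counting a K-item candidate skips every transaction shorter than K without visiting it, which is the saving the first-position table is meant to provide.
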
Keywords/Search Tags: data mining, association rules, Apriori, binary tree