
Research On Multi-dimensional Association Rules Mining In Distributed Environments Based On Advanced SQL Query

Posted on: 2011-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2198330332469421
Subject: Computer application technology
Abstract/Summary:
Multi-dimensional association rule mining is an important data mining task. With the rapid growth of the Internet, distributed databases have become a widely used environment, so a method for mining multi-dimensional association rules in distributed database systems is urgently needed.
This paper proposes the MDMA (Multi-dimensional Distributed Mining Association rules) algorithm, which is based on advanced SQL queries. The algorithm runs on a star network topology consisting of a central site and local sites. The central site controls the mining process and presents the results. Each local site mines its local frequent itemsets and computes the local support count of every global candidate frequent itemset. The algorithm uses the CUBE operator of the newer SQL standard to mine the local frequent itemsets, so a single scan of the database generates all of them and no long series of iterations is needed.
The algorithm performs knowledge fusion twice to mine distributed frequent patterns. First, the global candidate frequent itemsets are selected from the local frequent itemsets. Next, the central site uses these candidates to build a global expanded frequent pattern tree and sends the tree to every local site. On receiving the tree, each local site computes the local support count of every global candidate frequent itemset and returns the results to the central site. The central site then sums the results and selects the global frequent itemsets. Consequently, regardless of the number of sites or the size of the local databases, the algorithm needs only two database scans and three rounds of network communication. To carry out knowledge fusion effectively, a new data structure is created.
It is called the global expanded frequent pattern tree. The tree introduces a compound node, which contains several meta-nodes related by logical OR. This simplifies tree traversal when validating multi-dimensional global frequent patterns, and it also makes the mining results easier to visualize. MDMA also takes user preference into account: the user can freely choose which attributes to mine. As a result, the algorithm offers light network traffic, low time cost, simplicity, good scalability, and support for user preference.
This paper also develops a Web-based distributed multi-dimensional association rule mining system. The system displays the mining results visually and, given the antecedent and consequent set by the user, interactively generates the relevant association rules.
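The two rounds of knowledge fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the MDMA implementation. It assumes each local database is a list of dictionaries, emulates the CUBE aggregation in memory instead of issuing SQL, and exchanges a plain set of candidate itemsets in place of the global expanded frequent pattern tree. All function names and the relative `min_support` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_cube_counts(rows, attributes):
    """One scan of a local table: count every combination of
    (attribute, value) pairs, roughly what GROUP BY CUBE would return."""
    counts = Counter()
    for row in rows:
        items = [(a, row[a]) for a in attributes]
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return counts


def local_frequent_itemsets(rows, attributes, min_support):
    """Itemsets whose local count reaches the local support threshold."""
    counts = local_cube_counts(rows, attributes)
    threshold = min_support * len(rows)
    return {itemset for itemset, c in counts.items() if c >= threshold}


def count_candidates(rows, candidates):
    """Second scan of a local table: support count of each global candidate."""
    counts = Counter()
    for row in rows:
        row_items = set(row.items())
        for cand in candidates:
            if cand <= row_items:
                counts[cand] += 1
    return counts


def mine_global(sites, attributes, min_support):
    """Central-site logic (assumed): merge candidates, then sum local counts."""
    # Fusion 1: the union of local frequent itemsets forms the global candidates.
    candidates = set()
    for rows in sites:
        candidates |= local_frequent_itemsets(rows, attributes, min_support)
    # Fusion 2: each site reports local counts; the central site adds them up.
    totals = Counter()
    for rows in sites:
        totals.update(count_candidates(rows, candidates))
    threshold = min_support * sum(len(rows) for rows in sites)
    return {itemset: n for itemset, n in totals.items() if n >= threshold}
```

Taking the union of local frequent itemsets as candidates loses nothing under this sketch's assumption of a single relative threshold: an itemset that is frequent over all sites combined must be frequent at one site or more, so it always appears among the candidates.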
Keywords/Search Tags: Distributed database, Multi-dimensional association rule, CUBE