Font Size: a A A

Research Of Parallel Algorithm For Mining Association Rules Based On Data Partition

Posted on:2008-01-10Degree:MasterType:Thesis
Country:ChinaCandidate:G M CaiFull Text:PDF
GTID:2178360212996012Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As the society coming into the information period, and the comprehensive application of the computer network and computer technology, the database in every industry accumulates substantive data increasingly. How to use these data and pick up useful information and knowledge from them to guide the production and distribution of the enterprises comes into being and develops a new computer technology—Data Mining Technology which is widely used and has tremendous practicality.Data mining is an emerging field, whose goal is to extract implicit, previously unknown, and potentially useful information out of large amounts of collected data.Data Mining (also called Knowledge Discovery in Databases) is a technique that aims to extract implicit, previously unknown, and potentially useful information out of large amounts of data sets. It has been viewed as an important evolution in information technology. During the past over decade, new concepts and techniques on data mining have been presented, and especially in the latest few years, some of them have been studied in higher levels.The proto typical application is the analysis of supermarket sales or basket data .The association rule discovery task identifies the group of items most often purchased along with another group of items, and predict the latent relation between the items, which are purchased always together so that the supermarket can make decision correctly.The association rule problem was originally proposed in the context of supermarket data to study the relationships of the buying patterns of customers in transaction data, i.e. to find how the items bought in a consumer basket related to each other.Data Mining system can find out lots of patterns, in which association rules that describe the interesting relations among the items in given data sets are the important area.The association rule problem was originally proposedin the context of supermarket data to study the relationships of the buying patterns of customers in transaction data, i.e. to find how the items bought in a consumer basket related to each other, so that the supermarket can make decision correctly.The problem of mining association rules is to generate all association rules satisfying thresholds, i.e. support and confidence are not less than the user-specified minimum support and minimum confidence respectively. It can be devided into two sub-steps. 1, Find all sets of items whose support is greater tan or equal to the user speciied. Itemsets with minimum support are called frequent itemsets. 2, Use the frequent itemsets to generate the desired rules whose confidence is greater than or equal to the user specifield. An association rule is a rule of the form X ? Y, where X and Y are sets of items X and Y are called respec- tivelythe body and the head of the rule. Where A and B are both conjunctions of conditionsDiscovering the frequent itemsets is the key technique and stage in association mining task. One sub problem of association rules mining is to finding all frequent itemsets from data sets with a certain user-specified minimum support. The problem of identifying all frequent itemsets is the computationally intensive step in the algorithm. Much of the research has been focussed on the first sub problem as the database is very large, and in this part the database is accessed. Parallel computating is a straight forward way deal with the large scale data sets, and frequent closed itemsets have all informathion of the frequent itemsets's, so in this paper is focussed on parallel algorithm for mining the frequent closed itemsets. The contents of this paper are as follows:Chapter 1 gives the introduction of the concepts and methods of data mining and Knowledge discovery, as the overview of the mission on dataming.Chapter 2 gives the describtion of the definition and procedure of mining association rules, and details the basic principle of mining association rules, then introduces two serial algorithm Apriori and A-Close in detail. In the meantime, investigates the theory of frequent closed itemsets. At last, has a brief review of other serial algorithm for mining association rules.Chapter 3 and 4 are the main point of this paper. After briefly introducing parallel computing Knowledge, three classical parallel algorithms (CD, DD, CaD) are described, as well as other improved algorithms. Based on the research of classical parallel algorithms, efficiency improvement meantimes are brought up. The improvements are as follows:(1) Partition the candidate itemsets. This will help reduce the time of searching as well as the searching space of every node.(2) Partition the transaction databases to meet the candidate Itemsets. This will help reduce the size of database as well as the size of transaction, because the partitioned database is based on the candidate items. This is another way of improving the efficiency of the algorithm.(3) After partitioning the candidate Itemsets and database, the discovering of frequent itemsets can be carried out independently on every node and, thus, the interprocessor communication and synchronization can be minimized.At last, the implement of the algorithm with Message Passing Interface (MPI), comparing with the classic algorithm, proves that it is more efficient. Chapter 5 summarizes the paper, and points out the direction for further research:(1) Parallel algorithm partitions the data, reduce the communication between node, and then makes efficient. But, still use serial algorithm for discovering on each node. Research on serial algorithm is the further work.(2) The aim of association mining is find out rules users interested in. Therefore, to provide users with a convenient user interface is a direction that should be study on.(3) The present study is carried out independently, and later, will be integrated with existing data mining system.
Keywords/Search Tags:Association
PDF Full Text Request
Related items