
Research On Updated Algorithm Of Parallel Association Rules

Posted on: 2008-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2178360242456654
Subject: Computer application technology
Abstract/Summary:
With the rapid development of database technology and the widespread use of database management systems, the amount of data people accumulate keeps growing. Making full use of this data to support enterprise decision-makers has become an urgent and difficult problem, and data mining has developed quickly to meet this need. Data mining technology helps people find information and knowledge in massive data. It has recently become a core technology of business intelligence, has been widely applied in many areas, and has drawn the attention of the whole academic community; improving the efficiency of data mining has therefore become a popular research topic. At present, association rule mining, one of the most successful and important techniques in data mining, is an active research area, and the best-known algorithm for mining association rules is the Apriori algorithm.

In this paper, we summarize the main concepts and recent developments of data mining and association rules, and then give a formal problem description of association rule mining. We analyze the performance of typical serial and parallel algorithms and discuss their respective advantages and disadvantages. Finally, we improve the original algorithm to address its repeated database scans and redundant storage.

The purpose of the CD algorithm is to reduce the number of communications and obtain a favorable task distribution, so that each processor processes only its local data, in parallel with the others. However, its I/O load is heavy and its data structures are duplicated, so it cannot make full use of the CPUs. This paper therefore proposes NCD, an improved algorithm based on CD: it counts the number of elements in the candidate sets to reduce both the combinations needed to produce candidate sets and the number of database scans.
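The Count Distribution idea behind CD can be sketched as follows. This is a minimal, single-process simulation under stated assumptions: "CD" is taken to be the classic Count Distribution parallelization of Apriori (the abstract only says "CD algorithm"), each hypothetical "processor" is represented by a list of transactions, and the per-pass global reduction (an all-reduce in a real message-passing system) is simulated by summing `Counter` objects. The names `apriori_gen`, `local_counts` and `count_distribution` are illustrative, not taken from the thesis.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori_gen(prev_frequent: Set[Itemset], k: int) -> Set[Itemset]:
    """Apriori candidate generation: join frequent (k-1)-itemsets and keep
    only k-itemsets whose every (k-1)-subset is frequent (prune step)."""
    candidates: Set[Itemset] = set()
    prev = list(prev_frequent)
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k and all(
                frozenset(sub) in prev_frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def local_counts(partition: List[Itemset], candidates: Set[Itemset]) -> Counter:
    """One processor's work in a pass: count candidate support in its local data only."""
    counts: Counter = Counter()
    for transaction in partition:
        for c in candidates:
            if c <= transaction:
                counts[c] += 1
    return counts


def count_distribution(partitions: List[List[Itemset]], min_support: int) -> Set[Itemset]:
    """Simulated CD: every processor holds the same candidate set (redundant
    computation and storage), counts it over its own partition, and the local
    counts are summed once per pass (the only communication step)."""
    candidates = {frozenset([item]) for part in partitions for t in part for item in t}
    frequent_all: Set[Itemset] = set()
    k = 1
    while candidates:
        # Each processor scans only its local partition.
        per_processor = [local_counts(p, candidates) for p in partitions]
        # Global reduction: sum the local count vectors.
        total = sum(per_processor, Counter())
        frequent_k = {c for c in candidates if total[c] >= min_support}
        frequent_all |= frequent_k
        k += 1
        # Every processor regenerates the same candidates from the same global result.
        candidates = apriori_gen(frequent_k, k)
    return frequent_all


if __name__ == "__main__":
    partitions = [
        [frozenset("abc"), frozenset("ab"), frozenset("ac")],
        [frozenset("abd"), frozenset("bc"), frozenset("abc")],
    ]
    result = count_distribution(partitions, min_support=3)
    print(sorted("".join(sorted(s)) for s in result))  # prints ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

Note that the database is scanned once per itemset length and a count exchange happens every pass; these repeated scans are the cost the abstract attributes to CD.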
The approach is to compute the candidate set S' in parallel on several processors. Because S' is not guaranteed to be a superset, the algorithm may report that some sets are invalid; in that case the database must be scanned again, possibly more than once, until no further invalidation is reported. The algorithm lets each processor compute its local itemsets independently, without any information from the other processors; data is exchanged only after all processors have finished computing their local itemsets, after which itemsets are added or deleted to obtain the final result. This increases mining speed and reduces I/O time. For comparison, the CD algorithm relies on a simple principle: every processor performs the same computation and stores the same data redundantly, in parallel, which avoids a large amount of communication. Experiments with both the CD and NCD algorithms were carried out on test datasets, and the results indicate that NCD is considerably more efficient than CD on the same dataset.
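The NCD workflow described above (independent local computation, a single data exchange, then adding or deleting itemsets) resembles the general "mine locally, then verify globally" pattern. The abstract does not give NCD's internal details, so the sketch below is not NCD: it swaps in the well-known Partition-style scheme as an illustrative assumption. Each simulated processor mines itemsets that are frequent in its own partition at the same relative threshold `min_frac`, the local results are exchanged once to form S', and one global counting pass deletes itemsets that are not globally frequent. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori_gen(prev: Set[Itemset], k: int) -> Set[Itemset]:
    """Candidate k-itemsets whose (k-1)-subsets are all in `prev`."""
    out: Set[Itemset] = set()
    prev_list = list(prev)
    for i, a in enumerate(prev_list):
        for b in prev_list[i + 1:]:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                out.add(union)
    return out


def mine_local(partition: List[Itemset], min_frac: Fraction) -> Set[Itemset]:
    """Runs on one processor only: finds all itemsets frequent in this
    partition at relative threshold min_frac, with no communication."""
    threshold = min_frac * len(partition)
    result: Set[Itemset] = set()
    level = {frozenset([item]) for t in partition for item in t}
    k = 1
    while level:
        counts = Counter(c for t in partition for c in level if c <= t)
        frequent = {c for c in level if counts[c] >= threshold}
        result |= frequent
        k += 1
        level = apriori_gen(frequent, k)
    return result


def local_then_global(partitions: List[List[Itemset]], min_frac: Fraction) -> Set[Itemset]:
    # Phase 1: each simulated processor mines its partition independently.
    local_results = [mine_local(p, min_frac) for p in partitions]
    # Single exchange: the union of the local results forms the candidate set S'.
    s_prime: Set[Itemset] = set().union(*local_results)
    # Phase 2: one global counting pass over S'; itemsets below the global
    # threshold are deleted from the result.
    total = Counter(c for p in partitions for t in p for c in s_prime if c <= t)
    n = sum(len(p) for p in partitions)
    return {c for c in s_prime if total[c] >= min_frac * n}


if __name__ == "__main__":
    partitions = [
        [frozenset("abc"), frozenset("ab"), frozenset("ac")],
        [frozenset("abd"), frozenset("bc"), frozenset("abc")],
    ]
    result = local_then_global(partitions, Fraction(1, 2))
    print(sorted("".join(sorted(s)) for s in result))  # prints ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

Because an itemset that is frequent overall must be frequent, at the same relative threshold, in at least one partition, this swapped-in scheme always yields a superset in S', so it never needs the repeated verification scans that the abstract describes for NCD. `Fraction` is used so that the threshold comparison is exact.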
Keywords/Search Tags: Association Rule, Parallel Data Mining, Data Mining, Database