Research On Parallel SLIQ Algorithm Based On PVM

Posted on: 2004-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Xue
Full Text: PDF
GTID: 2168360095956628
Subject: Computer system architecture
Abstract/Summary:
As a critical application of KDD (Knowledge Discovery in Databases), data mining is increasingly widely used. Classification is an important part of data mining and a key application in CRM (Customer Relationship Management). SLIQ is a fast, scalable classification algorithm for data mining, proposed by the IBM Almaden Research Center in 1996. Its typical applications in large businesses include CRM and credit ranking. As data sets grow rapidly, parallel techniques have become an important way to improve the efficiency of data mining.

SLIQ uses novel pre-sorting and breadth-first techniques to build a decision tree quickly and accurately on a large data set, and it can handle both categorical and numeric attributes. However, the original algorithm performs a great deal of redundant computation on attributes and records. This paper proposes that records attached to leaf nodes, and attributes already used at an ancestor of the current node, should be deleted dynamically, which reduces unnecessary computation and I/O traffic between local disk and memory.

Because the task itself is difficult to partition, the core idea of the parallel SLIQ algorithm is data parallelism. There are two ways to divide the data: by attribute and by record. Because SLIQ pre-sorts its attributes, this paper adopts attribute-based division. The main parallel techniques used in constructing the parallel SLIQ algorithm are also discussed.

Complexity analysis of the sequential and parallel algorithms indicates that, when the data set is large enough, the parallel algorithm achieves a speedup nearly equal to the number of processors for continuous attributes, while for categorical attributes the improvement is limited...
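The attribute-based division described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the split criterion or the PVM message layout, so the sketch assumes a Gini-index split on numeric attributes, simulates the workers with an ordinary loop instead of PVM tasks, and uses hypothetical names (`partition_attributes`, `best_numeric_split`, `parallel_best_split`).

```python
from collections import Counter

def gini(counts, total):
    """Gini impurity of a class histogram (assumed split criterion)."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_numeric_split(values, labels):
    """Scan one pre-sorted attribute list; return (weighted_gini, threshold)."""
    pairs = sorted(zip(values, labels))  # pre-sorting step, done once per attribute
    n = len(pairs)
    left, right = Counter(), Counter(lbl for _, lbl in pairs)
    best = (float("inf"), None)
    for i in range(n - 1):
        value, lbl = pairs[i]
        left[lbl] += 1
        right[lbl] -= 1
        nxt = pairs[i + 1][0]
        if value == nxt:
            continue  # no threshold can separate equal values
        n_left = i + 1
        score = (n_left * gini(left, n_left)
                 + (n - n_left) * gini(right, n - n_left)) / n
        if score < best[0]:
            best = (score, (value + nxt) / 2)
    return best

def partition_attributes(names, n_workers):
    """Attribute-based division: deal attribute names round-robin to workers."""
    return [names[w::n_workers] for w in range(n_workers)]

def parallel_best_split(columns, labels, n_workers):
    """Each simulated worker evaluates only its own attributes;
    the coordinator keeps the global best (score, threshold, attribute)."""
    local_bests = []
    for owned in partition_attributes(sorted(columns), n_workers):
        candidates = [best_numeric_split(columns[a], labels) + (a,) for a in owned]
        if candidates:
            local_bests.append(min(candidates))  # worker's local result
    return min(local_bests)  # reduction step on the coordinator

columns = {"age": [23, 35, 47, 52], "income": [40, 10, 30, 20]}
labels = ["no", "no", "yes", "yes"]
print(parallel_best_split(columns, labels, n_workers=2))
```

Because each worker owns whole attribute lists, the scans for continuous attributes proceed independently and only the small local results need to be exchanged. Under these assumptions, that is consistent with the near-linear speedup the abstract reports for continuous attributes.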
Keywords/Search Tags:SLIQ, parallel, algorithm, PVM