Research On A Parallel Architecture For Data Mining

Posted on: 2004-08-02; Degree: Master; Type: Thesis
Country: China; Candidate: Y Feng; Full Text: PDF
GTID: 2168360095456635; Subject: Computer system architecture
Abstract/Summary:
Data mining has recently become a core technology for enterprises that need to analyze large data sets. It is a key step in the knowledge discovery process and a further extension of database technology. Its efficiency problem has become a bottleneck that holds back the development of data mining technology, and parallel computing offers an efficient way to address it. The thesis analyzes the differences among parallel computing architectures and among the parallel software environments built for them, and proposes a parallel architecture for data mining based on PVM (Parallel Virtual Machine). The research has both theoretical significance and practical value.

The thesis takes four parallel computing architectures as its research objects: SMP, MPP, DSM, and COW (cluster of workstations). It analyzes their structure and system characteristics, with a focus on COW, and compares the four in detail in terms of node scale, node complexity, communication method between nodes, task assignment, SSI (single system image) support, node operating system, address space, node security, ownership, network protocol, availability, standard of performance measurement, and design complexity. Because COW has advantages in five respects (usability, availability, repeatability, scalability, and price/performance), it is chosen as the foundation of the parallel architecture for data mining. The thesis then analyzes the working principles and system mechanisms of the two most popular parallel computing software systems, PVM and MPI, and compares them in detail in ten important aspects: design philosophy, system support, portability, process control, resource control, fault tolerance, context for safe communication, communication method, name service, and message handlers.
PVM is chosen as the software environment of the parallel architecture for data mining because of its virtual-machine design philosophy, good portability, support for heterogeneous architectures, good scalability, effective resource control, multilevel process control, effective fault-tolerance mechanism, support for name service, support for compute models, close integration with UNIX, and compact architecture.

Combining the advantages of PVM, COW, and Linux, the thesis proposes a parallel architecture for data mining and analyzes its characteristics in seven aspects: scalable architecture, construction from commercial interconnects, standard operating environment, high-performance services, standard programming model, architecture availability, and SSI. The architecture scales well, reduces the cost of network equipment, supplies a standard operating environment and a group of high-performance services, supports the sequential compute model and several parallel compute models, has good availability, and supports a virtual SSI. To analyze and evaluate the architecture quantitatively, the thesis adopts the Stage Parallel Model. Using the three stages of the model (parallel stage, compute stage, and interaction stage), the performance indices of each stage, and experimental data, it analyzes the performance of the architecture: the overheads of parallelism and of point-to-point communication are low, and the overheads of group communication and group computation are acceptable.

Finally, the thesis considers the characteristics of association rule mining, classification mining, and clustering mining and proposes parallel strategies for them based on the architecture, which further demonstrates the suitability of the parallel architecture for data mining...
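The abstract does not say which parallel strategy the thesis uses for association rule mining. As an illustration only, the sketch below shows one common data-parallel approach: each worker counts candidate itemsets in its own partition of the transactions, and the local counts are then summed. The function names `count_local` and `frequent_itemsets`, the sample baskets, and the use of Python threads in place of PVM tasks on cluster nodes are all assumptions made for this example; threads here only mirror the structure of the computation and do not reproduce a cluster's performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_local(transactions, size):
    """Count every itemset of the given size within one data partition."""
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts


def frequent_itemsets(transactions, size, min_support, workers=2):
    """Split the data, count each partition in a worker, then merge counts."""
    # Round-robin partitioning; each slice stands in for one node's local data.
    partitions = [transactions[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_local, partitions, [size] * workers))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step: sum the local counts
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical sample data, not taken from the thesis.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
pairs = frequent_itemsets(baskets, size=2, min_support=2)
```

In this sketch the only data exchanged between workers is the table of local counts, which corresponds to the group communication and group computation costs that the abstract reports as acceptable.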
Keywords/Search Tags: Data Mining, Parallel Architecture, Cluster of Workstations, Parallel Virtual Machine