
Parallel Data Mining Theory Research And Application

Posted on: 2007-09-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Q Wang
Full Text: PDF
GTID: 1118360185988090
Subject: Control theory and control engineering
Abstract/Summary:
Knowledge discovery in databases, or data mining, is the automated analysis of large volumes of data, looking for relationships and knowledge that are implicit in data warehouses and other large data collections and are 'interesting' in the sense of affecting an organization's decisions and practice. Data mining and knowledge discovery on large amounts of data can benefit from parallel computation on clusters, which improves both the performance and the quality of data analysis. Mining large data sets requires large computational resources, because data mining algorithms working on very large data sets take a very long time to produce results on conventional computers. One approach to reducing response time is sampling. In some cases, however, reducing the data may result in inaccurate models, and in other cases sampling is not useful at all (e.g. outlier identification). The other approach is parallel computing.

High-performance computers coupled with parallel data mining algorithms offer the best way to mine very large data sets. Faster processing also means that users can experiment with more models to understand complex data, and high performance makes it practical for users to analyze greater quantities of data. Parallel data mining will play an increasingly important role in data analysis and knowledge extraction in several application contexts: analysis of scientific data, mining of commercial and industrial databases, and data extraction and decision support for departments.

Although some parallel data mining algorithms have already been proposed, they are restricted by heavy communication overhead, poor scalability, unbalanced data distribution and other problems, so the efficiency of some parallel data mining algorithms decreases rapidly as the data size increases. Therefore, it is both valuable and necessary to study and propose innovative parallel data mining algorithms for applications in commercial and industrial fields.

Before the parallel data mining algorithms were studied, the parallel computing environment and a large data warehouse, together with industrial production databases, were designed as the platform for research and application. Based on the data warehouse the statistical analysis is executed, and based on the PC cluster the data are...
Keywords/Search Tags: cluster of workstations, parallel data mining, business data warehouse, parallel association rules, parallel cluster algorithm, parallel neural networks