Large-scale Scientific Data Mining Density Clustering Algorithm

Posted on: 2008-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Xiong
GTID: 2208360212999631
Subject: Computer application technology
Abstract/Summary:
Data mining, also known as Knowledge Discovery in Databases, distills knowledge from large volumes of data. It is a relatively new research area that draws on several branches of machine learning and spans many domains. Cluster analysis is one of the most important of these domains: it studies the logical or physical relationships among data and divides a data set into clusters according to certain rules, so that each cluster consists of data points that are similar in nature.

The thesis begins by introducing the basic principles, approaches, and problems of data mining, followed by the concepts, categories, and general ideas behind popular cluster analysis algorithms. A few classic clustering algorithms are discussed in depth.

The main subject of this thesis is density-based clustering. Research shows that most density-based clustering algorithms require initial input parameters, which are usually chosen from user experience and are therefore difficult to set. In addition, the density parameters usually just divide clusters into high-density and low-density ones, so they cannot reflect the overall data distribution. To address these problems, this thesis presents a self-adaptive clustering algorithm based on density and gridding. It first builds a grid over the data, then analyzes the density distribution of the grid cells to obtain a series of density intervals self-adaptively. These intervals are used as the clustering parameters; they are no longer a simple boundary between high-density and low-density clusters, but reflect the distribution of the data.

The combination of data mining and scientific research is a relatively new subject that is worth studying in many respects. Large-scale scientific data have unique characteristics, such as huge data volumes and complicated features, which usually make it difficult to understand and analyze the data and to extract knowledge from them. Scientific data mining is therefore imperative.

This thesis presents the scientific data mining project and the scientific simulation data, and applies the self-adaptive density-based and grid-based clustering algorithm to the simulation data. It then analyzes the clustering results and extracts their clustering features, which indicate the overall physical process behind the data. The final part of the thesis gives conclusions and an outlook on data mining research and applications.
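The grid-and-density procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual algorithm: the abstract does not say how the density intervals are derived, so the gap-based rule below (a new interval starts wherever the cell density jumps by more than `gap_ratio`) is an assumption made purely for illustration. The function names, `cell_size`, and `gap_ratio` are likewise hypothetical.

```python
from collections import Counter, deque
from itertools import product


def build_grid(points, cell_size):
    """Count the points that fall into each grid cell.

    A cell is identified by a tuple of integer indices, one per dimension.
    """
    counts = Counter()
    for p in points:
        counts[tuple(int(c // cell_size) for c in p)] += 1
    return counts


def density_intervals(counts, gap_ratio=2.0):
    """Group the observed cell densities into intervals.

    ASSUMPTION: a new interval starts when a density exceeds gap_ratio
    times the upper end of the current interval. The thesis may use a
    different rule; this one only illustrates the idea.
    """
    levels = sorted(set(counts.values()))
    intervals = [[levels[0], levels[0]]]
    for d in levels[1:]:
        if d > gap_ratio * intervals[-1][1]:
            intervals.append([d, d])
        else:
            intervals[-1][1] = d
    return [tuple(i) for i in intervals]


def cluster_cells(counts, intervals):
    """Label connected groups of adjacent cells in the same density interval."""

    def level(density):
        for i, (lo, hi) in enumerate(intervals):
            if lo <= density <= hi:
                return i
        return None

    dim = len(next(iter(counts)))
    # All neighbor offsets in {-1, 0, 1}^dim except the zero offset.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    labels = {}
    next_label = 0
    for start in counts:
        if start in labels:
            continue
        start_level = level(counts[start])
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first flood fill over neighboring cells
            cell = queue.popleft()
            for o in offsets:
                nb = tuple(c + d for c, d in zip(cell, o))
                if (nb in counts and nb not in labels
                        and level(counts[nb]) == start_level):
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels
```

Calling `build_grid`, then `density_intervals`, then `cluster_cells` yields a mapping from grid cell to cluster label. Only `cell_size` has to be chosen by the user; the density boundaries come from the data, which is the property the abstract emphasizes.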
Keywords/Search Tags:Data Mining, Scientific Data, Cluster Analysis, Density-based, Grid-based