
Grid-based And Information Entropy-based Clustering Algorithm

Posted on: 2012-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhou
Full Text: PDF
GTID: 2248330371463499
Subject: Control Science and Engineering
Abstract/Summary:
Data mining is the process of extracting interesting but previously unknown information, rules, or knowledge from large volumes of noisy, irregular data. Within data mining, clustering has been widely applied in pattern recognition, image processing, and market research, so research on clustering has both theoretical and practical significance. Although many existing clustering algorithms can find clusters of arbitrary shape and varying size, they struggle to produce satisfactory results on multi-density data sets. To improve the quality and efficiency of clustering, this thesis studies the following aspects:

(1) It first gives a brief introduction to the basics of data mining and to the data preprocessing techniques used in the data mining process, then analyzes the common characteristics of clustering algorithms and compares their advantages and disadvantages.

(2) In grid-based cluster analysis, the way the grid is partitioned has a significant impact on the clustering results. The thesis gives a systematic account of grid partitioning methods and describes the characteristics of each. Because the handling of grid boundary points also affects clustering precision, it analyzes several existing treatments of grid boundary points and proposes a new one.

(3) Based on information entropy optimization, the thesis proposes a grid-based and information entropy-based clustering algorithm for multi-density data. Using the information entropy carried by grids of different densities, the algorithm automatically computes density thresholds to identify the different classes in a multi-density data set.

Experiments show that the algorithm handles noise effectively and discovers multi-density classes, yielding better clustering results. Finally, the proposed algorithm is applied to image segmentation, and the experiments show that it is effective for this task.
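
The abstract does not spell out how grid densities and their information entropy are computed, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes 2-D points, a uniform square grid, and the Shannon entropy of the distribution of points over non-empty cells; the names `grid_cell_counts`, `density_entropy`, and `cell_size` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def grid_cell_counts(points, cell_size):
    """Assign each 2-D point to a square grid cell and count points per cell.

    Assumption: a uniform grid with side length `cell_size`; the thesis
    discusses several partitioning methods that are not reproduced here.
    """
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


def density_entropy(counts):
    """Shannon entropy (in bits) of the point distribution over non-empty cells."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy data: one dense cell, one sparser cell, and one isolated point.
points = [
    (0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (0.4, 0.3),
    (1.5, 1.5), (1.6, 1.4),
    (3.2, 0.7),
]
counts = grid_cell_counts(points, cell_size=1.0)
entropy = density_entropy(counts)
```

In a sketch like this, the per-cell counts act as the grid densities; a method of the kind described above would then use an entropy criterion over those densities to choose density thresholds, a step that is left out here because the abstract gives no details.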
Keywords/Search Tags:data mining, cluster analysis, grid clustering, density clustering, information entropy