Research Of Key Techniques In Cluster Analysis

Posted on: 2006-07-31  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X B Yang  Full Text: PDF
GTID: 1118360182957618  Subject: Computer application technology
Abstract/Summary:
Knowledge Discovery in Databases (KDD) is a multi-step process, including data cleansing, data mining, and knowledge presentation, that uses learning algorithms to extract valid, novel, potentially useful, and ultimately understandable knowledge or patterns from a database. It is an iterative process of interaction between human and machine. Data mining is the essential step of KDD: intelligent methods are applied to extract data patterns, and the results are explained and visualized through knowledge representation techniques such as trees, tables, rules, and graphs.

Cluster analysis is one of the most important functions of data mining. Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. This thesis focuses on key techniques and algorithms of cluster analysis.

Chapter one first reviews the field of data mining. It then discusses the origin and evolution of data mining and its main functions, including concept or class description, classification analysis, clustering analysis, association analysis, sequence and time-series analysis, and outlier analysis. Finally, it describes the main content and structure of the thesis.

Chapter two introduces the definition of cluster analysis, the basic requirements of clustering algorithms, and the main types of data in clustering. It then discusses the major families of algorithms: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. The chapter closes with the applications of cluster analysis.

Chapter three provides an introduction to fuzzy set theory, covering the concept of and operations on fuzzy sets, fuzzy cuts, and the decomposition theorem. Fuzzy clustering and its related algorithms are then studied, and an example using the FCM algorithm illustrates the application of fuzzy clustering.

Chapter four investigates clustering algorithms for the Gaussian mixture model. Besides the classical EM algorithm, the Gaussian Mixture Density Decomposition (GMDD) algorithm is also studied. In some fields, applications can be handled more accurately by introducing a weight function that depends on the empirical distribution of the sample and on the chosen model distribution. Based on the GMDD algorithm and the Weighted Likelihood Equations (WLEs), a new algorithm, called the weighted GMDD algorithm, is proposed. A careful study of GMDD and weighted GMDD shows that there are still special cases in which both algorithms have difficulty converging or cannot obtain a valid Gaussian component. A new technique based on the partitioning method is put forward. It resolves the problems of GMDD caused by symmetry and consequently widens the applicability of the GMDD algorithm. A simple simulation experiment illustrates the validity of the method.

Chapter five studies clustering algorithms for the switching regression model. First, the Hard C-Partition algorithm and the Fuzzy C-Regression Models (FCRM) algorithm are reviewed, and the characteristics of both are discussed. A fuzzy threshold is then defined: comparing the fuzzy degree of each data point with this threshold determines whether the point is treated as fuzzy, and in this way the Hard C-Partition algorithm and FCRM are unified. The experiment shows that introducing the fuzzy threshold improves the efficiency of clustering on the switching regression model. In addition, two methods are proposed to handle noise in the switching regression model. One is built on an existing clustering algorithm and avoids noise by repeatedly revising the clustering results; the other is based on the influence function and removes noise by revising the membership degree of each data point. The effectiveness of both methods has been validated by experiments.

Chapter six concludes the research and outlines future work in this field.
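
The FCM example mentioned for chapter three is not reproduced on this page. For orientation only, the following is a minimal sketch of the standard fuzzy c-means iteration (alternating updates of cluster centers and membership degrees). It is not the thesis's own code: the function name `fcm`, the fuzzifier `m = 2`, the stopping tolerance, and the sample points are all assumptions chosen for illustration.

```python
import random

def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update from squared distances to the new centers.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                  for k in range(c)]
            if min(d2) == 0.0:
                # The point coincides with a center: give it full membership there.
                zeros = [k for k in range(c) if d2[k] == 0.0]
                new = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                s = sum(inv)
                new = [v / s for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        # Stop once no membership degree changes by more than tol.
        if shift < tol:
            break
    return centers, u

# Hypothetical sample data: two well-separated groups in the plane.
sample = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (10.0, 10.1), (10.2, 9.9), (9.9, 10.0)]
centers, memberships = fcm(sample, c=2)
```

Each row of `memberships` holds one point's degrees of belonging to the `c` clusters and sums to 1; taking the largest entry per row yields a hard partition if one is needed.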
Keywords/Search Tags: Data mining, cluster analysis, fuzzy clustering, Gaussian mixture model, switching regression model, noise