Research On Clustering Algorithms In Data Mining Technology

Posted on: 2009-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: P B Shi  Full Text: PDF
GTID: 2178360272456775  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, the need to analyze and manage enormous amounts of data has become increasingly urgent, and data mining technology is used to discover valuable knowledge and rules in such data. Data mining and its applications have spread into many disciplines and produced substantial results in diverse fields, including artificial intelligence and machine learning, databases, pattern recognition, bioinformatics, and neural computing. As one of the most important data mining tools, clustering has attracted growing attention, and a large body of theory and methods has been developed. As data mining is applied in more industries and the data become more complex, research on data clustering faces many new challenges. This dissertation presents a systematic and detailed study of data mining techniques, with a focus on cluster analysis. The main contents are as follows:

(1) The development of data mining and the research directions of cluster analysis in data mining are introduced. The state of research in China and abroad on partitioning methods, hierarchical methods, density-based methods, grid-based methods, and other clustering methods is then summarized. Finally, the main contributions and the organization of the thesis are presented.

(2) The definition of clustering, similarity measures, and the classification of clustering algorithms are briefly introduced. The main clustering algorithms used in data mining and their basic principles are described in detail. Finally, methods for evaluating clustering results are explained.

(3) The basic ideas, procedure, and performance of the traditional k-means algorithm are introduced in detail. In traditional k-means, the value of k must be fixed in advance, which restricts many practical applications. The initial centers are selected randomly, so the algorithm can converge to a local extremum. In addition, the common evaluation functions do not determine the optimal number of clusters satisfactorily. To overcome these problems, a new evaluation function, the equalization function, is introduced; combined with a density-based center initialization algorithm, it allows the number of clusters to be determined automatically. Experimental results demonstrate the effectiveness of the improved k-means algorithm.

(4) The common partition criteria used in spectral clustering, its basic framework, and representative algorithms are introduced in detail, together with a theoretical explanation of the spectral clustering algorithm. After analyzing the cause of spectral clustering's sensitivity to initialization, the thesis introduces the k-harmonic means algorithm to overcome this shortcoming and studies a spectral clustering algorithm based on k-harmonic means. Experiments show that this is an effective and feasible way to improve the performance of spectral clustering.

Finally, the thesis summarizes the research and proposes directions for future work in this field.
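
The two weaknesses of traditional k-means named in item (3), a value of k fixed in advance and randomly chosen initial centers, can be seen in a minimal NumPy sketch of the standard algorithm. This is only an illustration of plain k-means, not the improved algorithm of the thesis: the abstract does not define the equalization function or the density-based initialization, so neither is reproduced here. The function name `kmeans`, its parameters, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means: the caller must supply k, and initial centers are random."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as the starting centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and sum of squared errors (SSE) for the returned centers.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = dists[np.arange(len(points)), labels].sum()
    return centers, labels, sse

# Synthetic data (an assumption for illustration): three Gaussian blobs in 2-D.
data_rng = np.random.default_rng(42)
blob_means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
data = np.vstack([m + data_rng.normal(scale=0.7, size=(50, 2)) for m in blob_means])

# The same k with different random initializations may end at different SSE values.
for seed in range(5):
    _, _, sse = kmeans(data, k=3, seed=seed)
    print(f"seed={seed}  SSE={sse:.2f}")
```

Running the loop with several seeds and comparing the SSE values is a simple way to check whether a given data set is sensitive to the starting centers; the value of k still has to be chosen by hand.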
Keywords/Search Tags: clustering analysis, evaluation function, spectral clustering, initialization sensitivity