
Research and Improvement of the K-Means Clustering Algorithm

Posted on: 2016-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: L L Liu
Full Text: PDF
GTID: 2208330464463537
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, daily life, production, and research in many fields are being digitized, generating extremely large volumes of text, images, audio, video, and other forms of data. How to extract previously unknown information of potential value from this mass of data accurately and efficiently is an important problem. The emergence of data mining technology has brought many effective methods and tools for solving it. As a new interdisciplinary field, data mining contains several popular research directions. Cluster analysis (referred to as "clustering") is one of the most mature and most widely used data mining techniques. Its main function is to divide a data set into a number of groups according to certain rules, so that data objects in the same group are as similar as possible while data objects in different groups are as different as possible. The similarity between data objects is calculated from the attributes that describe them. At present, cluster analysis is widely applied in machine learning, pattern recognition, image processing, text classification, marketing, statistics, and many other fields. According to differences in approach and underlying idea, existing clustering algorithms can be divided into partitioning, hierarchical, grid-based, density-based, and model-based algorithms. The K-means clustering algorithm is a classical partition-based algorithm. This thesis studies and analyzes in depth the merits and defects of the K-means clustering algorithm. Because the results of K-means are easily affected by the choice of initial centers, this thesis proposes an improvement to the algorithm.
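The sensitivity to initial centers mentioned above can be illustrated with a minimal sketch of standard K-means (Lloyd's iteration). This is not the thesis's code: the function names `squared_distance` and `kmeans`, the four-point data set, and the two sets of starting centers are assumptions chosen only for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Standard K-means: alternate assignment and center update until stable.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                dim = len(center)
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                new_centers.append(center)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    sse = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical data: the corners of a 4-by-1 rectangle, clustered with K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, good_sse = kmeans(points, [(0, 0), (4, 0)])  # converges to left/right split
_, bad_sse = kmeans(points, [(2, 0), (2, 1)])   # stuck in bottom/top split
print(good_sse, bad_sse)
```

With the second set of starting centers the iteration stops immediately at a bottom/top split whose SSE is much larger than that of the left/right split, which is the kind of local optimum the abstract refers to.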
The main work of the thesis is as follows:
(1) It describes the research status of data mining and the research background and related concepts of cluster analysis.
(2) It studies the basic ideas and principles of the K-means clustering algorithm, analyzes its merits and defects in depth, and analyzes and compares existing improvements to it. To obtain the best number of clusters, an algorithm for optimizing the value of K is proposed. Experimental results show that this algorithm successfully solves the problem of dependence on the K value.
(3) To address the disadvantages that K-means is sensitive to the selection of initial centers and easily falls into a local optimum, the differential evolution algorithm, which has strong global optimization ability, is introduced into the clustering algorithm: its mutation, crossover, and selection operations replace the process of continually updating the cluster centers. The thesis also proposes an improved differential evolution algorithm and combines it with the K-means clustering algorithm. Finally, experiments verify the effectiveness and feasibility of the improved algorithm.
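As a rough illustration of item (3), the sketch below searches for cluster centers with the textbook DE/rand/1/bin scheme, using mutation, crossover, and selection in place of the usual center update. It is not the improved variant proposed in the thesis, whose details the abstract does not give. The names `sse` and `de_cluster`, the parameters `pop_size`, `generations`, `f`, `cr`, and `seed`, and the sample data are all assumptions.

```python
import random


def sse(points, flat_centers, dim):
    """Sum of squared distances from each point to its nearest center.

    flat_centers holds k centers concatenated into one list of k * dim numbers.
    """
    centers = [flat_centers[i:i + dim]
               for i in range(0, len(flat_centers), dim)]
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)


def de_cluster(points, k, pop_size=20, generations=100, f=0.5, cr=0.9, seed=0):
    """Textbook DE/rand/1/bin over flattened center vectors (assumed setup)."""
    rng = random.Random(seed)
    dim = len(points[0])
    n = k * dim
    # Each individual starts as k distinct data points, flattened.
    population = []
    for _ in range(pop_size):
        individual = []
        for p in rng.sample(points, k):
            individual.extend(p)
        population.append(individual)
    fitness = [sse(points, ind, dim) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(n)  # at least one gene comes from the mutant
            trial = []
            for d in range(n):
                # Binomial crossover between the mutant and the current individual.
                if d == forced or rng.random() < cr:
                    trial.append(population[a][d]
                                 + f * (population[b][d] - population[c][d]))
                else:
                    trial.append(population[i][d])
            # Selection: keep the trial only if it is no worse.
            trial_fitness = sse(points, trial, dim)
            if trial_fitness <= fitness[i]:
                population[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    centers = [tuple(population[best][i:i + dim]) for i in range(0, n, dim)]
    return centers, fitness[best]


# Hypothetical data: the corners of a 4-by-1 rectangle, clustered with K = 2.
sample_points = [(0, 0), (0, 1), (4, 0), (4, 1)]
de_centers, de_sse = de_cluster(sample_points, k=2)
print(de_centers, de_sse)
```

Because selection is greedy, the best SSE in the population never gets worse from one generation to the next. One plausible way to combine this with K-means, offered here only as an assumption, is to use the best individual as the initial centers for a few ordinary K-means iterations.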
Keywords/Search Tags: Data Mining, Cluster Analysis, K-means Clustering Algorithm, Optimal Clustering Number, Differential Evolution Algorithm