Text-Based K-means Clustering Analysis

Posted on: 2009-12-17  Degree: Master  Type: Thesis
Country: China  Candidate: H P Wang  Full Text: PDF
GTID: 2208360248452311  Subject: Software engineering
Abstract/Summary:
K-means is one of the most widely used clustering algorithms. Its applications include text clustering, image and speech compression, data preprocessing for system modeling with RBF networks, and task decomposition in heterogeneous neural network architectures. Moreover, on large data sets K-means is relatively scalable and efficient. However, its clustering quality depends on the choice of the number of clusters K, the locations of the initial cluster centers, and the similarity measure used. Because K-means is a local search algorithm, its main flaw is that it easily falls into a local minimum, which can deviate considerably from the global optimum. To address these deficiencies, this thesis improves the traditional K-means algorithm in several respects. First, it proposes an indirect method of learning feature weights that yields a better distance measure; that is, it learns the feature weights used in a weighted distance in order to improve the performance of K-means clustering. The method constructs an evaluation function and applies gradient descent to minimize it, which reduces the fuzziness of the similarity matrix. The partial derivative of the evaluation function with respect to each weight is computed, and the weights are adjusted iteratively according to the resulting update formula. Learning stops when the evaluation function falls to or below a minimum threshold, or when the number of iterations exceeds a set limit. When the similarity of a group of vectors is large, the weights are adjusted so that their similarity becomes larger and the vectors tend to fall into the same cluster. When the similarity of a group of vectors is small, the weights are adjusted so that their similarity becomes smaller.
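The weight-learning procedure described above can be sketched roughly as follows. The abstract does not give the evaluation function, so this sketch assumes one common form: a similarity rho = 1 / (1 + beta * d) computed from a weighted Euclidean distance d, and an evaluation function that is small when weighted similarities move away from 0.5. The names `beta`, `rate`, `threshold`, and `max_iter` and their default values are illustrative assumptions, not values taken from the thesis.

```python
import math

def weighted_distance(x, y, w):
    # Weighted Euclidean distance: each feature difference is scaled by its weight.
    return math.sqrt(sum((wk * (a - b)) ** 2 for wk, a, b in zip(w, x, y)))

def similarity(d, beta):
    # Assumed similarity form: 1 at distance 0, decreasing toward 0 as d grows.
    return 1.0 / (1.0 + beta * d)

def evaluation(data, w, beta):
    # Assumed evaluation function, averaged over all pairs of vectors.
    # rho uses unit weights (the original space); rho_w uses the learned weights.
    n = len(data)
    ones = [1.0] * len(w)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            rho = similarity(weighted_distance(data[i], data[j], ones), beta)
            rho_w = similarity(weighted_distance(data[i], data[j], w), beta)
            total += 0.5 * (rho_w * (1.0 - rho) + rho * (1.0 - rho_w))
    return 2.0 * total / (n * (n - 1))

def gradient(data, w, beta):
    # Partial derivative of the evaluation function with respect to each weight.
    n, m = len(data), len(w)
    ones = [1.0] * m
    grad = [0.0] * m
    for i in range(n):
        for j in range(i + 1, n):
            d_w = weighted_distance(data[i], data[j], w)
            if d_w == 0.0:
                continue  # distance is not differentiable at zero; skip the pair
            rho = similarity(weighted_distance(data[i], data[j], ones), beta)
            d_rho_w = -beta / (1.0 + beta * d_w) ** 2
            for k in range(m):
                d_dist = w[k] * (data[i][k] - data[j][k]) ** 2 / d_w
                grad[k] += 0.5 * (1.0 - 2.0 * rho) * d_rho_w * d_dist
    scale = 2.0 / (n * (n - 1))
    return [scale * g for g in grad]

def learn_weights(data, beta=1.0, rate=0.5, threshold=1e-3, max_iter=200):
    # Start from unit weights and descend until the evaluation function is
    # at or below the threshold, or the iteration limit is reached.
    w = [1.0] * len(data[0])
    for _ in range(max_iter):
        if evaluation(data, w, beta) <= threshold:
            break
        g = gradient(data, w, beta)
        w = [max(0.0, wk - rate * gk) for wk, gk in zip(w, g)]  # keep weights non-negative
    return w
```

Under this assumed form, a pair whose original similarity is above 0.5 pushes the weights toward a larger weighted similarity, and a pair below 0.5 pushes them toward a smaller one, which matches the behavior the abstract describes. The learned weights would then be passed to `weighted_distance` inside the K-means assignment step.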
In summary, the feature weights are learned by minimizing the evaluation function with gradient descent, and the learned weights improve clustering performance. Second, the thesis addresses the parameter K: a genetic algorithm is used to select a good initial value of K. Although there is no direct way to determine K, it can still be decided by comparing the results obtained for different candidate values. Finally, building on the individual steps of the traditional K-means implementation, the thesis proposes several improved variants of the K-means algorithm. Clustering experiments with the improved algorithms on several data sets show that the methods achieve the desired results and improve the clustering quality.
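As an illustration of choosing K with a genetic algorithm, the following is a minimal sketch under stated assumptions. The abstract does not say how candidate values of K are encoded or scored, so this sketch treats each individual as a single integer K and scores it with the Xie-Beni validity index (lower is better), which is a substitute chosen for illustration. The function names, `pop_size`, `generations`, and the mutation rate are all hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]

def kmeans(points, k, rng, iters=20):
    # Plain K-means with centers initialized from k distinct data points.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(points, centers)

def xie_beni(points, centers, labels):
    # Within-cluster scatter divided by the smallest squared center separation.
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    k = len(centers)
    min_sep = min(sq_dist(centers[a], centers[b])
                  for a in range(k) for b in range(a + 1, k))
    return float("inf") if min_sep == 0.0 else sse / (len(points) * min_sep)

def choose_k(points, k_max, rng, pop_size=6, generations=10):
    # Each individual is one candidate K in [2, k_max]; k_max must not
    # exceed the number of points.
    cache = {}

    def fitness(k):
        # Best index over a few K-means restarts, cached per K.
        if k not in cache:
            cache[k] = min(xie_beni(points, *kmeans(points, k, rng))
                           for _ in range(3))
        return cache[k]

    population = [rng.randint(2, k_max) for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        # Tournament selection: the better of two random individuals.
        parents = [min(rng.sample(population, 2), key=fitness)
                   for _ in range(pop_size)]
        children = []
        for a, b in zip(parents, reversed(parents)):
            child = (a + b) // 2               # crossover: midpoint of the parents
            if rng.random() < 0.3:             # mutation: shift K by one
                child += rng.choice((-1, 1))
            children.append(max(2, min(k_max, child)))
        population = children
        best = min([best] + population, key=fitness)
    return best
```

A call such as `choose_k(points, 5, random.Random(0))` returns one candidate K between 2 and 5. Because K-means depends on its initial centers, different seeds may return different values, so the result is better read as a starting point to compare against neighboring values of K than as a definitive answer.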
Keywords/Search Tags: K-means clustering, Feature weight, Genetic algorithm, Text mining