K-means Clustering Analysis Based On Text

Posted on:2009-12-17

Degree:Master

Type:Thesis

Country:China

Candidate:H P Wang

Full Text:PDF

GTID:2208360248452311

Subject:Software engineering

Abstract/Summary:

The traditional K-means algorithm is the most widely used clustering algorithm. Its applications include text clustering, image and speech compression, data mining, data preprocessing for system modeling with RBF networks, and task decomposition in heterogeneous neural network architectures. Moreover, K-means is relatively scalable and efficient on large data sets. However, its clustering quality depends on the choice of K, the initial positions of the cluster centres, and the similarity measure used. Because K-means is a local search algorithm, its main weakness is that it easily falls into a local minimum, which may deviate substantially from the global optimum. To address these deficiencies, this thesis improves the traditional K-means algorithm in several respects. First, it proposes an indirect method of learning feature weights to obtain a better distance measure, that is, learning the weight of each feature in a weighted distance so as to improve the performance of K-means clustering. The method constructs an evaluation function and minimizes it by gradient descent, which reduces the fuzziness of the similarity matrix. The partial derivative of the evaluation function with respect to each weight is computed, and the weights are adjusted iteratively according to the resulting update formula. Learning stops when the evaluation function falls to or below a minimum threshold, or when the number of iterations exceeds a preset limit. For a pair of vectors whose similarity is already high, the learned weights make them even more similar, so that they fall into the same cluster; for a pair whose similarity is low, the learned weights make them even less similar.
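The abstract does not state the evaluation function or the weight update formula. The sketch below is a minimal illustration under assumptions: similarity is ρ = 1/(1 + β·d_w) computed from a weighted Euclidean distance d_w, and the evaluation function E(w) averages ½[ρ^(w)(1 − ρ^(1)) + ρ^(1)(1 − ρ^(w))] over all pairs, where ρ^(1) is the similarity with every weight equal to 1. This is a fuzziness-style function of the kind used in similarity-based feature-weight learning, not necessarily the thesis's own. Minimizing it pulls pairs with ρ^(1) > 0.5 closer and pushes pairs with ρ^(1) < 0.5 apart, which matches the behaviour described above. The choice β = 1/(mean unweighted distance), the learning rate, the step-halving rule, and all function names are hypothetical.

```python
import math


def weighted_dist(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_j (w_j * (x_j - y_j))^2)."""
    return math.sqrt(sum((wj * (a - b)) ** 2 for wj, a, b in zip(w, x, y)))


def similarity(d, beta):
    """Map a distance to a similarity in (0, 1]: rho = 1 / (1 + beta * d)."""
    return 1.0 / (1.0 + beta * d)


def evaluate(data, pairs, rho1, w, beta):
    """Return the (assumed) evaluation function E(w) and its gradient dE/dw.

    E(w) = mean over pairs of 0.5 * [rho_w * (1 - rho_1) + rho_1 * (1 - rho_w)],
    where rho_1 is the similarity computed with all weights equal to 1.
    """
    e = 0.0
    grad = [0.0] * len(w)
    for (p, q), r1 in zip(pairs, rho1):
        x, y = data[p], data[q]
        d = weighted_dist(x, y, w)
        rw = similarity(d, beta)
        e += 0.5 * (rw * (1 - r1) + r1 * (1 - rw))
        if d == 0:
            continue  # d is not differentiable at 0; skip this pair
        # Chain rule: dE/drho_w * drho_w/dd * dd/dw_j
        coef = 0.5 * (1 - 2 * r1) * (-beta * rw * rw) / d
        for j, (a, b) in enumerate(zip(x, y)):
            grad[j] += coef * w[j] * (a - b) ** 2
    m = len(pairs)  # m = N(N-1)/2, so dividing by m gives the 2/(N(N-1)) factor
    return e / m, [g / m for g in grad]


def learn_weights(data, lr=0.5, eps=1e-3, max_iter=200):
    """Gradient descent on E(w); returns (weights, initial E, final E)."""
    n = len(data)
    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    ones = [1.0] * len(data[0])
    d1 = [weighted_dist(data[p], data[q], ones) for p, q in pairs]
    beta = len(d1) / sum(d1)  # assumption: rho_1 = 0.5 at the mean distance
    rho1 = [similarity(d, beta) for d in d1]

    w = ones[:]
    e, grad = evaluate(data, pairs, rho1, w, beta)
    e0 = e
    for _ in range(max_iter):  # stop rule 2: iteration limit
        if e <= eps:  # stop rule 1: E at or below the threshold
            break
        trial = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]  # keep w_j >= 0
        e_trial, g_trial = evaluate(data, pairs, rho1, trial, beta)
        if e_trial < e:
            w, e, grad = trial, e_trial, g_trial
        else:
            lr *= 0.5  # overshoot: halve the step and retry
    return w, e0, e


if __name__ == "__main__":
    # Toy data: feature 0 separates two groups, feature 1 is noise.
    data = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [5.0, 0.5], [5.1, 0.2], [4.9, 0.8]]
    w, e0, e1 = learn_weights(data)
    print("weights:", w, "E before:", e0, "E after:", e1)
```

Each step is accepted only if it lowers E, so the weights move only in directions that reduce fuzziness. In general E has a positive lower bound, so in practice the iteration limit usually ends learning. On the toy data, the weight of the informative feature 0 should grow relative to the noisy feature 1.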
Thus, by minimizing the evaluation function, the method learns the weights of the feature vectors and improves clustering performance. Second, treating K as a parameter of the learning algorithm, the thesis makes a preliminary attempt to select the optimal value of K with a genetic algorithm. Although there is no general method for determining K exactly, a suitable value can still be decided by comparing the results obtained with different values. Finally, targeting each stage of the traditional K-means algorithm, the thesis proposes several improved variants of K-means. Clustering experiments with the improved algorithms on several data sets show that the proposed methods achieve the expected results and improve the clustering quality of the algorithm.
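The abstract names a genetic algorithm for choosing K but does not give the chromosome encoding or the fitness function. The following minimal sketch rests on illustrative assumptions. K is encoded as a 4-bit chromosome and decoded into [2, k_max]. Fitness is the mean silhouette coefficient of the K-means partition obtained with that K. The GA uses elitism, binary tournament selection, one-point crossover, and bit-flip mutation. K-means itself uses a deterministic farthest-first initialisation, which is also an assumption. The function names (`ga_select_k`, `decode`, `silhouette`, `kmeans`) and all parameter values are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(data, k):
    """Deterministic farthest-first initialisation (assumed, not from the thesis)."""
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(data, k, iters=50):
    """Plain K-means; returns one cluster label per point."""
    centers = init_centers(data, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in data]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            # Keep the old centre if a cluster becomes empty.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(data, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    n = len(data)
    dist = [[sq_dist(a, b) ** 0.5 for b in data] for a in data]
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i in range(n):
        own = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def decode(bits, k_max):
    """Map a bit-string chromosome to a K in [2, k_max] (hypothetical encoding)."""
    value = int("".join(map(str, bits)), 2)
    return 2 + value % (k_max - 1)


def ga_select_k(data, k_max=6, pop_size=8, generations=10, n_bits=4, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    cache = {}  # K -> fitness, so each K is clustered only once

    def fitness(bits):
        k = decode(bits, k_max)
        if k not in cache:
            cache[k] = silhouette(data, kmeans(data, k))
        return cache[k]

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: keep the two best chromosomes
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return decode(best, k_max), cache


if __name__ == "__main__":
    data = [[0, 0], [0.2, 0.1], [0.1, 0.3], [5, 5], [5.2, 5.1],
            [4.9, 5.3], [0, 5], [0.3, 5.2], [0.1, 4.8]]
    k, cache = ga_select_k(data)
    print("chosen K:", k, "fitness per K:", cache)
```

Caching fitness per K means each distinct K is clustered only once, so the cost is dominated by the few K values the GA visits; `k_max` must not exceed the number of points. The silhouette values stored in `cache` also give a direct way to compare the results of different K values, as the abstract suggests.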

Keywords/Search Tags:

K-means clustering, Feature weight, Genetic Algorithm, Text mining

Related items

1	K-NN, K-means And The Application In Text Mining
2	K-means Text Clustering Algorithm Based On Double Genetic Algorithm In Text Mining
3	Some Issues Of Text Mining For Network Information
4	Research And Implementation Of Text Clustering Based On Fuzzy C-Means Clustering Algorithm
5	Text Clustering Based On K-means Algorithm And Realization
6	Optimized K-Means Clustering Analysis Based On Genetic Algorithm
7	Cluster Analysis Application And Research Of Text Mining
8	Study On Text Fuzzy Clustering Method Based On The Improved Feature Selection With TFIDF-GA
9	Study And Implementation Of Text Soft Clustering Based On Genetic Algorithms
10	Research Of Clustering Algorithm Based On Web Text Mining