Research Of Text Clustering Based On Genetic Algorithm

Posted on:2010-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:L Yang

GTID:2178330338976288

Subject:Computer applications

Abstract/Summary:

Text clustering, one of the most important research branches of clustering, is the application of clustering algorithms to text processing. Given the massive volume and high dimensionality of text data, building effective algorithms for text clustering is one of the research directions of data mining. Text data are distinctive: their unstructured form makes them high-dimensional and sparse, and synonymy and polysemy are phenomena specific to natural-language text. These problems give text clustering a high time complexity and interfere with the accuracy of the clustering algorithm, sharply degrading clustering performance.

First, this paper combines latent semantic indexing (LSI) with a genetic algorithm (GA) to address these problems. In LSI, singular value decomposition transforms the original feature space into a corresponding, smaller latent semantic space, which reduces the effect of varied word usage and arbitrary expression. GA-optimized feature selection then reduces the dimension further without requiring prior knowledge of the feature vectors, thereby lowering the clustering complexity.

Second, for the clustering algorithm itself, this paper presents a variable-length chromosome genetic algorithm based on the K-center clustering algorithm. Because the K-means algorithm is sensitive to outliers, this paper adopts the basic K-center clustering algorithm instead. The K-center algorithm, however, also requires the value of K to be fixed in advance, and the clustering result depends heavily on that value. With variable-length chromosome encoding, the genetic clustering algorithm is no longer constrained by the quality of the initial population.

Last, the simulation results show that GA-optimized dimension reduction is advantageous, and the comparative experimental analysis shows the effectiveness of the improved algorithm proposed in this paper, leading to the conclusion that it outperforms the other algorithms it was compared against.
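
As an illustration of the K-center step described above, the following is a minimal Python sketch, not the thesis's implementation. It assumes that a chromosome is a list of medoid indices whose length is K and that distance is Euclidean; the function names (`assign`, `total_cost`, `refine`) and the toy data are hypothetical. The abstract does not state the GA fitness function or genetic operators, so none are defined here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, medoids):
    """For each point, return the position in `medoids` of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: euclidean(p, points[medoids[j]]))
        for p in points
    ]


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def refine(points, medoids, max_iter=20):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for j, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # possible only when two medoids coincide
                updated.append(old)
                continue
            # New medoid: the member with the smallest summed distance
            # to all members of its cluster.
            updated.append(
                min(
                    members,
                    key=lambda c: sum(euclidean(points[c], points[i]) for i in members),
                )
            )
        if updated == medoids:
            break
        medoids = updated
    return medoids


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
chromosome = [1, 4]  # K = 2, encoded as indices into `points`
medoids = refine(points, chromosome)
print(medoids, total_cost(points, medoids))
```

Because the chromosome is a plain list, candidates with different K (for example `[0, 3]` and `[0, 3, 5]`) can coexist in one population. The summed distance never increases when a medoid is added, so a GA fitness over variable K would need some additional criterion, which the abstract does not specify.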

Keywords/Search Tags:

text clustering, feature selection, latent semantic indexing, GA, K-center algorithm, improved K-center algorithm

Related items

1	Research On Document Clustering Technology Based On Latent Semantic Indexing
2	Knn Text Classification Algorithm Based On The Semantics Of The Center
3	Research On Text Clustering Algorithm Based On Latent Semantic Indexing
4	Research On Text Classification Based On Ontology And Latent Semantic Indexing Algorithm
5	Web Text Mining Based On Latent Semantic Indexing
6	Precise Clustering Algorithm For Chinese Text Based On K-means
7	Text clustering using latent semantic indexing
8	The Research Of Optimization Technology In Latent Semantic Indexing Based On Pseudo Text
9	Research On Problems Related To The Initial Center Selection In K-means Clustering Algorithm
10	A Latent Semantic Indexing Differences Model And Its Application