Research on Text Classification and Clustering Based on a Hybrid Parallel Genetic Algorithm

Posted on: 2008-04-29 | Degree: Master | Type: Thesis
Country: China | Candidate: W H Dai
GTID: 2178360215456503 | Subject: Computer software and theory
Abstract/Summary:

Text classification and clustering are important research topics in natural language processing, and they underpin information retrieval and information query. As the volume of text information of every kind grows rapidly, classification and clustering let us organize it, so that information can be located and distributed accurately and query and retrieval become more efficient.

Research on text classification and clustering has been under way for more than 40 years. As understanding of and attention to these problems have grown, the number of researchers working on them has gradually increased, and new results continue to emerge. However, both are complex problems that draw on knowledge from multiple disciplines, and many questions remain to be studied thoroughly. Feature selection and extraction, text feature representation, and the choice and implementation of clustering and classification methods all strongly influence the results of text classification and clustering.

The main work and innovations of this thesis are as follows:

1. Considering the various problems of text classification and clustering, we propose a hybrid parallel genetic algorithm. It combines the parallelism and global optimization ability of the parallel genetic algorithm with the efficiency and local optimization ability of the K-means algorithm. Through K-means clustering, inheritance and mutation within each population, parallel evolution, and interbreeding among populations, it provides higher efficiency and precision for text classification and clustering.

2. We apply the hybrid parallel genetic algorithm to text clustering problems. Using the parallel genetic algorithm to extract feature words dynamically effectively reduces the feature dimension of text objects. Using the hybrid parallel genetic algorithm to implement text clustering determines the number of clusters dynamically and achieves high clustering accuracy.

3. We apply the hybrid parallel genetic algorithm to text classification problems. Using it for latent semantic mining can eliminate the effect of synonyms and near-synonyms on text classification accuracy. We use the hybrid parallel genetic algorithm to improve the KNN text classification algorithm, and the parallel genetic algorithm to optimize the parameters of the SMO-SVM algorithm. Finally, we classify the text set with a text classification algorithm based on the KNN and SMO-SVM classification algorithms, which reduces the number of candidate categories and effectively improves classification performance.

4. To verify the efficiency and feasibility of our algorithm, we extracted a large number of texts from the Modern Chinese Corpus of the State Language Commission. Extensive comparative experiments showed that the algorithm performs well in text classification and clustering.
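The combination described in point 1 (a genetic algorithm for global search, K-means for local refinement, and exchange among populations) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the encoding of an individual as a list of k centroids, the fixed k, the parameter values, and the ring-shaped migration are all assumptions made for illustration. The islands are also evolved one after another here, although they are independent between migrations and could be dispatched to parallel workers.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def sse(points, centroids):
    """Fitness (lower is better): sum of squared distances to nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans_step(points, centroids):
    """One K-means iteration, used as the local-optimization step."""
    labels = assign(points, centroids)
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            updated.append(old)  # an empty cluster keeps its old centroid
    return updated

def evolve_island(points, island, rng, mutation_rate=0.2, sigma=0.1):
    """One generation: selection, crossover, mutation, then a K-means step."""
    ranked = sorted(island, key=lambda ind: sse(points, ind))
    parents = ranked[:max(2, len(ranked) // 2)]
    children = [ranked[0]]  # elitism: the best individual survives unchanged
    while len(children) < len(island):
        a, b = rng.sample(parents, 2)
        # Uniform crossover: each centroid slot comes from one of the parents.
        child = [rng.choice(pair) for pair in zip(a, b)]
        # Gaussian mutation applied to some centroids.
        mutated = []
        for c in child:
            if rng.random() < mutation_rate:
                c = tuple(x + rng.gauss(0, sigma) for x in c)
            mutated.append(c)
        children.append(kmeans_step(points, mutated))
    return children

def migrate(points, islands):
    """Ring migration: each island's worst member is replaced by a neighbour's best."""
    bests = [min(isl, key=lambda ind: sse(points, ind)) for isl in islands]
    for i, isl in enumerate(islands):
        worst = max(range(len(isl)), key=lambda n: sse(points, isl[n]))
        isl[worst] = bests[i - 1]

def hybrid_ga_kmeans(points, k, n_islands=3, island_size=6,
                     generations=15, migrate_every=5, seed=0):
    """Return (best centroids, labels). `points` is a list of numeric tuples."""
    rng = random.Random(seed)
    islands = [[rng.sample(points, k) for _ in range(island_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve_island(points, isl, rng) for isl in islands]
        if gen % migrate_every == 0:
            migrate(points, islands)
    best = min((ind for isl in islands for ind in isl),
               key=lambda ind: sse(points, ind))
    return best, assign(points, best)
```

Calling `hybrid_ga_kmeans(vectors, k=2)` on a list of document vectors (for example, term-weight tuples) returns the best centroid set found and a cluster label for each document. The sketch keeps k fixed for brevity, so it does not cover the dynamic choice of the number of clusters described in point 2.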
Keywords/Search Tags: Genetic algorithm, Text classification, Text clustering, K-Means clustering, KNN classification