Research On Text Categorization Based On Genetic Algorithm And Fuzzy Clustering

Posted on: 2010-09-04 | Degree: Master | Type: Thesis
Country: China | Candidate: S Y Yu | Full Text: PDF
GTID: 2178360278966865 | Subject: Computer application technology
Abstract/Summary:
With the explosive growth of data, information processing has become an indispensable tool for acquiring useful information, and text categorization is an important research direction within it. Fuzzy clustering analysis, an unsupervised learning method, is a research hotspot in text categorization, so research on text categorization based on fuzzy clustering has great theoretical and practical significance. However, fuzzy clustering algorithms are sensitive to their initial values. This thesis therefore proposes a fuzzy clustering algorithm based on a genetic algorithm.
The thesis tests and compares fuzzy C-means clustering (FCM) with weighted FCM (WFCM), an improvement on FCM. The results show that WFCM improves the accuracy of fuzzy clustering. Genetic algorithms are efficient stochastic search algorithms for global optimization. The thesis combines a genetic algorithm with WFCM and puts forward a weighted FCM clustering algorithm based on genetic algorithms (GWFCM), which makes full use of both the local search strength of FCM and the global search ability of the genetic algorithm. Building on a study of automatically learning the number of clusters, the thesis improves the cluster validity criterion and changes the number of clusters dynamically within the algorithm, which improves the validity and precision of the clustering.
To address the characteristics of the encoding, the thesis introduces the concept of a degree of genetic variation. During crossover and mutation, this value is calculated dynamically and used to limit the production of individuals with poor fitness, which optimizes the execution performance of the genetic algorithm. The resulting clustering method performs considerably better than classical clustering algorithms.
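As a rough illustration of the FCM baseline the abstract refers to, the following is a minimal sketch of the standard fuzzy C-means iteration in plain Python. It is not the thesis implementation. The function name `fcm`, its parameters, and the optional per-feature `weights` argument (one possible reading of the "weighted" variant) are assumptions made for illustration. The initial centers are drawn at random, which is the source of the initial-value sensitivity that a genetic search over centers is meant to reduce.

```python
import random


def fcm(points, c, m=2.0, iters=100, seed=0, weights=None):
    """Standard fuzzy C-means on a list of equal-length numeric vectors.

    weights: optional per-feature weights (an assumed form of weighting,
    not taken from the thesis). Returns (centers, memberships).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    w = weights or [1.0] * dim
    # Random initial centers: the result can depend on this choice.
    centers = [list(p) for p in rng.sample(points, c)]

    def dist2(x, v):
        # Weighted squared Euclidean distance.
        return sum(wk * (xk - vk) ** 2 for wk, xk, vk in zip(w, x, v))

    def memberships():
        # u_ij is proportional to (d_ij^2)^(-1/(m-1)); each row sums to 1.
        rows = []
        for x in points:
            inv = [max(dist2(x, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
            total = sum(inv)
            rows.append([val / total for val in inv])
        return rows

    for _ in range(iters):
        u = memberships()
        for j in range(c):
            # Each center is the mean of all points, weighted by u_ij^m.
            um = [row[j] ** m for row in u]
            total = sum(um)
            centers[j] = [
                sum(um[i] * points[i][k] for i in range(len(points))) / total
                for k in range(dim)
            ]
    return centers, memberships()
```

For example, `fcm([[0, 0], [0, 1], [10, 10], [10, 11]], c=2)` returns two centers and, for each point, a membership row whose entries sum to 1.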
Through non-linear mapping, the method can better distinguish, extract, and amplify useful features.
Using a proportional selection operator in a genetic algorithm causes two problems: premature convergence early in evolution and declining search efficiency late in evolution. For these reasons, the thesis proposes a non-linear selection mechanism. In the population improvement process, it also proposes an elite gene introduction policy, which ensures the stability of genetic evolution and improves the efficiency of the search for cluster centers.
To confirm the efficiency and feasibility of the algorithm, GWFCM is compared with FCM and WFCM in experiments on a large number of texts. The experimental results show that precision, recall, and F1 improve by 0.030, 0.022, and 0.026, respectively. GWFCM performs better than the other methods in text categorization and clustering.
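The abstract does not specify the non-linear selection mechanism or the elite policy. The sketch below shows one common way such ideas are realized: selection weights that grow as a power of fitness rank (instead of being proportional to raw fitness), plus copying the best individuals unchanged into the next generation. The function name `select_next_generation` and the parameters `n_elite` and `pressure` are hypothetical and chosen for illustration only.

```python
import random


def select_next_generation(population, fitness, n_elite=1, pressure=2.0, seed=None):
    """Rank-based non-linear selection with elitism (illustrative assumption).

    population: list of individuals; fitness: list of scores, higher is better.
    """
    rng = random.Random(seed)
    # Indices ordered from worst to best fitness.
    order = sorted(range(len(population)), key=lambda i: fitness[i])

    # Weight depends on rank, not on raw fitness, so one very fit
    # individual cannot dominate early (premature convergence), and
    # small fitness differences still matter late in the run.
    weights = [0.0] * len(population)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = rank ** pressure

    # Elitism: the best n_elite individuals survive unchanged.
    elites = [population[i] for i in reversed(order[-n_elite:])] if n_elite > 0 else []

    # Fill the remaining slots by weighted sampling with replacement.
    rest = rng.choices(population, weights=weights, k=len(population) - len(elites))
    return elites + rest
```

With `pressure=1.0` this reduces to linear ranking; larger values favor the top-ranked individuals more strongly.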
Keywords/Search Tags: text categorization, clustering analysis, genetic algorithm, fuzzy C-means clustering