
Study And Implementation Of Text Soft Clustering Based On Genetic Algorithms

Posted on: 2007-01-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Z J Xu | Full Text: PDF
GTID: 2178360185974971 | Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, and especially with the spread and adoption of the Internet, electronic texts have become the major source of information. Organizing documents effectively is necessary for topic discovery, information retrieval, and the pre-categorization of new documents. Document clustering techniques have emerged to meet these requirements. In the past, however, the study of text clustering was mainly based on hard clustering, i.e. a given text can be assigned to only one class. In practice, as information grows explosively and spreads across research fields, a single text often covers several topics. A method that describes document classification more objectively is therefore in demand, and text soft clustering based on fuzzy clustering techniques is becoming popular in text mining.

Feature selection and the clustering algorithm are the most important factors in the study of text clustering, so the main research of this paper focuses on these two parts.

The first concern is unsupervised text feature selection. Because feature selection cannot work well when class labels are missing, this paper proposes a new approach, DFFS, which integrates Document Frequency and Feature Similarity. Building on a filtering step that removes 90 percent of redundant words, the method removes irrelevant features by computing the correlation between features. It considers feature selection purely from the feature perspective, so the selection is not affected by clustering results. DFFS thus overcomes the lack of prior knowledge in clustering and serves the text soft clustering problem well.

The second part is text soft clustering methods. After analyzing the current state of text soft clustering and studying Fuzzy C-Means (FCM), this paper introduces Genetic Algorithms (GA), a global optimization technique, and presents a Sampling GA-based FCM approach (SGFCM for short). SGFCM is suited to large-scale, high-dimensional problems. By combining GA with FCM, SGFCM gains good search capability both globally and locally, and overcomes FCM's sensitivity to initialization. Meanwhile, the new...
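
The difference between hard and soft clustering described above can be illustrated with a minimal sketch of the standard Fuzzy C-Means updates, in which each document receives a degree of membership in every cluster. This is not the thesis's SGFCM: the GA and sampling steps are not reproduced, and the toy term-weight vectors, the fuzzifier m = 2, and the fixed initial centers are all assumptions made for illustration. The fixed initial centers are exactly the kind of choice that FCM is sensitive to.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j.

    Each row sums to 1, so one document can belong to several clusters.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share the
            # membership equally among those centers only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        )
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of all points weighted by u[i][j]**m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centers

def fcm(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)

# Hypothetical toy data: five documents described by two term weights.
# The last document mixes both topics equally.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
initial_centers = [[1.0, 0.0], [0.0, 1.0]]  # assumed starting point, not GA-derived

centers, memberships = fcm(docs, initial_centers)
for doc, row in zip(docs, memberships):
    print(doc, [round(u, 3) for u in row])
```

In this toy run the mixed document ends up with roughly equal membership in both clusters, whereas a hard clustering would have to assign it to exactly one. A GA-based variant such as the one the abstract describes would search over the starting centers instead of fixing them by hand.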
Keywords/Search Tags: Vector Space Model, Text Clustering, Feature Selection, Fuzzy C-Means, Genetic Algorithms