
Study On Text Fuzzy Clustering Method Based On The Improved Feature Selection With TFIDF-GA

Posted on: 2015-05-29 | Degree: Master | Type: Thesis
Country: China | Candidate: G C Deng
GTID: 2298330422978055 | Subject: Computer software and theory
Abstract/Summary:
Text clustering is a text categorization technique that requires no category labels. It aims to maximize text similarity within a class and minimize it between classes. With the explosive growth of information and the cross-penetration of different subjects, texts are increasingly massive and varied, so the class membership boundary of a text is becoming less explicit. Text fuzzy clustering has therefore become an important research area. This thesis mainly studies feature selection methods and the fuzzy C-means (FCM) algorithm.

First, unsupervised feature selection for text. Feature selection methods fall into three categories: filter, wrapper, and embedded methods. This thesis proposes a new unsupervised feature selection method, TFIDF-GA, which combines a filter method (TFIDF) with a wrapper method (a genetic algorithm, GA). TFIDF-GA computes feature weights with a slightly improved TFIDF algorithm and then selects an initial feature subset according to specific selection rules. The genetic algorithm generates its initial population from this subset and searches iteratively. Seeding the population with the initial feature subset gives the genetic algorithm a good starting point and speeds up the search. Meanwhile, the adaptive global search capability of the genetic algorithm lets it heuristically find features that have strong class-distinguishing ability but are not in the initial feature subset.

Second, the fuzzy C-means clustering algorithm. Among objective-function-based fuzzy clustering algorithms, FCM is the most widely used. This thesis presents an improved FCM algorithm. It initializes the cluster centers with a defined density function, which reduces, to some extent, the error caused by selecting centers at random. It also introduces information entropy as an additional constraint on FCM, which describes the real data distribution more properly.
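For orientation, the baseline that the improved algorithm starts from can be sketched as below. This is a minimal sketch of standard FCM only: it uses random initial memberships and the usual update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), not the density-based initialization or the entropy constraint described above, whose formulas the abstract does not give. The function name `fcm` and the parameters `m`, `iters`, `tol`, and `seed` are illustrative assumptions.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; NOT the thesis's improved variant."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from the distances to every center.
        shift = 0.0
        for i in range(n):
            dists = []
            for j in range(c):
                sq = sum((points[i][d] - centers[j][d]) ** 2
                         for d in range(dim))
                # Floor the distance to avoid division by zero.
                dists.append(max(sq ** 0.5, 1e-12))
            for j in range(c):
                new = 1.0 / sum(
                    (dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                    for k in range(c)
                )
                shift = max(shift, abs(new - u[i][j]))
                u[i][j] = new

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u
```

Each row of the returned membership matrix sums to 1, so a document can belong partly to several clusters, which is the property that makes FCM suitable when class boundaries are not explicit.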
Starting from the feature selection method and the FCM algorithm, this thesis proposes a fuzzy clustering algorithm based on TFIDF-GA and the improved FCM algorithm. Experimental results show that this algorithm produces high-quality clustering results.
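The first stage of this pipeline, scoring terms and seeding the genetic algorithm, might look roughly like the sketch below. It rests on assumptions: it uses the standard TF-IDF formula (term frequency times log(N / document frequency)) summed over documents, and a plain top-k rule, because the abstract does not specify the "slightly improved" TFIDF or the selection rules. The names `tfidf_scores` and `seed_chromosome` are hypothetical. The returned bit mask is one chromosome, with 1 meaning the feature is kept; a genetic algorithm could build its initial population from mutated copies of it.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Sum of standard TF-IDF weights per term over tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            scores[term] += tf * math.log(n / df[term])
    return scores


def seed_chromosome(docs, k):
    """Return (vocabulary, bit mask) marking the k highest-scoring terms."""
    scores = tfidf_scores(docs)
    vocab = sorted(scores)
    # Break score ties alphabetically so the result is deterministic.
    top = set(sorted(vocab, key=lambda t: (-scores[t], t))[:k])
    return vocab, [1 if term in top else 0 for term in vocab]


docs = [
    ["fuzzy", "clustering", "text"],
    ["genetic", "algorithm", "text"],
    ["fuzzy", "text", "text"],
]
vocab, mask = seed_chromosome(docs, k=2)
```

In this toy corpus, "text" occurs in every document, so its inverse document frequency is zero and it is never selected for the seed subset.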
Keywords/Search Tags: text fuzzy clustering, feature selection, genetic algorithm, fuzzy C-means algorithm