Study Of Text Clustering Algorithm Based On Semantics

Posted on: 2013-10-29 | Degree: Master | Type: Thesis
Country: China | Candidate: Z X Guo | Full Text: PDF
GTID: 2248330395955348 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, people now live in an era of "information explosion". Vast amounts of semi-structured and unstructured information are available, and how to mine useful information from it quickly and efficiently is a problem many scholars are working on. Text document clustering is a method of automatic classification that does not require training. However, most existing clustering algorithms fall short in both speed and accuracy.

To address this problem, we first propose a graph-based text representation model, WSCG (Weighted Subject Conceptual Graph). WSCG divides the concepts of a document into centroid concepts and peripheral concepts based on their semantic relations to the subject, and the semantic similarity between two documents is calculated over centroid concepts and peripheral concepts separately. Second, building on existing research on clustering algorithms, we design a text clustering algorithm based on WSCG so that the relation between two documents is calculated more accurately during clustering. Finally, based on this work, a text clustering system, SemCluster, is implemented in C++.

Experiments show that the WSCG-based text representation achieves higher accuracy than existing methods in both document similarity calculation and clustering. Testing of the text clustering system shows that it meets the design requirements.
Keywords/Search Tags: Text Clustering, Semantic Similarity, WSCG, Fuzzy Clustering
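
The abstract says that document similarity is computed over centroid concepts and peripheral concepts separately, but it does not give the formula. The Python sketch below is only an illustration of that idea under stated assumptions: each document is a pair of concept-to-weight maps, each part is compared with plain cosine similarity over exactly matching concepts (a stand-in for the thesis's semantic similarity measure), and the two scores are combined with a hypothetical mixing weight `alpha`. The function names, the data layout, and the value of `alpha` are all assumptions and are not taken from the thesis or from SemCluster.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse concept -> weight maps."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty concept set matches nothing
    return dot / (norm_a * norm_b)


def document_similarity(doc_a, doc_b, alpha=0.7):
    """Combine centroid and peripheral similarity.

    alpha is a hypothetical mixing weight (assumption): it lets the
    centroid concepts, which are closer to the subject, count for more.
    """
    centroid = cosine(doc_a["centroid"], doc_b["centroid"])
    peripheral = cosine(doc_a["peripheral"], doc_b["peripheral"])
    return alpha * centroid + (1.0 - alpha) * peripheral


# Toy documents with made-up concepts and weights (illustration only).
doc1 = {"centroid": {"clustering": 1.0, "text": 0.8},
        "peripheral": {"internet": 0.3}}
doc2 = {"centroid": {"clustering": 0.9, "text": 0.7},
        "peripheral": {"graph": 0.4}}
doc3 = {"centroid": {"football": 1.0},
        "peripheral": {"internet": 0.3}}

if __name__ == "__main__":
    print(document_similarity(doc1, doc2))
    print(document_similarity(doc1, doc3))
```

In this toy data, `doc1` and `doc2` share their centroid concepts while `doc1` and `doc3` share only a peripheral concept, so the first pair scores higher. A pairwise score of this kind could then feed any similarity-based clustering procedure.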