
Research On The Scale Free Graph K-medoids Cluster Algorithm

Posted on: 2010-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Shen
Full Text: PDF
GTID: 2178360275956565
Subject: Applied Mathematics
Abstract/Summary:
Text clustering plays an important role in text mining and knowledge discovery systems. It can be used to manage and organize texts, improve query results, provide intuitive navigation and browsing mechanisms, and find similar texts, so it has become an important research direction. At present, most text clustering algorithms represent a text or document with the Vector Space Model. This representation is very simple, but it raises one severe problem: the high dimensionality of the feature space and the inherent sparsity of the data. In addition, it cannot handle polysemy and synonymy in text data. Traditional text clustering algorithms are also "helpless" when clusters have arbitrary shapes: when the sample space is not convex, they converge to a "local" optimum, and the presence of "outlier" data reduces their effectiveness. All these problems greatly interfere with classification and clustering and cause performance to drop dramatically. The main techniques used to address them are dimensionality reduction and adding semantic information, such as word frequency and part of speech, to the text feature representation, but these methods have their own defects and improve clustering quality only slightly. To address the problems above: (1) The scale free graph k-medoids cluster algorithm addresses the high dimensionality of the feature space, the inherent data sparsity, and scalability, and also avoids convergence to a "local" optimum when the sample space is not convex. (2) This thesis proposes a new text clustering method based on semantic similarity using HowNet: it represents a text or document with a concept set model, expands concepts into their sememes, and calculates the set similarity between sememe sets. The method solves the semantic problems of polysemy and synonymy to some extent. (3) Finally, the results show that the scale free graph k-medoids cluster algorithm performs better than traditional text clustering algorithms.
Keywords/Search Tags: text clustering, semantic similarity, concept set, clustering algorithm, hownet, set similarity
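
The abstract combines two ideas: documents represented as sets of concepts compared by a set similarity, and k-medoids clustering over those similarities. The sketch below illustrates only that combination and is not the thesis's algorithm. It assumes Jaccard similarity as the set similarity (the abstract does not name the measure), uses hand-written concept sets in place of HowNet sememe expansion, and runs plain k-medoids without the scale free graph construction. The names `jaccard_distance`, `k_medoids`, and `docs` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |a & b| / |a | b|. Assumed stand-in for the thesis's set similarity."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def k_medoids(
    docs: Sequence[FrozenSet[str]], k: int, max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    """Plain k-medoids on concept sets; returns (medoid indices, cluster labels)."""
    n = len(docs)
    # Pairwise distance matrix between all documents.
    dist = [[jaccard_distance(docs[i], docs[j]) for j in range(n)] for i in range(n)]

    def assign(medoids: List[int]) -> List[int]:
        # Each document joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Deterministic initialisation with the first k documents (an assumption
    # made for reproducibility, not a recommendation from the thesis).
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # The new medoid is the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical concept sets; in the thesis these would come from HowNet.
docs = [
    frozenset({"computer", "software", "program"}),
    frozenset({"computer", "software", "code"}),
    frozenset({"fruit", "apple", "food"}),
    frozenset({"fruit", "banana", "food"}),
]
medoids, labels = k_medoids(docs, k=2)
```

Because a medoid is always an actual document, the method needs only pairwise distances and never a mean vector, which is why it can work directly on sets rather than on high-dimensional Vector Space Model vectors.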