Research Of Text Clustering Based On Semanteme And Domain Correlation

Posted on: 2010-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Kong
GTID: 2178360278961026
Subject: Computer application technology
Abstract/Summary:
The information industry has developed rapidly in recent years, but there is little research on clustering petroleum-themed texts. Most existing clustering techniques target general-purpose texts, and research on petroleum-themed texts remains incomplete. Researching and developing text clustering for the petroleum domain is therefore of real significance to domain specialists.

Traditional text clustering uses a keyword-based vector space model. It considers only the surface matching of words and characters, ignores semantic information, and does not capture the meaning contained in the texts, all of which degrade clustering quality.

To address these shortcomings of vector-space-model clustering and the lack of domain information, this thesis proposes a text clustering method based on semanteme and domain correlation. First, exploiting the semantic processing capability of a theme concept hierarchy, it proposes a new theme-based feature extraction method. Second, on the basis of a theme concept tree, it proposes a term-weighting method that partly alleviates the problems caused by high-frequency and low-frequency words, and it introduces a HASH technique to expand semantemes. Finally, based on an analysis of knowledge networks such as HowNet, it proposes a preprocessing algorithm based on semanteme similarity that handles synonyms; this reduces the feature dimension and makes semanteme-based clustering mining feasible in specialized domains.

Experimental results show that the clustering mining model based on semanteme and domain correlation overcomes the lack of semantic information. Compared with traditional clustering mining, the semanteme-based clustering system achieves higher accuracy and better clustering quality.
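The Python sketch below illustrates the general idea behind the synonym preprocessing step: words are mapped to shared concepts before vector-space features are built, so documents that use different words for the same notion become similar and the feature dimension shrinks. It is not the thesis's algorithm. The `CONCEPTS` table is a hypothetical toy stand-in for HowNet and the petroleum theme concept tree, the weighting is ordinary smoothed TF-IDF rather than the concept-tree weighting described above, the HASH-based semanteme expansion is omitted, and single-pass threshold clustering with `threshold=0.5` is an assumed choice of clustering algorithm.

```python
import math
from collections import Counter

# HYPOTHETICAL toy synonym table standing in for HowNet / a theme concept tree:
# it maps surface words to a shared concept ID.
CONCEPTS = {
    "crude": "petroleum", "oil": "petroleum", "petroleum": "petroleum",
    "drilling": "drill", "drill": "drill",
    "well": "well", "borehole": "well",
    "refinery": "refining", "refining": "refining",
}


def to_concepts(tokens):
    """Replace each word by its concept ID; words not in the table are kept."""
    return [CONCEPTS.get(t, t) for t in tokens]


def tfidf(docs):
    """Sparse TF-IDF vectors (dict: term -> weight) with smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(vectors, threshold=0.5):
    """Single-pass clustering: join the most similar cluster if its centroid
    similarity is >= threshold, otherwise start a new cluster."""
    clusters, centroids = [], []
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].append(i)
            # Summed centroid: cosine is scale-invariant, so sum acts like mean.
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters


docs = [
    "crude oil drilling well".split(),
    "petroleum borehole drill".split(),
    "refinery output rose".split(),
    "refining output fell".split(),
]
concept_docs = [to_concepts(d) for d in docs]

raw = single_pass(tfidf(docs))
semantic = single_pass(tfidf(concept_docs))
raw_dim = len({t for d in docs for t in d})
concept_dim = len({t for d in concept_docs for t in d})

print(raw, raw_dim)            # [[0], [1], [2], [3]] 12
print(semantic, concept_dim)   # [[0, 1], [2, 3]] 7
```

On this toy input the keyword-only run leaves all four documents in separate clusters, because "crude oil drilling well" and "petroleum borehole drill" share no surface word, while the concept-mapped run groups the two drilling texts and the two refining texts. The feature vocabulary also drops from 12 words to 7 concepts, which mirrors the dimension reduction the abstract attributes to synonym preprocessing.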
Keywords/Search Tags: Text Clustering, Theme of Concept Hierarchy, HowNet, Semanteme Similarity