
Research Of Word Semantic Similarity Based On Domain Knowledge

Posted on: 2015-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Feng
GTID: 2298330452959562
Subject: Computer Science and Technology
Abstract/Summary:
Word semantic similarity is becoming a key problem for many applications, such as artificial intelligence, information retrieval, text categorization, machine translation, semantic disambiguation, automatic question answering systems, and syntactic analysis. It has high value for both theoretical research and practical application. Word semantic similarity calculation is the foundation of, and plays a vital role in, sentence similarity calculation, chapter-level similarity calculation, and other similarity calculations.

This thesis therefore focuses on the idea that word semantic similarity should be tied to domain knowledge, which traditional methods did not take into account. To bring domain knowledge into semantic similarity measurement, the thesis proposes a sensitive word set approach, which is used to select the appropriate concept of a word in a specific domain. Experimental results show that the similarity of the same word pair can differ from one domain to another. Further experiments demonstrate that the proposed measurement approach significantly outperforms traditional similarity measurements.

A new concept similarity calculation method is also proposed. In this method, the first basic sememe and the other basic sememes are treated in the same way, because words are described in different ways and distinguishing the two may lead to calculation errors. In addition, when the relational sememes and the relational symbol sememes are empty, a more realistic approach is to set their similarity to the similarity of the basic sememes. This reduces the error in concept similarity calculation.

Finally, the thesis proposes a new approach to sememe similarity calculation based on the Chinese knowledge base HowNet. This method distinguishes three different positional relationships between two sememes and gives a separate calculation formula for each. Experimental results show that the method outperforms the other methods compared.
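
The fallback rule for empty relational parts can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the weights, the best-match averaging in `set_similarity`, the dictionary layout of a concept, and the exact-match `toy_sememe_sim` are all assumptions made for illustration. In particular, `toy_sememe_sim` is only a stand-in and does not reproduce the three-case sememe formula mentioned in the abstract.

```python
from typing import Callable, Dict, Sequence

# Hypothetical types: a concept is a dict with three sememe lists.
Concept = Dict[str, Sequence[str]]
SememeSim = Callable[[str, str], float]


def toy_sememe_sim(a: str, b: str) -> float:
    """Stand-in sememe similarity (assumption): 1.0 on exact match, else 0.0."""
    return 1.0 if a == b else 0.0


def set_similarity(a: Sequence[str], b: Sequence[str], sim: SememeSim) -> float:
    """Symmetric average of best-match similarities between two sememe lists."""
    if not a or not b:
        return 0.0
    a_to_b = sum(max(sim(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(sim(x, y) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2


def concept_similarity(
    c1: Concept,
    c2: Concept,
    sim: SememeSim = toy_sememe_sim,
    weights: Sequence[float] = (0.6, 0.2, 0.2),  # assumed weights, sum to 1
) -> float:
    """Weighted concept similarity with a fallback for empty relational parts."""
    # All basic sememes are pooled: the first one gets no special treatment.
    basic = set_similarity(c1["basic"], c2["basic"], sim)
    parts = [basic]
    for key in ("relational", "symbol"):
        if not c1[key] and not c2[key]:
            # Both sides empty: reuse the basic-sememe similarity.
            parts.append(basic)
        else:
            parts.append(set_similarity(c1[key], c2[key], sim))
    return sum(w * p for w, p in zip(weights, parts))


doctor = {"basic": ["human", "occupation"], "relational": [], "symbol": []}
person = {"basic": ["human"], "relational": [], "symbol": []}
score = concept_similarity(doctor, person)
```

Because the assumed weights sum to 1, two concepts with no relational parts receive exactly their basic-sememe similarity, so the empty parts do not lower the score.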
Keywords/Search Tags: Information Processing, HowNet, Word Semantic Similarity, Sensitive Words Set, Domain Knowledge