
Algorithm Study on Feature-Based Word Similarity in Ontology

Posted on: 2018-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: X H Guo
Full Text: PDF
GTID: 2348330518456592
Subject: Computer Science and Technology
Abstract/Summary:
Word similarity computation is an important basic research topic in natural language processing and is widely used in knowledge management, information retrieval, biomedicine, cognitive science, psychology, and other fields. With the advent of the information age, the need to solve the word similarity computation problem has become more urgent. Because ontologies offer rich semantic relations and a structure that is convenient to compute over, more and more researchers have begun to study word similarity based on knowledge ontologies. Building on previous feature-based similarity models, this paper proposes a mapping model between concept features and taxonomy parameters for knowledge ontologies, together with a basic feature-based formula for word similarity computation. Based on the mapping model and the basic formula, several feature-based similarity algorithm models are proposed for HowNet and WordNet respectively, to solve problems left unresolved by previous algorithms and to improve the accuracy of computed word similarity.

For HowNet, this paper presents a feature-based algorithm that quickly computes the similarity between words. To let HowNet compute word similarity directly from IS-A relations, eliminating the step of computing similarity between sememes before computing similarity between senses and thus simplifying the process, a sense tree is constructed from the previous sememe tree by exploiting the relations among the sememes in the DEF of HowNet. First, the first independent sememe with a constraint relationship in the DEF is defined as an abstract concept, which turns the DEF into a multi-level group of abstract concepts. Then, senses are attached to the existing sememe tree according to the abstract concepts in their definitions, so that a sense tree containing sememes, abstract concepts, and
senses is constructed. Next, the features between concepts are mapped to the depth and the path between concepts according to the mapping model, and the proposed basic feature-based formula for word similarity is taken as the base formula of the algorithm model in this chapter. On this basis, the formula is improved: the features between two concepts are compensated by the sememes in their DEFs, and the contribution of each sememe in the DEF to the features is adjusted by parameters. In addition, this paper considers that depth and path do not contribute equally to the similarity between words. By mapping the features between concepts to depth and path according to the mapping model, and transforming the proposed basic formula into a formula with parameters, this paper also proposes a similarity computation model based on weighted features. Experimental results show that Pearson correlation coefficients of 0.85 and 0.86 are obtained between the computed similarity values and the human judgment values on the MC30 word pairs, which reaches the level of current excellent word similarity algorithms. This paper also tests word pairs that appeared in previous related papers, and the results show that the two algorithm models outperform previous ones.

For WordNet, this paper proposes a feature-based model that fuses multi-source information to compute word similarity. The lowest common subsumer of two concepts is treated as their common features and the shortest path between them as their different features, by mapping features to taxonomy parameters including the path, the depth, and the information content between concepts. Based on the proposed basic formula, which is improved in this chapter, density compensation is used to increase the feature difference between concepts and to reduce the excessive non-linearity of similarity values in previous algorithms. The introduction of the encoding difference
overcomes the shortcoming of non-unique similarity values caused by a single information source and fine-tunes the feature differences between concepts. Finally, the algorithm model accounts for the degree to which path, depth, and independent encoding contribute to word similarity by introducing edge weights and adjustment parameters. Experimental results show that on RG65, MC30, the 666 noun pairs and 222 verb pairs in SimLex999, and YP130, the Pearson correlation coefficients between computed similarity values and human judgment values are 0.88, 0.89, 0.61, 0.52, and 0.80 when the same parameters are used for all datasets, and at best 0.88, 0.89, 0.61, 0.55, and 0.81 when independent parameters are used. These results all reach the level of current excellent word similarity algorithms. In addition, the algorithm is applied to the SNOMED CT medical ontology, which is very similar to WordNet in structure. Experiments show that the model applied to SNOMED CT obtains a Pearson correlation coefficient of at best 0.86 between computed similarity values and human judgment values on the international medical dataset Pedersen30, which also reaches the level of current excellent word similarity algorithms.
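
The abstract does not state the thesis's formulas, so the following is only a minimal sketch of the general idea it describes: common features are mapped to the depth of the lowest common subsumer, different features are mapped to the shortest path length, and the results are scored against human ratings with the Pearson correlation coefficient. The Tversky-style ratio, the weight `alpha`, the toy taxonomy `PARENT`, and all function names are assumptions made for illustration and are not taken from the thesis.

```python
from math import sqrt

# Hypothetical toy IS-A taxonomy (child -> parent); not HowNet or WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def depth(concept):
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(ancestors(concept))

def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(ancestors(b))
    for node in ancestors(a):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_b:
            return node
    raise ValueError("concepts do not share a root")

def path_length(a, b):
    """Number of IS-A edges on the shortest path between a and b."""
    d = depth(lowest_common_subsumer(a, b))
    return (depth(a) - d) + (depth(b) - d)

def feature_similarity(a, b, alpha=0.5):
    """Assumed Tversky-style ratio: common / (common + alpha * different)."""
    common = depth(lowest_common_subsumer(a, b))  # shared features
    different = path_length(a, b)                 # distinguishing features
    return common / (common + alpha * different)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

On this toy tree, `feature_similarity("dog", "cat")` evaluates to 2/3 (lowest common subsumer `animal` at depth 2, path length 2), while `feature_similarity("dog", "car")` evaluates to 1/3. An evaluation in the style the abstract reports would pass the computed scores and the human ratings for a benchmark such as MC30 to `pearson`.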
Keywords/Search Tags:HowNet, WordNet, SNOMED CT, feature, word similarity