
Lexical-semantic Similarity Calculation And Its Application In The Revision Of ISO 860

Posted on: 2021-07-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Y N Ding | Full Text: PDF
GTID: 2518306113978519 | Subject: Software engineering
Abstract/Summary:
With the advance of economic globalization, China's comprehensive national strength has grown continuously and its international status has risen accordingly. Chinese is becoming increasingly important and occupies a pivotal position in global political and economic affairs. Because regions differ in language, politics, economy, culture, and historical background, communication among experts in international fields has been difficult. It is therefore important to unify and harmonize concepts and terminology across multiple languages, including Chinese.

ISO 860 is an international standard for harmonizing concepts, concept systems, and terminology. The standard is based on Western languages such as English and does not take into account the characteristics of Eastern languages such as Chinese and Japanese, so it is applicable mainly to European and American countries whose languages are the same or similar. The expressive form of Chinese is completely different from that of Western languages; how to harmonize multilingual terminologies, including Chinese, at the semantic level is a problem that urgently needs to be solved.

Lexical-semantic similarity calculation is a basic and core task in natural language processing and is important in fields such as artificial intelligence, information retrieval, and semantic disambiguation. It can provide strong technical support for realizing the harmonization of concepts, concept systems, and terminologies. At present there are three main approaches to lexical similarity calculation: algorithms based on knowledge-base rules, corpus-based "statistics" and "prediction" algorithms, and combinations of the two. This thesis summarizes and analyzes these methods. A knowledge-base algorithm can accurately represent the conceptual information of a word, and it can supply corpus-based algorithms with the conceptual information that a corpus lacks. However, a knowledge base is compiled by hand and its vocabulary is incomplete, so out-of-vocabulary words are common, which also makes it difficult for a computer to process natural language intelligently. A corpus-based method can largely overcome these problems: the computer processes large volumes of unstructured text and can measure lexical-semantic similarity without manual annotation. However, a corpus-based method computes similarity purely from distributional information and attaches no explicit conceptual information to a word, so it cannot distinguish semantic similarity from semantic relevance, let alone perform semantic disambiguation.

In view of these problems, this thesis proposes a lexical-semantic similarity algorithm that combines a corpus "prediction" model with knowledge-base rules, making up for the shortcomings of either method alone. The algorithm computes semantic similarity from the knowledge base and from the corpus separately and then takes a weighted sum of the two scores; the optimal weight is determined experimentally so as to maximize the correlation coefficient with human similarity ratings. The experimental results are significantly better than those of other algorithms. A minimal sketch of this combination scheme is given below.
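The following is a minimal Python sketch of the weighted combination described above. The functions kb_sim and corpus_sim are hypothetical stand-ins for the thesis's knowledge-base score and corpus "prediction" score (e.g. a word2vec cosine), and the benchmark of human-rated word pairs is an assumed input; the thesis does not specify these interfaces, so all names here are illustrative.

# Minimal sketch: weighted combination of a knowledge-base similarity
# score and a corpus "prediction" similarity score. kb_sim, corpus_sim,
# and the benchmark format are illustrative assumptions, not the
# thesis's actual interfaces.
from scipy.stats import spearmanr

def combined_similarity(w1, w2, alpha, kb_sim, corpus_sim):
    """Weighted sum of the two similarity scores."""
    return alpha * kb_sim(w1, w2) + (1 - alpha) * corpus_sim(w1, w2)

def tune_alpha(benchmark, kb_sim, corpus_sim, steps=100):
    """Grid-search the weight alpha in [0, 1] that maximizes the
    Spearman correlation with human-annotated similarity ratings.
    benchmark: list of (word1, word2, human_score) triples."""
    human = [score for _, _, score in benchmark]
    best_alpha, best_rho = 0.0, -1.0
    for i in range(steps + 1):
        alpha = i / steps
        predicted = [combined_similarity(w1, w2, alpha, kb_sim, corpus_sim)
                     for w1, w2, _ in benchmark]
        rho, _ = spearmanr(predicted, human)
        if rho > best_rho:
            best_alpha, best_rho = alpha, rho
    return best_alpha, best_rho

A simple grid search suffices here because there is only one free parameter; depending on the benchmark, Pearson's correlation could be used in place of Spearman's.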
However, algorithms combining corpus "prediction" models and knowledge-base rules still fail to distinguish semantic similarity from semantic relevance: many word pairs that receive high scores are not in fact similar but strongly related. This thesis therefore proposes a HowNet-based method for distinguishing lexical similarity from relevance, which uses the semantic information in HowNet together with the degrees of semantic similarity and relevance to separate similar word pairs from related ones, as sketched below. Experimental results show that the algorithm is effective and that the top-k words extracted from the "prediction" model's vector space better reflect the similarity between words. Finally, this thesis applies the theory of lexical-semantic similarity calculation to the revision of ISO 860, in order to facilitate communication among experts from many countries and to provide technical support for the development of terminology.
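To make the similarity/relevance distinction concrete, here is a hedged Python sketch under stated assumptions: the sememe sets would come from HowNet and the word vectors from a trained "prediction" (word2vec-style) model, while the Jaccard overlap measure and the thresholds are illustrative choices, not the thesis's exact formulas.

# Hedged sketch of separating similarity from relevance. Sememe sets
# are assumed to come from HowNet, vectors from a "prediction" model;
# the overlap measure and thresholds below are illustrative only.
import numpy as np

def sememe_similarity(sememes1, sememes2):
    """Jaccard overlap of two HowNet sememe sets: a high overlap
    suggests shared conceptual content, i.e. genuine similarity."""
    if not sememes1 or not sememes2:
        return 0.0
    return len(sememes1 & sememes2) / len(sememes1 | sememes2)

def cosine(v1, v2):
    """Cosine of two distributional vectors: high for words used in
    similar contexts, which may signal relevance rather than
    similarity (e.g. 'doctor' and 'hospital')."""
    return float(np.dot(v1, v2) /
                 (np.linalg.norm(v1) * np.linalg.norm(v2)))

def classify_pair(sememes1, sememes2, v1, v2, sim_t=0.4, rel_t=0.6):
    """Label a word pair 'similar', 'related', or 'neither'."""
    if sememe_similarity(sememes1, sememes2) >= sim_t:
        return "similar"
    if cosine(v1, v2) >= rel_t:
        return "related"
    return "neither"

Under this scheme, a pair like 'doctor'/'hospital' would typically show low sememe overlap but a high cosine, and so be labeled related rather than similar.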
Keywords/Search Tags: lexical-semantic similarity, corpus, HowNet, similarity, relevance, terms harmonization