
Study On Concept Semantic Similarity Measure Based On Ontology

Posted on: 2017-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: F Li
Full Text: PDF
GTID: 2308330488475453
Subject: Computer software and theory
Abstract/Summary:
Measuring the semantic similarity between concepts is an important research area in natural language processing. It is a fundamental research topic with wide application in information retrieval, machine translation, word sense disambiguation, automatic question answering, and related fields. At present, methods for measuring concept semantic similarity fall into two groups. The first group uses world knowledge: it relies on the semantic relations between concepts (hypernymy/hyponymy, apposition, holonymy/meronymy) to measure their similarity. The second group is based on large-scale corpus statistics and uses the probability distributions of vocabulary to measure semantic similarity between concepts. These methods suit applications whose domain is close to that of the corpus. As ontologies have grown in structure and vocabulary, more and more researchers exploit them to measure semantic similarity, but these approaches have limitations, and their practicality has been questioned with respect to efficiency and to different application areas. This thesis takes that situation as its starting point. Building on our previous research and the relevant literature, it uses the Chinese "CiLin" and the English WordNet and, addressing the open problems in research on each, proposes a similarity computation model for each.
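To make the first group of methods concrete, here is a minimal sketch of a generic path-based measure over a hypernym (is-a) hierarchy. It is not the model proposed in this thesis. The toy taxonomy and the names PARENT, ancestors, path_length and path_similarity are assumptions introduced only for illustration, and the formula 1 / (1 + path length) is one common choice among several.

```python
# Toy hypernym (is-a) taxonomy: child -> parent.
# Hypothetical data for illustration; not taken from WordNet or CiLin.
PARENT = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "fork": "artifact",
    "artifact": "entity",
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(a, b):
    """Count is-a edges on the path from a to b through their lowest common ancestor."""
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, c in enumerate(ancestors(a)):
        if c in steps_from_b:
            return steps_from_a + steps_from_b[c]
    raise ValueError("concepts share no common ancestor")


def path_similarity(a, b):
    """Generic path-based similarity: shorter paths give values closer to 1."""
    return 1.0 / (1.0 + path_length(a, b))
```

In this toy hierarchy, "car" and "bicycle" are two edges apart (both are children of "vehicle"), so path_similarity("car", "bicycle") is 1/3, while "car" and "fork" are three edges apart and score 1/4. A measure like this treats every edge as equally long, which is why uneven link density in a real hierarchy can distort the result.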
The main contributions of this thesis are as follows. First, for Chinese word semantic similarity measurement, I analyze the shortcomings of a representative algorithm proposed by Jiule Tian and propose a solution to them, raising the Pearson correlation coefficient between the human judgments in the MC30 dataset and the computed measures from 0.53 to 0.85, which is of good practical value. Second, drawing on a study of many existing algorithms and an understanding of CiLin, and building on Dekang Lin's similarity theory, I derive a new concept semantic similarity measure through theoretical analysis. Third, because China lacks concrete standards for evaluating Chinese word semantic similarity measures, I develop an evaluation standard on the basis of some foreign evaluation standards, which can serve as a benchmark for judging such measures. Fourth, for English concept semantic similarity measurement, based on a study of WordNet, I address the problem that computed results are degraded by the irregular density of links between concepts. I first derive a density-based weighting algorithm from existing algorithms, with some improvements, to show that the problems caused by irregular density can be mitigated by a density-compensated path.
Then I propose a measurement model based on an area-density-compensated path and apply it to several popular path-based algorithms. On the national standard dataset, the Pearson correlation coefficient between the human judgments and the measures computed with this model improves markedly and reaches an international level. Fifth, with the advent of the big-data era, the number of concepts may change at any time, and information-content-based methods, currently the most effective, may not adapt to this trend. The new model proposed in this thesis points out a direction for addressing this problem of information-content-based methods in the big-data era.
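The evaluations above compare computed similarity scores with human judgments using the Pearson correlation coefficient. The following is a minimal sketch of that comparison, assuming the two score lists are aligned pair by pair. The function name pearson and the sample numbers are hypothetical and are not results from the thesis or values from any benchmark dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: human ratings and a measure's scores for the same word pairs.
human_ratings = [3.9, 3.1, 1.7, 0.4]
computed_scores = [0.95, 0.70, 0.45, 0.10]
correlation = pearson(human_ratings, computed_scores)
```

A coefficient close to 1 means the measure ranks and spaces word pairs much as the human raters do, which is the sense in which an increase from 0.53 to 0.85 is an improvement.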
Keywords/Search Tags: CiLin, WordNet, semantic similarity, path computing model, natural language processing