Computation Of Word Similarity And Its Application In Question Answering System

Posted on: 2018-11-29 | Degree: Master | Type: Thesis
Country: China | Candidate: H Li
GTID: 2348330515469717 | Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the big data era, the Internet produces a large amount of text every day. Since words are the basic units of text, understanding word semantics is a fundamental task in text processing. Word similarity computation expresses how similar two words are as a specific numerical value, and it is the main approach to understanding word semantics. Solving the word similarity problem benefits many fields, such as question answering, information retrieval, word sense disambiguation, and machine translation. Building on a study of existing methods for word similarity computation and question answering, this thesis proposes a word similarity method based on sememe vectors and studies its application to Knowledge Base Question Answering (KBQA). The main contributions of this thesis are as follows:
(1) A sememe vector generation model, SIC_PageRank. In the sememe hyponymy hierarchy, the Sememe Information Content (SIC) of each sememe is computed from depth information about the sememe and its descendants. The PageRank transition probability matrix is then built from the SIC values and the links in the sememe structure, and the PageRank algorithm is used to generate a vector representation for each sememe.
(2) A word similarity method based on sememe vectors. Sememe vectors are generated with SIC_PageRank; sememe similarity is computed as the similarity of the corresponding sememe vectors; concept similarity is computed from sememe similarity; and word similarity is finally computed from concept similarity. When the method is applied to the sense classification of nouns in the Contemporary Chinese Semantic Dictionary (CSD), its agreement rate with manual proofreading is 71.9%, higher than that of the shortest-path-distance method.
(3) An application of word similarity computation to KBQA. The similarity between the predicate of a question and the predicate of each candidate answer is computed from word similarity, and the candidate answers are ranked by a Ranking SVM model using features such as edit distance, collocation, and classification. Experiments were conducted on the KBQA evaluation dataset of NLP&CC 2016. With the sememe-vector-based word similarity method in the KBQA system, precision is 73.88%, recall is 82.29%, and average F1 is 75.88%; all three metrics are higher than those obtained with the word2vec method.
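Contribution (1) computes a Sememe Information Content (SIC) score from depth information about each sememe and its descendants, but the abstract does not give the formula. The sketch below is a minimal illustration under stated assumptions: the toy hierarchy, the weight `alpha`, and the log-based depth and descendant terms are all hypothetical choices, not the thesis's definition. It only shows the intended behaviour, namely that deeper, more specific sememes receive a higher SIC.

```python
import math

# Hypothetical toy sememe hierarchy (child -> parent); not actual HowNet data.
PARENT = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "human": "animate",
    "animal": "animate",
    "inanimate": "physical",
}


def children_of(parent):
    """Invert the child -> parent map into parent -> [children]."""
    kids = {node: [] for node in parent}
    for node, p in parent.items():
        if p is not None:
            kids[p].append(node)
    return kids


def depth(node, parent):
    """Depth of `node`, counting the root as depth 1."""
    d = 1
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def descendant_count(node, kids):
    """Number of nodes strictly below `node` in the hierarchy."""
    return sum(1 + descendant_count(c, kids) for c in kids[node])


def sememe_information_content(parent, alpha=0.5):
    """Assumed SIC formula: a weighted mix of a depth term (deeper = more
    specific) and a descendant term (fewer descendants = more specific).
    Both terms lie in [0, 1]; `alpha` is a hypothetical weight."""
    kids = children_of(parent)
    n = max(len(parent), 2)  # guard against dividing by log(1) = 0
    max_depth = max(depth(s, parent) for s in parent)
    sic = {}
    for s in parent:
        depth_term = math.log(depth(s, parent) + 1) / math.log(max_depth + 1)
        desc_term = 1 - math.log(descendant_count(s, kids) + 1) / math.log(n)
        sic[s] = alpha * depth_term + (1 - alpha) * desc_term
    return sic


SIC = sememe_information_content(PARENT)
for name, value in sorted(SIC.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {value:.3f}")
```

Mixing a depth term with a descendant term is one common way to derive information content from a taxonomy alone, without corpus counts; the actual SIC may weight or combine the two differently.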
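The second half of contribution (1) turns SIC values and sememe links into a PageRank transition matrix and derives one vector per sememe. One possible reading, assumed here rather than taken from the thesis, is personalized PageRank: walk over the hyponymy links with transition probability proportional to the neighbour's SIC, restart at the target sememe, and use the stationary distribution as that sememe's vector. The graph, the SIC values, the damping factor 0.85, and all function names are illustrative.

```python
# Hypothetical toy sememe tree: (parent, child) hyponymy links.
EDGES = [
    ("entity", "thing"), ("thing", "physical"), ("physical", "animate"),
    ("animate", "human"), ("animate", "animal"), ("physical", "inanimate"),
]

# Illustrative SIC values, hard-coded for this sketch.
SIC = {
    "entity": 0.19, "thing": 0.35, "physical": 0.47, "animate": 0.67,
    "human": 1.0, "animal": 1.0, "inanimate": 0.95,
}


def transition_matrix(edges, sic, eps=1e-6):
    """Row-stochastic transitions over the (undirected) sememe links.
    Assumption: moving from u to neighbour v is proportional to SIC(v)."""
    nodes = sorted(sic)
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    matrix = {}
    for u in nodes:
        total = sum(sic[v] + eps for v in neighbours[u])
        matrix[u] = {v: (sic[v] + eps) / total for v in neighbours[u]}
    return nodes, matrix


def sememe_vector(source, nodes, matrix, damping=0.85, iters=100):
    """Personalized PageRank restarting at `source` (an assumed reading of
    'a vector for each sememe'). Every node here has at least one neighbour,
    so probability mass is conserved at each step."""
    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) if n == source else 0.0 for n in nodes}
        for u in nodes:
            for v, p in matrix[u].items():
                nxt[v] += damping * rank[u] * p
        rank = nxt
    return [rank[n] for n in nodes]


NODES, MATRIX = transition_matrix(EDGES, SIC)
VECTORS = {s: sememe_vector(s, NODES, MATRIX) for s in NODES}
print(NODES)
print([round(x, 3) for x in VECTORS["human"]])
```

Sememes that sit close together in the hierarchy share probability mass in each other's vectors, which is what makes the vectors comparable with a vector similarity measure.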
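Contribution (2) chains three levels: sememe similarity from sememe vectors, concept similarity from sememe similarity, and word similarity from concept similarity. The sketch below assumes cosine similarity for sememe vectors, a symmetric best-match average for concepts, and a maximum over sense pairs for words. The last rule is a common convention in HowNet-based measures, but none of the three rules is confirmed by the abstract, and the vectors and the lexicon are toy data.

```python
import math

# Hypothetical sememe vectors (e.g. from a SIC_PageRank-style model); toy values.
VECTORS = {
    "human": [0.9, 0.1, 0.0],
    "occupation": [0.7, 0.0, 0.3],
    "animal": [0.8, 0.2, 0.0],
    "tool": [0.0, 0.1, 0.9],
}

# Hypothetical lexicon: each word maps to its senses (concepts),
# and each concept is a list of sememes.
LEXICON = {
    "doctor": [["human", "occupation"]],
    "nurse": [["human", "occupation"]],
    "hammer": [["tool"]],
}


def cosine(u, v):
    """Sememe similarity as cosine similarity of sememe vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def concept_similarity(c1, c2):
    """Assumed aggregation: average each sememe's best match, both directions."""
    def best(a, b):
        return [max(cosine(VECTORS[s], VECTORS[t]) for t in b) for s in a]
    scores = best(c1, c2) + best(c2, c1)
    return sum(scores) / len(scores)


def word_similarity(w1, w2):
    """Word similarity as the maximum over all pairs of senses."""
    return max(concept_similarity(c1, c2)
               for c1 in LEXICON[w1] for c2 in LEXICON[w2])


print(round(word_similarity("doctor", "nurse"), 3))
print(round(word_similarity("doctor", "hammer"), 3))
```

Taking the maximum over senses means a polysemous word is judged by its closest sense, which suits tasks such as matching a question predicate against candidate predicates.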
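Contribution (3) ranks candidate answers with a Ranking SVM. A Ranking SVM learns a linear scoring function from pairwise preferences: for every correct and incorrect candidate of the same question, it requires the correct one to score higher by a margin. The sketch trains that objective with plain subgradient descent instead of a dedicated SVM solver, and it uses two hypothetical features (a predicate-similarity score and an edit-distance score) in place of the thesis's full feature set. All feature values are made up.

```python
# Hypothetical candidate features per question:
# [predicate similarity, edit-distance score]; 1 marks the correct answer.
QUESTIONS = [
    [([0.9, 0.8], 1), ([0.3, 0.6], 0), ([0.2, 0.1], 0)],
    [([0.8, 0.2], 1), ([0.4, 0.5], 0)],
]


def pairwise_differences(questions):
    """Ranking SVM's pairwise transform: one vector x_correct - x_wrong
    per (correct, wrong) pair within the same question."""
    diffs = []
    for candidates in questions:
        for x_pos, y_pos in candidates:
            for x_neg, y_neg in candidates:
                if y_pos > y_neg:
                    diffs.append([a - b for a, b in zip(x_pos, x_neg)])
    return diffs


def train_ranking_svm(diffs, c=1.0, lr=0.1, epochs=200):
    """Subgradient descent on 0.5*||w||^2 + c * sum(max(0, 1 - w.d))."""
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        grad = list(w)  # gradient of the L2 regulariser
        for d in diffs:
            if sum(wi * di for wi, di in zip(w, d)) < 1:  # hinge is active
                grad = [g - c * di for g, di in zip(grad, d)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def score(w, x):
    """Linear ranking score of one candidate's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


W = train_ranking_svm(pairwise_differences(QUESTIONS))
for candidates in QUESTIONS:
    ranked = sorted(candidates, key=lambda cand: score(W, cand[0]), reverse=True)
    print([label for _, label in ranked])
```

In practice a dedicated solver (for example SVMrank) would normally be used; the loop above only makes the pairwise hinge objective explicit.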
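The KBQA results report precision 73.88%, recall 82.29%, and an average F1 of 75.88%. The harmonic mean of that precision and recall is about 77.86%, so the reported F1 was presumably averaged over individual questions rather than computed from the overall precision and recall; this reading is an assumption. The sketch shows how per-question (macro-averaged) F1 can differ from the F1 of averaged precision and recall, using hypothetical answer sets.

```python
def prf(predicted, gold):
    """Precision, recall, and F1 for one question's answer set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def averaged(results):
    """Macro-average precision, recall, and F1 over questions."""
    triples = [prf(pred, gold) for pred, gold in results]
    n = len(triples)
    return tuple(sum(t[i] for t in triples) / n for i in range(3))


def harmonic_f1(p, r):
    """F1 as the harmonic mean of a single precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical (predicted, gold) answer sets for three questions.
RESULTS = [({"a"}, {"a"}), ({"b", "c"}, {"b"}), ({"d"}, {"e"})]
avg_p, avg_r, avg_f1 = averaged(RESULTS)
print(avg_p, avg_r, avg_f1, harmonic_f1(avg_p, avg_r))

# Harmonic mean of the precision and recall reported in the abstract.
print(round(harmonic_f1(0.7388, 0.8229), 4))
```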
Keywords: HowNet, Sememe vector, PageRank, Word similarity, QA System