
Research On Word Similarity Computation Method Based On Non-IID Learning

Posted on: 2020-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Zhang
GTID: 2428330575487986
Subject: Computer application technology
Abstract/Summary:
Word similarity computation measures how close the meanings of two words are. It is a fundamental research problem in natural language processing and has an important influence on upper-level tasks such as text classification, question answering, word sense disambiguation, and machine translation. Word representation learning is the basic, core component of word similarity computation, and obtaining high-quality word representations is an effective way to improve it. This thesis applies non-independent and identically distributed (non-IID) learning theory to improve the quality of word representation learning. First, the coupling relationships between concepts are fully exploited to generate high-quality concept vectors. Then, words are associated with concepts through concept mapping. Finally, the concept vectors are used to compute word similarity, which further improves the performance of upper-level tasks such as text classification and question answering. The main work and contributions of this thesis cover the following three aspects.

(1) Traditional word similarity computation methods conflate the semantic information of different concepts and ignore the coupling relationships between them. To address this, the thesis proposes a concept representation method based on non-IID learning and a word similarity computation method based on non-IID learning. We fully explore the coupling relationships between concepts in concept descriptions and in the knowledge network, including explicit concept co-occurrence coupling in concept descriptions, explicit concept hyperlink coupling in the knowledge network, and implicit concept coupling between the two. The proposed concept representation method captures both the explicit and the implicit coupling relationships and makes full use of them to obtain a more complete semantic representation of concepts. Through concept mapping, words are associated with these concept representations, which improves the effectiveness of word similarity computation. On six real-world datasets, we compare the method against seven state-of-the-art word vector methods. The non-IID concept representation method significantly outperforms them, with an average result at least 20.4% higher than the baseline model. The experimental results show that the proposed method effectively represents the semantic information of concepts and improves the performance of word similarity computation.

(2) To further verify the performance of the non-IID word concept representation method, the thesis applies it to text classification and proposes a text classification method based on non-IID word representation. First, a text feature construction module is built on the non-IID concept representation method; it converts a text into a dense vector, providing a richer and more complete semantic representation of the text. At the same time, a traditional word representation method provides a general-purpose representation of the text. The two vectors are then concatenated to form the feature representation of the text. Finally, a classifier is trained with LIBLINEAR to obtain the final classification results. On the 20NewsGroup dataset, the experimental results show that combining six traditional text classification models with the non-IID concept representation effectively improves their classification accuracy, raising the F1 score by 22.8% on average.

(3) To further verify the performance of the non-IID word concept representation method, the thesis also applies it to medical question answering and proposes a medical question answering method based on non-IID word representation. First, a word embedding module is built on the non-IID concept representation method; it transforms question-answer pairs into dense vectors, providing richer and more complete semantic representations of question sentences and answer sentences. Then, based on the characteristics of Chinese medical question answering, six encoders are designed to encode the vector representations of question-answer pairs, capturing the dependencies between words within a sentence and producing high-level semantic representations of the pairs. Finally, cosine similarity between the question and each answer in this high-level semantic space gives a similarity score for every question-answer pair, and the answer with the highest score is taken as the model's output. The experimental results show that incorporating the non-IID concept representation effectively improves the performance of the medical question answering system. The best ACC@1 on the cMedQA dataset is 69.85%, which is better than the traditional methods.
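The last step of the first contribution, computing word similarity from concept vectors after concept mapping, can be illustrated with a minimal sketch. The abstract does not say how a word that maps to several concepts is scored, so this sketch assumes the maximum cosine similarity over all candidate concept pairs. The mapping table `WORD_TO_CONCEPTS`, the toy `CONCEPT_VECTORS`, and the function names are hypothetical; the vectors merely stand in for the learned non-IID concept vectors, whose training is not reproduced here.

```python
import math

# Hypothetical toy data: each word maps to one or more candidate concepts
# (e.g. the senses of an ambiguous word), and each concept has a vector.
# Real vectors would come from non-IID concept representation learning;
# these hand-picked 2-D values are for illustration only.
WORD_TO_CONCEPTS = {
    "bank": ["Bank_(finance)", "Bank_(river)"],
    "money": ["Money"],
    "shore": ["Shore"],
}
CONCEPT_VECTORS = {
    "Bank_(finance)": [1.0, 0.0],
    "Bank_(river)": [0.0, 1.0],
    "Money": [0.9, 0.1],
    "Shore": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(w1, w2, word_to_concepts, concept_vectors):
    """Score two words by their best-matching pair of mapped concepts.

    Assumption: max over concept pairs; a word with no mapped concept scores 0.0.
    """
    scores = [
        cosine(concept_vectors[c1], concept_vectors[c2])
        for c1 in word_to_concepts.get(w1, [])
        for c2 in word_to_concepts.get(w2, [])
    ]
    return max(scores, default=0.0)


if __name__ == "__main__":
    print(round(word_similarity("bank", "money", WORD_TO_CONCEPTS, CONCEPT_VECTORS), 4))  # 0.9939
    print(round(word_similarity("bank", "shore", WORD_TO_CONCEPTS, CONCEPT_VECTORS), 4))  # 0.9939
    print(round(word_similarity("money", "shore", WORD_TO_CONCEPTS, CONCEPT_VECTORS), 4))  # 0.2195
```

Because scoring goes through concepts rather than a single vector per word, the ambiguous "bank" can be close to both "money" and "shore" while those two stay far apart; the max-over-pairs rule is only one possible aggregation choice.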
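The second contribution builds text features by concatenating a concept-based dense vector with a traditional word representation before training a classifier. The sketch below covers only the feature-construction step and rests on stated assumptions: a text's concept vector is taken as the average of the vectors of the concepts its words map to, and the "traditional" representation is a plain term-frequency vector. Both choices, the helper names, and the toy data are hypothetical rather than the thesis's exact design.

```python
# Hypothetical toy data; real concept vectors would come from the non-IID method.
WORD_TO_CONCEPTS = {"bank": ["Bank_(finance)"], "money": ["Money"]}
CONCEPT_VECTORS = {"Bank_(finance)": [1.0, 0.0], "Money": [0.5, 0.5]}
VOCAB = ["bank", "loan", "river"]


def concept_features(tokens, word_to_concepts, concept_vectors, dim):
    """Average the vectors of every concept the text's tokens map to (zeros if none)."""
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        for concept in word_to_concepts.get(tok, []):
            total = [t + x for t, x in zip(total, concept_vectors[concept])]
            count += 1
    return [t / count for t in total] if count else total


def bow_features(tokens, vocab):
    """Term-frequency vector over a fixed vocabulary (stand-in for the traditional representation)."""
    index = {word: i for i, word in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1.0
    return vec


def text_features(tokens, vocab, word_to_concepts, concept_vectors, dim):
    """Concatenate the concept-based vector with the traditional vector."""
    return concept_features(tokens, word_to_concepts, concept_vectors, dim) + bow_features(tokens, vocab)


if __name__ == "__main__":
    print(text_features(["bank", "money", "loan"], VOCAB, WORD_TO_CONCEPTS, CONCEPT_VECTORS, 2))
    # [0.75, 0.25, 1.0, 1.0, 0.0]
```

The concatenated vectors would then go to a linear classifier; the thesis uses LIBLINEAR, which scikit-learn's `LinearSVC` is also built on. In practice the two parts are on different scales (averaged embeddings versus raw counts), so normalizing each part before concatenation is a common choice.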
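The answer-selection step of the third contribution (score each candidate answer by cosine similarity to the question, return the top one) and the ACC@1 metric used to report results can be sketched as follows. The vectors are placeholders for the outputs of the thesis's six encoders, which are not reproduced. ACC@1 is taken here as the fraction of questions whose top-ranked candidate is a correct answer. All names and toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_answer(question_vec, candidate_vecs):
    """Index of the candidate whose vector has the highest cosine score with the question."""
    scores = [cosine(question_vec, a) for a in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def acc_at_1(examples):
    """Fraction of questions whose top-ranked candidate is among the correct answers.

    Each example is (question_vec, candidate_vecs, correct_indices).
    """
    if not examples:
        return 0.0
    hits = sum(1 for q, cands, gold in examples if select_answer(q, cands) in gold)
    return hits / len(examples)


# Hypothetical toy vectors standing in for encoder outputs.
EXAMPLES = [
    ([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1]], {1}),  # top candidate is 1: correct
    ([0.0, 1.0], [[1.0, 0.0], [0.2, 1.0]], {0}),  # top candidate is 1: wrong
]

if __name__ == "__main__":
    print(acc_at_1(EXAMPLES))  # 0.5
```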
Keywords/Search Tags: Word Similarity, non-IID, Concept Representation, Text Classification, Question Answering