
Word Similarity Measurement Based On Word Embedding And WordNet

Posted on: 2022-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: F Q Zhao
Full Text: PDF
GTID: 2518306536979339
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer theory and technology, Natural Language Processing (NLP) has become increasingly important in daily life and academic research, and measuring word similarity is an important part of it. The goal of word similarity measurement is to quantify the semantic similarity of a pair of words or concepts. Word similarity is applied in many important fields, such as machine translation, retrieval systems, and intelligent question answering systems. This thesis focuses on English word similarity measurement.

Existing research shows that knowledge-based models for word similarity measurement (WordNet being a commonly used knowledge base or ontology) rely on a manually labeled knowledge base; such models are usually simple but limited in scale and poor in accuracy. Corpus-based models, by contrast, use complex algorithms such as neural networks to embed words from a huge corpus; they have strong representation ability but have difficulty distinguishing complex semantics. Therefore, models that combine the two, that is, models that add knowledge as external semantics to a corpus-based model, have become a new research direction. How to combine WordNet and word embedding models more effectively to calculate word similarity is the focus of this thesis, which successively proposes two novel models for word similarity measurement that combine WordNet and word embedding. The main research contributions of this thesis are as follows:

(1) Most existing models that combine WordNet and word embedding involve only a single aspect of external semantics. This thesis proposes a novel model for word similarity measurement, the DFRVec model, which encodes various kinds of semantic information in WordNet into a vector space through pre-trained word embeddings. The main idea is that the DFRVec model encodes the definitions, POS (part of speech), word forms, and semantic relations in WordNet into the vector space through three new sub-models, while retaining the semantic information embedded in the original word embeddings. The three sub-models are then linearly combined with the original word embeddings. As a result, the DFRVec model represents each word by a set of vectors, which is used to improve word similarity calculation.

(2) Existing models hardly consider the semantic relations between a pair of words when measuring their similarity, which may lose part of the semantic information. This thesis therefore proposes a model, called DR, that tunes the word vector representations obtained from the DFRVec model based on the semantic relations between the pair of words. The DR model combines the DFRVec model with a new Rel Sim sub-model. The Rel Sim sub-model divides the possible semantic relations in WordNet into four types: Synonymy and Similar; Hypernymy and Hyponymy; other relations; and Antonymy. It then tunes the vectors of the pair of words with different weighting factors according to their semantic relation. To measure word similarity with the DR model, the DFRVec model first generates the vector representations of a word pair, and the Rel Sim sub-model then tunes the two sets of vector representations, which further improves word similarity calculation.

(3) The DFRVec and DR models are each evaluated with 4 existing word embedding models on 10 public benchmark datasets for word similarity measurement and compared with 13 existing models, including WordNet-based models, word embedding-based models, and WordNet-based word embedding models. The experiments show that the models proposed in this thesis measure word similarity better than the existing models.
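The two ideas described in the abstract, linearly combining sub-model vectors with the original embedding and tuning a word pair's vectors by their WordNet relation, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the relation keys, and every weighting factor below are hypothetical, each word is simplified to a single vector instead of a set of vectors, and the tuning rule (pulling the pair together or pushing it apart before taking cosine similarity) is only one plausible reading of "tuning with different weighting factors".

```python
import math

# Hypothetical weighting factors for the four relation types named in the
# abstract; the abstract does not state the actual values.
RELATION_ALPHA = {
    "synonymy_similar": 0.5,
    "hypernymy_hyponymy": 0.25,
    "other": 0.0,
    "antonymy": -0.5,
}


def combine(original, sub_vectors, weights):
    """Linearly combine sub-model vectors with the original embedding:
    result = original + sum_i(weights[i] * sub_vectors[i])."""
    combined = list(original)
    for vec, w in zip(sub_vectors, weights):
        for i, x in enumerate(vec):
            combined[i] += w * x
    return combined


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tune_pair(u, v, relation):
    """Assumed tuning rule: add alpha times the other word's vector.
    Positive alpha pulls the pair together, negative alpha pushes it apart."""
    alpha = RELATION_ALPHA.get(relation, 0.0)
    u_tuned = [a + alpha * b for a, b in zip(u, v)]
    v_tuned = [b + alpha * a for a, b in zip(u, v)]
    return u_tuned, v_tuned


def similarity(u, v, relation="other"):
    """Cosine similarity of the pair after relation-based tuning."""
    u_tuned, v_tuned = tune_pair(u, v, relation)
    return cosine(u_tuned, v_tuned)
```

Under these assumptions, two vectors that start out orthogonal score higher when labeled as synonyms and lower when labeled as antonyms, while the "other" category leaves the plain cosine similarity unchanged.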
Keywords/Search Tags: Word similarity, WordNet, Semantic information, Word embedding