Sentence Similarity Calculation Based On Multi-sense Embeddings

Posted on: 2022-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Yang
Full Text: PDF
GTID: 2518306569951559
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the mobile Internet, a large amount of text data is generated every day. Quickly and effectively obtaining the information people are interested in, and mining its potential value, has become an urgent problem. Analyzing and processing this massive text data often involves sentence similarity calculation, and the polysemy and ambiguity of words have a great influence on this task. In recent years, thanks to word embeddings and their strong capacity for semantic representation, deep learning has achieved great success in natural language processing tasks. There are two ways to handle polysemy in word vector representations: multi-sense embeddings and dynamic representations. Dynamic representations are favored by researchers for their good performance. However, because fine-tuning some domain-specific models is difficult, and because specific application scenarios such as recommendation or retrieval systems impose their own performance requirements, multi-sense embeddings may be a good choice for these applications.

Therefore, this thesis proposes a sentence similarity computing framework based on multi-sense embedding representations. The framework comprises word sense induction, iterative refinement of multi-sense embeddings, and similarity calculation based on Bi-LSTM. It can readily adapt to changes in polysemy that arise as language develops and evolves in specific scenarios. The main work of this thesis is as follows.

First, to handle changes in polysemy as language develops and evolves in some scenarios, the thesis proposes a word sense induction method that uses word embeddings and community detection in complex networks. The method represents polysemous words by their context embeddings and uses the semantic information contained in the word embeddings to construct a better complex network, thereby improving the effectiveness and robustness of the community detection algorithm on that network. Experimental results show that this method is more effective than the classical Hyperlex algorithm.

Second, because the multi-sense embeddings derived from word sense induction may express semantics inaccurately or incompletely, the thesis proposes an iterative refinement training model for multi-sense embeddings. The model uses a sense disambiguation module and a word embedding training module to refine the multi-sense representations through joint learning on the specific corpus. The method readily adapts to scenarios in which polysemy changes and therefore improves the accuracy of the multi-sense embeddings. Experimental results show that the proposed method achieves good performance on the word similarity calculation task.

Third, a deep-learning-based sentence similarity model can readily capture sentence structure and context information to improve its performance. The thesis therefore designs a sentence similarity computing framework that integrates sense induction, multi-sense embedding refinement, and Bi-LSTM matching. The framework can update the multi-sense embeddings as word usage scenarios change and language develops, yielding better similarity calculation accuracy. Experimental results show that sentence similarity accuracy improves by 22.1% compared with the classical ESIM model and by 6.7% compared with the class-information-based ESIM model, a significant improvement.
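The general idea of graph-based word sense induction described in the abstract can be illustrated with a small Python sketch. This is not the thesis's implementation: the abstract does not specify the graph construction or the community detection algorithm, so the sketch assumes a cosine-similarity threshold for edges and uses connected components as the simplest possible stand-in for community detection. The function names (`induce_senses`, `sense_embeddings`, `disambiguate`), the threshold of 0.8, and the two-dimensional toy context vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def induce_senses(context_vectors, threshold=0.8):
    """Group the contexts of one ambiguous word into candidate senses.

    Assumption: an edge links two contexts whose cosine similarity is at
    least `threshold`, and each connected component is treated as one sense.
    A real system would run a proper community detection algorithm here.
    """
    n = len(context_vectors)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(context_vectors[i], context_vectors[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)

    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            group.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


def sense_embeddings(context_vectors, groups):
    """One embedding per induced sense: the mean of its context vectors."""
    dims = len(context_vectors[0])
    return [
        [sum(context_vectors[i][d] for i in g) / len(g) for d in range(dims)]
        for g in groups
    ]


def disambiguate(context_vector, senses):
    """Index of the sense embedding closest to a new context vector."""
    return max(range(len(senses)), key=lambda k: cosine(context_vector, senses[k]))


# Toy context vectors for one ambiguous word (hypothetical data):
# the first three resemble one usage, the last three another.
contexts = [
    [1.0, 0.1], [0.9, 0.2], [0.95, 0.0],
    [0.1, 1.0], [0.0, 0.9], [0.2, 0.95],
]
groups = induce_senses(contexts)
senses = sense_embeddings(contexts, groups)
print(groups)  # [[0, 1, 2], [3, 4, 5]]
print(disambiguate([0.8, 0.1], senses))
```

The nearest-centroid `disambiguate` step only hints at the sense disambiguation module mentioned in the abstract; the joint training with a word embedding module is not reproduced here.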
Keywords/Search Tags: Sentence similarity calculation, Word sense induction, Multi-sense embeddings, Word embedding refinement