Sentence Similarity Calculation Based On Multi-sense Embeddings

Posted on:2022-09-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Yang

Full Text:PDF

GTID:2518306569951559

Subject:Master of Engineering

Abstract/Summary:

With the rapid development of the mobile Internet, a large amount of text data is generated every day. How to quickly and effectively obtain the information that people are interested in, and to mine its potential value, has become an urgent problem. The analysis and processing of this massive text data often involves sentence similarity calculation, and the polysemy and ambiguity of words have a great influence on this task. In recent years, with the help of word embeddings and their powerful semantic expression ability, deep learning has achieved great success in natural language processing tasks.

There are two ways to deal with polysemy in word vector representations: multi-sense embeddings and dynamic representations. Dynamic representations are favored by researchers because of their good performance. However, because such models are difficult to fine-tune for some specific domains, and because application scenarios such as recommendation or retrieval systems impose their own performance requirements, multi-sense embeddings may be a good choice for these applications. Therefore, this thesis proposes a sentence similarity computing framework based on multi-sense embedding representations. The framework includes word sense induction, iterative refinement of the multi-sense embeddings, and similarity calculation based on Bi-LSTM. It can easily adapt to situations in which the senses of words change as language develops and evolves in specific scenarios. The main work of this thesis is as follows.

First, to deal with changes in polysemy as language develops and evolves in some scenarios, the thesis proposes a word sense induction method that uses word embeddings and community detection in complex networks. The method uses context embeddings to represent polysemous words and uses the semantic information contained in the word embeddings to construct a better complex network, thereby improving the effectiveness and robustness of the community detection algorithm on that network. Experimental results show that this method is more effective than the classical HyperLex algorithm.

Second, the multi-sense embeddings derived from word sense induction may be inaccurate or incomplete in their semantic expression, so the thesis proposes an iterative refinement training model for multi-sense embeddings. The model uses a sense disambiguation module and a word embedding training module to refine the multi-sense representations by joint learning on the specific corpus. The method can easily adapt to scenarios in which polysemy changes, and therefore improves the accuracy of the multi-sense embeddings. Experimental results show that the proposed method achieves good performance on the word similarity calculation task.

Third, a deep-learning-based sentence similarity calculation model can easily capture sentence structure and context information to improve its performance. Therefore, the thesis designs a sentence similarity computing framework that integrates sense induction, multi-sense embedding refinement, and Bi-LSTM matching. The framework can update the multi-sense embeddings as word usage scenarios change and language develops, so as to obtain better similarity calculation accuracy. The experimental results show that the accuracy of sentence similarity calculation improves by 22.1% compared with the classical ESIM model and by 6.7% compared with the class-information-based ESIM model, which is a significant improvement.
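The sense induction and disambiguation steps described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the toy three-dimensional context vectors, the 0.8 similarity threshold, and the function names `induce_senses`, `sense_vectors`, and `disambiguate` are all assumptions made for illustration, and connected components on a thresholded cosine-similarity graph are used here as a simple stand-in for the community detection algorithm, which the abstract does not name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def induce_senses(context_vectors, threshold=0.8):
    """Group the contexts of one target word into sense clusters.

    Builds a graph whose nodes are contexts and whose edges link contexts
    with cosine similarity >= threshold, then returns its connected
    components (a stand-in for a real community detection algorithm).
    """
    n = len(context_vectors)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(context_vectors[i], context_vectors[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def sense_vectors(context_vectors, clusters):
    """One sense vector per cluster: the centroid of its context vectors."""
    dim = len(context_vectors[0])
    return [
        [sum(context_vectors[i][d] for i in cluster) / len(cluster) for d in range(dim)]
        for cluster in clusters
    ]


def disambiguate(context_vector, senses):
    """Index of the sense vector closest to the given context vector."""
    return max(range(len(senses)), key=lambda k: cosine(context_vector, senses[k]))


# Hypothetical context embeddings for an ambiguous word such as "bank".
contexts = [
    [0.9, 0.1, 0.0],  # financial usage
    [0.8, 0.2, 0.1],  # financial usage
    [0.1, 0.9, 0.2],  # riverside usage
    [0.0, 0.8, 0.3],  # riverside usage
]
clusters = induce_senses(contexts)
senses = sense_vectors(contexts, clusters)
print(clusters)
print(disambiguate([0.7, 0.3, 0.0], senses))
```

In a fuller pipeline, the selected sense vectors would replace the single word vector of each polysemous word before the sentence pair is passed to the Bi-LSTM matching model.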

Related items

1	Research On Chinese Word Sense Disambiguation Method Based On Deep Learning
2	Study On Multi-sense Word Vector And Semantic Similarity
3	Research On Word Sense Disambiguation And Keyword Expansion In Question Answering System
4	The Research On Chinese Word Sense Induction
5	Sentence-Level Language Analysis With Contextualized Word Embeddings
6	Word Embeddings Towards Text Classification Of Emotion And Topic
7	Research Of Word Sense Disambiguation Based On Word-sense Category Extending
8	Research Of Sentiment Classification Based On Attention Word Embeddings
9	Jointly Learning Chinese Word Embeddings With Heterogeneous Morphemes
10	Improving Word Embeddings And Applying Them In Literature Style Recognition