
A Sentence Representation Method Based On Syntax And Semantics

Posted on: 2020-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Le
Full Text: PDF
GTID: 2428330620451114
Subject: Computer Science and Technology
Abstract/Summary:
Sentence similarity modeling lies at the core of many natural language processing applications and has therefore received much attention in recent years. Measuring sentence similarity is challenging due to the ambiguity and variability of linguistic expression. A large number of prior works focused on feature engineering, and several types of sparse features have been shown to be useful. More recently, owing to the success of word embeddings, researchers have studied sentence similarity modeling via sentence embeddings. Most of these works focus on learning semantic information and encoding it as a continuous vector, yet the syntactic information of sentences has not been fully exploited. On the other hand, prior works have shown the benefits of structured trees that encode syntactic information, while few methods in this line of work exploit the advantages of word embeddings and of another powerful technique, the attention weight mechanism.

Motivated by these observations, this thesis combines the advantages of the techniques above to develop a more effective method. In a nutshell, this thesis proposes the ACV-tree model, which models sentence similarity in a structured manner and seamlessly integrates semantic information, syntactic information, and the attention weight mechanism. To measure similarity, this thesis develops a new tree kernel, the ACVT kernel, which is tailored to the proposed structure and designed to be easy to apply. The ACV-tree model can serve as a general framework: word embeddings and attention weights act as its building blocks, and users can replace them with other off-the-shelf (or more powerful, future) word embedding techniques and attention weight schemes. Moreover, unlike most sentence embedding based models, the ACV-tree model requires no time-consuming learning or training once word embeddings are available. Compared against existing word embedding based models for sentence similarity, the ACV-tree model also achieves better performance on almost all datasets used in our experiments.

To verify the effectiveness of the proposed model, this thesis conducts experiments on 19 datasets derived from the Semantic Textual Similarity (STS) task of the International Semantic Evaluation (SemEval) competition. Each dataset contains many sentence pairs, and the datasets cover a wide range of domains such as news, web forums, images, and Twitter. The experimental results on these 19 widely used STS datasets demonstrate that the model is effective and competitive against state-of-the-art models. Additionally, this thesis studies the generality of the ACV-tree model by using various attention weight mechanisms and word embedding techniques; specifically, many variant methods based on word embeddings, attention weighting mechanisms, and syntax are explored. The experimental results confirm that many attention weight mechanisms and word embedding techniques can be seamlessly integrated into the ACV-tree model, demonstrating its robustness and generality.
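
To make the idea concrete, the Python sketch below is a minimal, hypothetical illustration of the general approach described above, not the thesis's actual ACV-tree construction or ACVT kernel: it assumes each leaf of a constituency-style parse carries a word embedding and an attention weight, and it scores two trees by accumulating attention-weighted cosine similarities over node pairs with matching labels. The node labels, toy vectors, weights, and decay factor are all invented for illustration.

# Illustrative sketch only: attention-weighted word vectors attached to
# the leaves of a parse tree, compared with a simplified tree-kernel score.
from dataclasses import dataclass, field
from typing import List, Optional
import math

@dataclass
class Node:
    label: str                            # constituency label, e.g. "NP", "VP", or a POS tag
    children: List["Node"] = field(default_factory=list)
    vector: Optional[List[float]] = None  # word embedding (leaves only)
    weight: float = 1.0                   # attention weight (leaves only)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def node_sim(a: Node, b: Node, decay: float = 0.5) -> float:
    """Similarity contribution of two nodes; nonzero only when labels match."""
    if a.label != b.label:
        return 0.0
    if a.vector is not None and b.vector is not None:   # leaf vs. leaf
        return a.weight * b.weight * cosine(a.vector, b.vector)
    # internal nodes: decayed sum over all pairs of children (simplified matching)
    return decay * sum(node_sim(ca, cb, decay)
                       for ca in a.children for cb in b.children)

def tree_kernel(t1: Node, t2: Node) -> float:
    """Sum node_sim over all node pairs of the two trees (brute force)."""
    def nodes(t):
        yield t
        for c in t.children:
            yield from nodes(c)
    return sum(node_sim(a, b) for a in nodes(t1) for b in nodes(t2))

# Toy example: "a dog runs" vs. "a cat runs" with hand-made 2-d embeddings.
dog = Node("NN", vector=[0.9, 0.1], weight=0.8)
cat = Node("NN", vector=[0.8, 0.2], weight=0.8)
runs1 = Node("VBZ", vector=[0.1, 0.9], weight=0.6)
runs2 = Node("VBZ", vector=[0.1, 0.9], weight=0.6)
a1 = Node("DT", vector=[0.5, 0.5], weight=0.2)
a2 = Node("DT", vector=[0.5, 0.5], weight=0.2)

s1 = Node("S", [Node("NP", [a1, dog]), Node("VP", [runs1])])
s2 = Node("S", [Node("NP", [a2, cat]), Node("VP", [runs2])])

print(round(tree_kernel(s1, s2), 4))

In practice, the parse trees, word embeddings, and attention weights would come from whichever parser, pretrained embedding model, and attention weight scheme the user plugs in; the simplified recursion above only illustrates how syntactic structure, word-level semantics, and attention weights can be combined into a single tree-kernel-style similarity score.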
Keywords/Search Tags:Natural language processing, Sentence similarity, Tree kernel, Sentence embedding, Attention weight mechanism