Sentence Embedding Representation With Syntactic Information Learning Method And Application Research

Posted on: 2019-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: L Tao
Full Text: PDF
GTID: 2428330566959583
Subject: Computer application technology
Abstract/Summary:
Language understanding is a hot issue in both academia and industry, and it is one of the biggest problems in artificial intelligence research. The key to language understanding is the semantic representation of natural language, which is the basis for natural language understanding and reasoning. Natural language has four levels: words, sentences, paragraphs, and chapters. Words are the most basic language units. Sentences are linguistic units that are organic, syntactic, and linear and that carry relatively complete semantics; they are also the units from which paragraphs and chapters are formed. Unlike the limited semantic space of words, the semantics of a sentence is not the simple sum of the semantics of its words. It arises from word semantics combined with syntactic effects, so sentence semantics are much more complex and flexible than word semantics.

With the successful application of distributed word embedding in many tasks, it is natural to ask whether the vector representation can be extended to a sentence or a longer text, that is, whether the semantic representation of a sentence can be mapped to a low-dimensional continuous space. The semantic expression of a sentence is closely tied to its syntactic structure. Existing sentence embedding representation learning methods can preserve the word order of sentences, but they cannot avoid the loss of syntactic structure, so it is difficult for them to learn sentence embeddings accurately. To address this lack of syntactic information, this thesis proposes merging syntactic structure information with word embedding to obtain sentence embeddings. The main work of the thesis includes:

(1) A sentence embedding representation learning method that merges syntactic information and word embedding (Syn Tree-WordVec). Sentences are first parsed syntactically; the resulting syntactic information is then merged with the word embeddings, and the sentence embedding is learned from the merged representation. The method is compared with existing methods on a textual similarity task, using Chinese and English datasets and word embeddings of different dimensions. The experimental results show that the proposed method performs better with low-dimensional word embeddings and improves both precision and operation speed. On the Chinese datasets, accuracy improves by as much as 5.17%.

(2) Research on the text similarity test of Technology Reward Projects Declaration based on sentence embedding. The proposed learning method is used to obtain sentence embeddings, which are then applied to the text similarity test of Technology Reward Projects Declaration. The results show that the proposed method tests text similarity better. This work can provide a scientific reference for the review of technology reward projects as well as for the review of other similar projects, which gives it good application value.
Keywords/Search Tags:sentence embedding, syntactic parsing, word embedding, language understanding, semantic representation
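
The abstract does not spell out how Syn Tree-WordVec merges syntactic information with word embeddings, so the following is only a loose illustration of the general idea, not the thesis's method. It assumes a hypothetical weighting scheme in which each word vector is scaled by `1 / (1 + depth)`, where `depth` is the word's depth in a parse tree (root = 0), and then compares two sentences by cosine similarity. The toy vectors, the depth values, and the names `WORD_VECS`, `sentence_embedding`, and `cosine` are all made up for the example.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
WORD_VECS = {
    "the": [0.0, 0.0, 1.0],
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.0],
    "chases": [0.1, 0.9, 0.1],
    "follows": [0.2, 0.8, 0.1],
}


def sentence_embedding(tokens, depths):
    """Depth-weighted average of word vectors.

    tokens: list of words in the sentence.
    depths: parse-tree depth of each token (root = 0); assumed to come
            from an external syntactic parser, which is not shown here.
    The 1 / (1 + depth) weight is an assumption of this sketch.
    """
    if len(tokens) != len(depths):
        raise ValueError("tokens and depths must have the same length")
    dim = len(next(iter(WORD_VECS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token, depth in zip(tokens, depths):
        vec = WORD_VECS.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary words
        weight = 1.0 / (1.0 + depth)
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # all-zero vector when no token is known
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hand-assigned depths: verb at the root, nouns below it, articles deepest.
    s1 = sentence_embedding(["the", "dog", "chases", "the", "cat"], [2, 1, 0, 2, 1])
    s2 = sentence_embedding(["the", "cat", "follows", "the", "dog"], [2, 1, 0, 2, 1])
    print(cosine(s1, s2))
```

With all depths set to the same value the function reduces to a plain average of word vectors, which makes it easy to see what the syntactic weighting changes in this sketch.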