
Context-Aware Natural Language Semantic Representation Research

Posted on: 2020-04-21  Degree: Doctor  Type: Dissertation
Country: China  Candidate: K Zhang  Full Text: PDF
GTID: 1368330578981842  Subject: Computer application technology
Abstract/Summary:
Natural language is the primary means by which humans communicate with one another, carrying highly abstract semantic information. Enabling machines to understand natural language is one of the key problems in AI-complete research. To achieve this goal, the first challenge is how to represent the semantics of natural language. As the crystallization of thousands of years of human knowledge and wisdom, natural language has a complex organizational structure and diverse forms of semantic expression. Words, phrases, sentences, paragraphs, and documents each have unique characteristics while also being potentially interconnected. Currently, word embeddings based on the distributional semantic hypothesis, and pre-trained models built on large-scale corpora and large network structures, have achieved impressive performance. However, semantic representation of natural language still faces many challenges, such as inadequate utilization of context, complex model structures, and internal mechanisms that are difficult to interpret. Therefore, this dissertation proposes to use context to assist the understanding and representation of natural language semantics, from the perspectives of both context utilization and text processing methods, ultimately enhancing the ability of models to generate semantic representations of natural language. Specifically, the contributions of this dissertation can be summarized as follows:

Firstly, we propose the Context-Enriched Neural Network (CENN) and the Context-Aware Dual-Attention Network (CADAN). Traditional word embedding methods use a single fixed vector to represent word semantics, which suffers from the ambiguity and inaccuracy of word meanings. We therefore address semantic representation at the word level and the sentence level, respectively. Based on the observation that different contexts reveal different aspects of word semantics, we propose to use Word2Vec to train word embeddings from different contexts, so that word semantics can be represented more comprehensively. We then design a general, open framework that handles the different word contexts with an attention mechanism for specific applications, aiming at comprehensive representation and better utilization of word semantics. Experimental results demonstrate that the proposed method improves accuracy on textual entailment recognition. Moreover, to overcome the missing-context problem in sentence semantic modeling, we propose to use a second modality (images) as the context of sentences. By using image attention to construct comprehensive sentence representations and sentence attention to model the interaction between sentences, the proposed CADAN achieves comprehensive sentence semantic modeling and accurate natural language inference. Extensive experiments on large-scale datasets demonstrate the superiority of the proposed methods.

Secondly, we propose a multi-level text representation method and a multi-level image-context utilization method. Multi-modal data can enrich the context information of sentences. However, it takes various forms and connects with natural language at different levels, which poses many challenges for multi-modal modeling. To this end, we propose the Image-Enhanced Multi-Level Sentence Representation Net (IEMLRN) from a natural language representation perspective. Specifically, we use image information to enhance sentence semantic representation at the word, phrase, and sentence levels, respectively. Along this line, IEMLRN generates better sentence semantics, which helps improve the accuracy of natural language inference. Moreover, there is a large gap between the visual information of images and the abstract semantics of natural language. To narrow this gap, we take image utilization a step further and design a novel adaptive image feature extractor, which makes full use of image captioning sentences to extract abstract image semantic features. We then propose the Multi-Level Image-Enhanced Sentence Representation Net (MIESR), a novel architecture that integrates coarse-grained pre-trained image visual features with fine-grained adaptive image features for better image utilization and sentence semantic representation. The model can thus generate sentence representations more comprehensively and accurately. Extensive experimental results on several natural language inference datasets show that the proposed methods effectively enhance sentence representation with the help of image information.

Finally, we propose a context-aware dynamic reading method for sentence representation. Since natural language is the main means of human communication, human behavior can guide research on natural language representation. To this end, we explore human reading behavior and propose the Dynamic Re-read Network (DRr-Net) for better sentence representation. Specifically, DRr-Net can select the most important word at each step in light of all the information learned so far, so that the dynamically changing important words within a sentence can be captured precisely. We repeat this process for precise understanding and representation of sentence semantics. Moreover, since natural language semantics depend highly on the surrounding context, a lack of context leads to ambiguous and inaccurate sentence representations. We therefore extend DRr-Net to the Dynamic Local-aware Re-read Network (DLaRr-Net), which not only selects the most important word at each reading step but also extracts a suitable local context for that word. By integrating these two parts, sentence semantics can be represented more comprehensively, which further enhances sentence understanding and representation. Extensive experimental results on two different sentence semantic matching tasks demonstrate that the proposed methods capture the dynamically changing important parts of sentences and achieve better performance on sentence semantic understanding and representation.
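The first contribution rests on fusing several context-specific embeddings of one word via attention. As a minimal illustrative sketch only (not the dissertation's CENN architecture), the following shows the core idea with plain NumPy: given k embeddings of the same word trained from k different contexts, score each against a task-dependent query vector and take the softmax-weighted sum. The function name, toy vectors, and query are all hypothetical.

```python
import numpy as np

def attend_over_context_embeddings(embeddings, query):
    """Fuse k context-specific embeddings (k, d) of one word into a
    single (d,) vector, weighting each view by its dot-product
    similarity to a task query vector (d,)."""
    scores = embeddings @ query              # (k,) relevance of each view
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ embeddings              # attention-weighted fusion

# Toy example: 3 context-specific views of a word, dimension 4.
views = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.],
                  [0., 0., 1., 0.]])
query = np.array([10., 0., 0., 0.])          # task strongly favors view 0
fused = attend_over_context_embeddings(views, query)
```

Here the fused vector collapses almost entirely onto the first view, mirroring how a task-specific attention mechanism can emphasize the context (and hence the word sense) most relevant to the application.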
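The dynamic re-read idea of the final contribution can likewise be sketched in simplified form. This is a toy loop under stated assumptions, not DRr-Net itself: at each reading step, words are scored against the current state, the highest-scoring word is recorded as "most important now", and the state is updated with the soft re-read vector. The update rule (a fixed 50/50 blend) and the toy vectors are hypothetical stand-ins for the learned gating in the actual model.

```python
import numpy as np

def dynamic_reread(word_vecs, steps=3):
    """Repeatedly attend over all word vectors (n, d) conditioned on the
    evolving state, tracking which word is most important at each step."""
    state = word_vecs.mean(axis=0)           # initial sentence gist
    picked = []
    for _ in range(steps):
        scores = word_vecs @ state           # relevance to current state
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()             # softmax attention
        picked.append(int(weights.argmax())) # most important word this step
        focus = weights @ word_vecs          # soft re-read vector
        state = 0.5 * state + 0.5 * focus    # simple blend (gated in practice)
    return state, picked

# Toy sentence of 3 "words" in 2 dimensions.
word_vecs = np.array([[2., 0.], [0., 1.], [1., 1.]])
state, picked = dynamic_reread(word_vecs, steps=3)
```

Because each step's attention is conditioned on everything read so far, the selected index can shift across steps for richer inputs, which is precisely the "dynamically changing important words" behavior the abstract describes.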
Keywords/Search Tags:Natural Language, Semantic Representation, Context information, Multi-modal modeling, Attention Mechanism, Dynamic Reading