Research On Key Technologies Of Text Semantic Matching Based On Structural Features And Multi-layer Information Interaction

Posted on: 2023-09-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Qi
Full Text: PDF
GTID: 1528306839479954
Subject: Computer application technology
Abstract/Summary:
Text semantic matching is an essential research direction in natural language processing whose purpose is to judge whether two texts conform to a given semantic relationship. It covers many downstream tasks, such as natural language inference, paraphrase identification, and answer sentence selection, and different tasks match different semantic relationships. Whatever the relationship, however, judging whether two texts match relies on two questions: (1) how to represent the semantics of the texts, and (2) how to judge the semantic relationship between the two texts. Around these two core questions, researchers divide text semantic matching into two directions: representation-based text semantic matching and interaction-based text semantic matching. The former mainly studies how to design better text encoders to represent texts, while the latter focuses on the process of determining the relationship between two texts. Our thesis studies representation-based text semantic matching based on text structures and explores interaction-based text semantic matching based on the historical information generated during multiple rounds of interaction.

In addition, for heterogeneous documents in non-plain-text formats (e.g., PDFs, web pages, and scanned documents), existing natural language processing applications such as open-domain question answering systems usually need content extraction algorithms designed in advance for each document format, and then extract the text content for subsequent analysis and processing. This consumes considerable manpower and material resources and loses valuable layout and visual information. Therefore, our thesis describes heterogeneous documents through document images from a visual perspective, and studies cross-modal text semantic matching between text and document images and its applications in question answering. Specifically, our thesis studies the following aspects:

1. Representation-based text semantic matching research based on textual linear structures. In representation-based text semantic matching, existing studies usually replace convolutional neural networks and recurrent neural networks with self-attention networks to build stronger and more parallelizable text encoders. However, because the self-attention mechanism is position-independent, self-attention networks are weak at modeling text structures and cannot use structural features to capture semantic dependencies more accurately. Our thesis first analyzes the linear structure of text, which we describe through three main aspects: the absolute position of words, the relative distance between words, and word order. On this basis, our thesis proposes a Bidirectional Linear Positional aware Transformer. It integrates these three structural features through Absolute-position aware Relative Position Encoding and a Bidirectional Masking strategy to jointly model the linear structure, and uses the linear structure to model text order, key information, and local dependencies within texts more precisely. Experiments on two text semantic matching tasks, natural language inference and paraphrase identification, validate the effectiveness of the approach.

2. Representation-based text semantic matching research based on mixed text structures. There are many kinds of text structure features, and different structural features complement each other. Text structure therefore cannot be modeled comprehensively from a single perspective. To model the text structure from different perspectives simultaneously, our thesis exploits the multi-perspective modeling ability of the multi-head attention mechanism and proposes the Multi-mask Multi-head Attention mechanism. On this basis, our thesis proposes a Mixed Structural Features guided Transformer, which introduces word order, word relative distance, and word dependency distance into the Transformer simultaneously. With the help of multiple structural features, our model can model the linear structure and the semantic dependency structure of text at the same time, and can better capture both local and non-local semantic dependencies within text. Experimental results on natural language inference and paraphrase identification tasks demonstrate that this method improves further over the Bidirectional Linear Positional aware Transformer, which models only the linear structure of text.

3. Interaction-based text semantic matching research based on the history of multi-layer interaction networks. In interaction-based text semantic matching, existing studies usually apply multiple rounds of interactive matching to better judge the semantic relationship between texts. In this process, the historical representation information and historical interaction information generated by earlier matching rounds guide the later rounds. However, existing research commonly transmits and utilizes only one of these kinds of information and cannot make good use of all the historical information. To better transmit and utilize both types of historical information, our thesis proposes the Full Information Transmission Network, which applies a novel original-average mixed connection to transmit the representation information effectively and uses a memory-based attention mechanism to keep and transmit the interaction information through a global interaction matrix. Experiments on natural language inference and paraphrase identification tasks demonstrate that our approach outperforms other non-pretrained models and pre-trained models of similar scale under resource-constrained conditions.

4. Cross-modal text semantic matching research based on layout structures. For the widely existing heterogeneous documents in non-plain-text formats, our thesis extends textual semantic matching research to cross-modal semantic matching between text and document images. In open-domain question answering, where text semantic matching is widely applied, existing systems need specific content extraction methods for heterogeneous documents in each format in order to pre-extract the text content as the information source. This not only greatly increases the construction cost of scalable open-domain question answering systems, but also loses all the visual and layout information in the original documents. To this end, our thesis proposes an Open-domain Document Visual Question Answering task, which directly takes a collection of document images converted from heterogeneous documents as the information source for answering questions. Besides, our thesis combines all the previous research contents to study cross-modal text semantic matching between text and document images, proposes the Full Information Transmission Network with Layout and Text Linear Structures, and applies it to the re-ranking stage of the Open-domain Document Visual Question Answering task to match the question against the retrieved candidate document images. Experiments on the first Chinese open-domain document visual question answering dataset, which is proposed in our thesis, verify the effectiveness of the model.
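
The general idea behind giving different attention heads different structural masks, as described in aspect 2, can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the exact mask definitions (forward order, backward order, a relative-distance window, and a dependency-distance limit), the window width, the hop limit, and the toy dependency parse are all assumptions made for illustration, and the attention scores are uniform placeholders rather than learned values.

```python
import math
from collections import deque


def forward_mask(n):
    # Word order, left to right: position i may attend to positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


def backward_mask(n):
    # Word order, right to left: position i may attend to positions j >= i.
    return [[j >= i for j in range(n)] for i in range(n)]


def window_mask(n, width):
    # Relative distance: position i may attend to j when |i - j| <= width.
    return [[abs(i - j) <= width for j in range(n)] for i in range(n)]


def dependency_mask(heads, max_hops):
    # Dependency distance: heads[i] is the index of token i's syntactic head
    # (-1 for the root). Token i may attend to tokens within max_hops edges
    # of it in the undirected dependency tree.
    n = len(heads)
    adjacent = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adjacent[child].append(head)
            adjacent[head].append(child)
    mask = [[False] * n for _ in range(n)]
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            mask[start][node] = True
            if dist[node] == max_hops:
                continue
            for nxt in adjacent[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return mask


def masked_attention(scores, mask):
    # Row-wise softmax restricted to allowed positions; others get weight 0.
    # Every mask above allows the diagonal, so each row has an allowed entry.
    weights = []
    for row, allowed in zip(scores, mask):
        top = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def multi_mask_attention(scores_per_head, masks):
    # One mask per head, so each head sees the text from one structural view.
    return [masked_attention(s, m) for s, m in zip(scores_per_head, masks)]


tokens = ["the", "cat", "sat", "down"]
heads = [1, 2, -1, 2]  # hypothetical parse: "sat" is the root
n = len(tokens)
masks = [
    forward_mask(n),
    backward_mask(n),
    window_mask(n, 1),
    dependency_mask(heads, 1),
]
# Uniform placeholder scores; a real model would compute these from queries and keys.
scores = [[[0.0] * n for _ in range(n)] for _ in masks]
weights = multi_mask_attention(scores, masks)
```

In this toy setup each entry of `weights` is one head's attention matrix. With uniform scores the weights simply spread evenly over whatever positions that head's mask allows, which makes it easy to see how the same sentence is viewed through word order, relative distance, and dependency distance at once.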
Keywords/Search Tags: natural language processing, text semantic matching, text structural features, attention mechanism, cross-modal application