Research On Cross-domain Text Sequence Annotation Based On Transfer Learning

Posted on: 2024-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: W H Zhang
GTID: 2568307106484074
Subject: Electronic information
Abstract/Summary:
Text sequence labeling is an important task in natural language understanding. Its goal is to assign a label to each word or phrase in a text, indicating its part of speech, subject, entity type, or similar information. It is a prerequisite task for human-computer dialogue systems and knowledge graphs. Traditional text sequence labeling methods achieve high recognition accuracy, but they rely on large-scale annotated data, which is often hard to obtain in specialized fields such as the military and medical domains. As a result, existing models are difficult to apply directly to low-resource domains. In recent years, some progress has been made in cross-domain text sequence labeling for such domains. However, open problems remain, which fall roughly into two categories: "how to transfer" and "what to transfer". "How to transfer" concerns how to move knowledge shared between domains into the low-resource domain, while "what to transfer" concerns how to accurately identify that shared knowledge. To address these problems, this thesis studies cross-domain text sequence labeling from the perspective of transfer learning, focusing on how transfer learning can improve its performance. The specific content is as follows:

(1) Feature representation of text sequences. Current methods do not use the structured information in feature representations. This thesis proposes to improve target-task performance by exploiting the fact that structured representations remain stable across domains. In transfer learning, research on the feature representation of text sequences focuses on discovering feature representations common to the domains and transferring them. For text sequences, a transferable feature representation should capture the features of related elements across domains and be broadly applicable. To this end, a Multi-Level Structured Alignment (MSA) mechanism is designed, and a cross-domain text sequence labeling model based on multi-layer structured semantic knowledge enhancement is proposed to learn and transfer domain-general text feature representations. In this model, cross-domain alignment is cast as a graph matching problem, and a better transfer effect is obtained by transferring similar node and edge features. Structured knowledge of semantic features and of contextual features is obtained at the embedding layer and the hidden layer, respectively, and the cross-domain invariance of this structured knowledge is used to promote the learning and transfer of general knowledge at different feature levels. Experiments were carried out on common datasets and on dedicated cross-domain named entity recognition data. The results show that the proposed method learns and transfers the underlying feature knowledge of text sequences well and obtains feature representations that better preserve the original semantic information, which verifies its effectiveness.

(2) Transfer mechanism for text sequences. Data differences between domains give rise to spurious correlations. This thesis proposes causal constraints, built on the causal relationships in the feature representation, to eliminate spurious features and transfer only the feature representations common to the domains. Research on transfer mechanisms mainly seeks better ways to transfer common knowledge to the target domain for related tasks in low-resource settings, so as to improve target-task performance. At present, most methods transfer features by constraining the distance between domains, but whether the transferred features are useful to the target domain remains to be verified. For example, transferring features that share the same representation but carry different meanings reduces performance on the related tasks. Investigation shows that although these features have the same representation, their internal causal relationships differ. Based on this, Causal Inference (CI) is integrated into a multi-layer structured inference mechanism, and a cross-domain text sequence labeling model based on causal-structure semantic knowledge enhancement is designed. Spurious feature representations are eliminated through causal relationships, which reduces negative transfer. In the model, causal inference constrains the generation and learning of features, and the causal relationships between features constrain which feature representations are transferred, improving transfer quality. Experiments on dedicated cross-domain named entity recognition data show that the model outperforms existing methods across domains.

In summary, using structured information and causality in feature representations alleviates the difficulty of cross-domain text sequence labeling. Comparisons with existing methods on the relevant datasets show significant improvements, indicating the feasibility of the proposed ideas and methods and the effectiveness of the proposed models.
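The abstract describes treating cross-domain alignment as a graph matching problem over node and edge features, but it gives no formulas. The toy Python sketch below only illustrates that general idea and is not the thesis's MSA mechanism. The cost function, the cosine-based edge weights, the brute-force search, and all names (`cosine`, `edge_matrix`, `matching_cost`, `best_alignment`) are assumptions made for illustration.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def edge_matrix(nodes):
    """Assumed edge weights: pairwise cosine similarity between node features."""
    return [[cosine(a, b) for b in nodes] for a in nodes]


def matching_cost(src, tgt, src_edges, tgt_edges, perm):
    """Cost of mapping source node i to target node perm[i].

    Node term: dissimilarity of matched node features.
    Edge term: mismatch between corresponding edge weights.
    """
    n = len(src)
    node_cost = sum(1.0 - cosine(src[i], tgt[perm[i]]) for i in range(n))
    edge_cost = sum(
        abs(src_edges[i][j] - tgt_edges[perm[i]][perm[j]])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return node_cost + edge_cost


def best_alignment(src, tgt):
    """Brute-force search over one-to-one matchings (toy sizes only)."""
    src_edges, tgt_edges = edge_matrix(src), edge_matrix(tgt)
    return min(
        permutations(range(len(tgt))),
        key=lambda p: matching_cost(src, tgt, src_edges, tgt_edges, p),
    )


# Hypothetical token features from a source and a target domain.
source_nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_nodes = [[0.9, 1.1], [1.0, 0.1], [0.1, 1.0]]
alignment = best_alignment(source_nodes, target_nodes)
print(alignment)
```

Here `alignment[i]` is the index of the target node matched to source node `i`. A trained model would presumably replace the exhaustive search with a differentiable alignment loss, since enumerating permutations is only feasible for a handful of nodes.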
Keywords/Search Tags:Transfer Learning, Cross-Domain Text Sequence Labeling, Cross-Domain Transfer, Structured Alignment, Causal Inference