Research On Transfer Learning For Chinese Sequence Labeling Tasks

Posted on:2022-08-17

Degree:Master

Type:Thesis

Country:China

Candidate:M Z Li

Full Text:PDF

GTID:2558306914464034

Subject:Information and Communication Engineering

Abstract/Summary:

Chinese sequence labeling tasks, such as Chinese word segmentation, part-of-speech tagging, and named entity recognition, are a bedrock of natural language processing and play an important role in downstream tasks. Accurately segmenting Chinese words, identifying parts of speech, and extracting named entities benefits information extraction, question answering, and de-identification. Processing text has become more important as Internet technology develops and the volume of text in various domains grows. With stronger deep learning models and abundant labeled corpora, previous methods have made great progress, especially on datasets with sufficient labeled data. However, labeling large amounts of text remains expensive and exhausting, notably in highly specialized domains such as medicine, and empirical studies indicate that deep models may perform poorly when little labeled data is available. Transferring knowledge from domains with sufficient labeled data to domains that lack it, termed domain adaptation, is therefore a topic of great significance with broad application prospects.

This thesis takes Chinese word segmentation, part-of-speech tagging, and named entity recognition as its core tasks and investigates domain adaptation algorithms for Chinese sequence labeling. It focuses on instance-based and feature-based domain adaptation, studying how to measure the disparity between data from different domains at both fine-grained and coarse-grained levels. The main contributions are as follows.

First, fundamental research on Chinese word segmentation is conducted to identify the challenges of sequence labeling, particularly for Chinese word segmentation. The first part of the thesis proposes a capsule-based neural network for Chinese word segmentation in which a sliding window is applied to handle the sequence labeling problem. The capsule-based structure can capture more contextual information. This preliminary exploration informs the later domain adaptation algorithms for sequence labeling tasks.

Next, domain adaptation algorithms are explored further. Traditional domain discrepancy measurements have achieved good results in computer vision. However, simply applying such a measurement and then tuning a deep feature extractor can cause negative transfer, in which the transferred knowledge works against the desired goal. The second part therefore integrates an attention mechanism with a traditional measurement in a deep model, automatically computing different weights for source-domain samples to suppress negative transfer. In addition, because every element must be labeled in a sequence labeling task, a fine-grained model is designed specifically for these tasks. The experimental results show that combining the fine-grained and coarse-grained approaches is effective for Chinese sequence labeling.

Finally, while preserving semantic information, the thesis constructs fine-grained element samples that contain contextual knowledge. With these samples, a fine-grained instance-based domain adaptation method can be developed. By replacing traditional measurements with adversarial learning and using reinforcement learning to select samples, a selective transfer model is designed for Chinese sequence labeling tasks. The experimental results on several datasets show that domain adaptation with fine-grained samples is of great significance for Chinese sequence labeling, and they also demonstrate the practicability of element-level domain adaptation in sequence labeling tasks.
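
The idea of weighting source-domain samples inside a discrepancy measurement can be illustrated with a small sketch. The abstract names neither the measurement nor how the attention weights are computed, so everything here is an assumption made for illustration: the discrepancy is a linear-kernel maximum mean discrepancy (MMD), the weights are a softmax over hypothetical relevance scores, and the function names and toy feature vectors are invented. It is not the thesis's model.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_mean(vectors, weights):
    """Weighted average of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(dim)]

def weighted_linear_mmd(source, target, source_scores):
    """Squared distance between the weighted source mean and the target mean.

    Assumptions: linear-kernel MMD; source_scores stand in for attention
    scores that a real model would learn rather than receive as input.
    """
    w_src = softmax(source_scores)
    w_tgt = [1.0 / len(target)] * len(target)
    mu_s = weighted_mean(source, w_src)
    mu_t = weighted_mean(target, w_tgt)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

# Toy features: the third source sample lies far from the target domain.
source = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]]
target = [[0.1, 0.0], [0.1, 0.2]]

# Equal scores give uniform weights, i.e. the unweighted measurement.
uniform = weighted_linear_mmd(source, target, [0.0, 0.0, 0.0])
# A low score for the outlying sample shrinks its influence.
down_weighted = weighted_linear_mmd(source, target, [2.0, 2.0, -4.0])
print(uniform > down_weighted)  # prints True
```

In this toy setting, lowering the weight of the source sample that is unlike the target data reduces the measured discrepancy, which is the intuition behind using sample weights to limit negative transfer.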

Keywords/Search Tags:

Chinese sequence labeling task, transfer learning, domain adaptation, distribution discrepancy measurement

Related items

1	Research On The Method Of Unsupervised Transfer Learning Based On Feature Distribution Discrepancy Adaptation
2	Iterative Classified Mean Discrepancy In Transfer Learning
3	Research Of Unsupervised Domain Adaptation Method Based Discrepancy In Category Distribution
4	Unsupervised Domain Adaptation Based On Minimizing Maximum Mean Discrepancy
5	On Methods Of Projection And Graph Construction For Unsupervised Domain Adaptation
6	Application Of Unsupervised Domain Adaptation In Image Recognition Technology
7	Research On Domain Adaptation Algorithm Based On Subspace Alignment Methods
8	Research On Chinese Word Segmentation Sequence Labeling Method Based On Multi-task Learning
9	Research On Unsupervised Domain Adaptation Methods Combined With Pseudolabel Improvement
10	Cross-domain Adaptive Algorithm Without Domain Information