Chinese sequence labeling tasks, such as Chinese word segmentation, part-of-speech tagging, and named entity recognition, are a bedrock of natural language processing and play an important role in downstream tasks. Accurately segmenting Chinese words, recognizing parts of speech, and extracting named entities benefits information extraction, question answering, and de-identification. Processing text has become even more important with the development of Internet technology and the growth of text data across domains. With stronger deep learning models and abundant labeled corpora, previous methods have made great progress, especially on datasets with sufficient labeled data. However, labeling large amounts of text remains expensive and exhausting, notably in highly specialized domains such as medicine, and prior empirical results indicate that deep models can perform poorly when little labeled data is available. Domain adaptation, which transfers knowledge from a domain with ample labeled data to one that lacks it, is therefore a topic of great significance with broad application prospects.

This thesis takes Chinese word segmentation, part-of-speech tagging, and named entity recognition as its core tasks and investigates domain adaptation algorithms for Chinese sequence labeling. It focuses on instance-based and feature-based domain adaptation, studying how to measure the discrepancy between data from different domains at a fine-grained or coarse-grained level. The main contributions are as follows.

First, fundamental research on Chinese word segmentation is conducted to identify the challenges of sequence labeling, especially for Chinese word segmentation. This part proposes a capsule-based neural network for Chinese word segmentation in which a sliding window is applied to handle the sequence labeling problem. The capsule-based structure can capture more contextual information, and this preliminary exploration supports the subsequent domain adaptation algorithms for sequence labeling tasks.

Second, domain adaptation algorithms are explored further. Traditional domain discrepancy measures have achieved good results in computer vision. However, simply applying such a measure and then tuning a deep feature extractor can cause negative transfer, that is, the transferred knowledge and information work against the desired goal. The second part therefore integrates an attention mechanism with a traditional measure in a deep model, automatically computing different weights for source-domain samples to suppress negative transfer. In addition, because every element in a sequence labeling task must be identified, a fine-grained model is designed specifically for these tasks. Experimental results show that combining the fine-grained and coarse-grained approaches is effective for Chinese sequence labeling.

Finally, while preserving semantic information, the thesis constructs fine-grained element-level samples that contain contextual knowledge. These samples make it possible to develop a fine-grained instance-based domain adaptation method. By replacing traditional measures with adversarial learning and using reinforcement learning to select samples, a selective transfer model is designed for Chinese sequence labeling tasks. Experiments on several datasets show that domain adaptation over fine-grained samples is of great significance for Chinese sequence labeling, and they demonstrate the practicability of element-level adaptation in sequence labeling tasks.
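
As an illustration of the sliding-window idea mentioned for the segmentation model, the following minimal sketch builds one fixed-size context window per character, so that each element of a sentence becomes a sample carrying its local context. It is not the thesis implementation: the function name `sliding_windows`, the padding symbol `PAD`, and the window radius are all assumptions made for this example.

```python
from typing import List

PAD = "<PAD>"  # hypothetical padding symbol, not taken from the thesis


def sliding_windows(chars: List[str], radius: int = 2) -> List[List[str]]:
    """Return one context window per character, centred on that character.

    Each window holds `radius` characters on either side of the centre;
    positions that fall outside the sentence are filled with PAD.
    """
    padded = [PAD] * radius + list(chars) + [PAD] * radius
    size = 2 * radius + 1
    return [padded[i:i + size] for i in range(len(chars))]


# Example: a four-character sentence with one character of context per side.
windows = sliding_windows(list("我爱北京"), radius=1)
for window in windows:
    print(window)
```

Each window could then be fed to a classifier that predicts the label of its centre character. A larger radius supplies more context at the cost of more padding near sentence boundaries.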