
Research On Neural Network Based Methods Of Chinese Word Segmentation For Domain Adaptation

Posted on: 2018-10-14  Degree: Master  Type: Thesis
Country: China  Candidate: J L Wu
GTID: 2428330623450980  Subject: Software engineering
Abstract/Summary:
Chinese word segmentation has long received extensive attention as a basic task of Chinese information processing. Segmentation methods based on supervised learning are relatively mature, but they often perform poorly in cross-domain segmentation. To increase domain adaptability, researchers often rely on hand-crafted features to introduce domain information. To reduce this dependence on feature engineering, this work proposes neural-network-based domain segmentation methods for three resource scenarios. The first two scenarios assume that a general word segmenter is available and differ in whether its model and parameters are visible. The third scenario relies on an unlabeled corpus in the target domain, without a general word segmenter. The details are as follows:

(1) When a general segmenter is available and its model and parameters are known, two methods are proposed to transform the general segmentation model into a domain model: prior parameter models and domain transfer regularization. The prior parameter model lets the domain model "inherit" some or all of the parameters of the general model, giving it a better starting point for training. Domain transfer regularization alleviates the training bias that tends to occur on small-scale data.

(2) When a general segmenter is available but its model and parameters are unknown, a domain segmentation method based on a neural network corrector is proposed. This method automatically learns correction patterns from segmentation errors in the target domain, then uses the predicted sequence of corrective actions to transform the general segmentation results into domain-oriented ones. Compared with existing correction-based methods, the proposed method requires no feature engineering and is superior in both segmentation performance and robustness.

(3) To exploit a large-scale unlabeled in-domain corpus, two methods for neural network word segmentation models are proposed: pre-training initialization of embeddings and language model feature fusion. The former gives the segmentation model character representations better suited to the target domain. The latter takes the hidden-layer activations of a neural network language model as augmented features for the segmentation model. In addition, a gating mechanism is introduced into the feature fusion to dynamically adjust the contribution weight of the language model. Experiments show that both methods further improve domain segmentation performance.

Future work will continue to pursue the combined use of the resources in the three scenarios above and the measurement of transferability among domains.
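
The gated feature fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the gate is taken to be an element-wise sigmoid over the concatenated segmenter and language-model hidden vectors, and the fused feature is the segmenter vector concatenated with the gated language-model vector. The names `gated_fusion`, `h_seg`, `h_lm`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    h_seg: List[float],
    h_lm: List[float],
    gate_weights: List[List[float]],
    gate_bias: List[float],
) -> List[float]:
    """Fuse language-model features into segmenter features through a gate.

    Assumed formulation (not taken from the thesis):
        g     = sigmoid(W [h_seg ; h_lm] + b)   # one gate value per LM unit
        fused = [h_seg ; g * h_lm]              # element-wise gating
    """
    joint = h_seg + h_lm  # list concatenation: [h_seg ; h_lm]
    if len(gate_weights) != len(h_lm) or len(gate_bias) != len(h_lm):
        raise ValueError("need one gate row and one bias per language-model unit")
    if any(len(row) != len(joint) for row in gate_weights):
        raise ValueError("each gate row must match the joint vector length")

    # Each gate value decides how much of one language-model unit passes through.
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, joint)) + b)
        for row, b in zip(gate_weights, gate_bias)
    ]
    gated_lm = [g * v for g, v in zip(gate, h_lm)]
    return h_seg + gated_lm
```

In this sketch, a gate value near 0 suppresses the corresponding language-model feature and a value near 1 passes it through almost unchanged, which is one way the contribution weight of the language model could be adjusted per character. In a trained model, `gate_weights` and `gate_bias` would be learned jointly with the segmenter.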
Keywords/Search Tags: Chinese Word Segmentation, Domain Adaptation, Neural Network, Sequence Labelling