Research On Technology Of Chinese Dependency Parsing

Posted on: 2016-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Guo
Full Text: PDF
GTID: 2308330467972492
Subject: Computer Science and Technology
Abstract/Summary:
Syntactic analysis is a key problem in natural language processing (NLP). It connects lexical analysis at the basic level with semantic analysis at the higher level. Improving the accuracy of syntactic analysis benefits NLP applications such as machine translation, information retrieval and automatic question answering. Dependency grammar has attracted growing attention from researchers and is widely used in syntactic analysis because of three characteristics: (1) it is simple to represent, (2) it is easy to annotate, and (3) it is easy to make use of.

The goal of dependency parsing is to derive the dependency tree of a given sentence according to dependency grammar. Currently, the key problems in dependency parsing include improving parsing accuracy, overcoming limitations in corpus size and quality, and enhancing the ability of domain adaptation. These problems are more urgent for Chinese dependency parsing, which started later and faces more difficulties in Chinese analysis. This thesis makes the following contributions to improving the accuracy of Chinese dependency parsing:

1. We implement several Chinese dependency parsing systems. This work follows and improves on existing dependency parsing methods and implements three Chinese dependency parsing systems: 1) a multiple transition dependency parsing system based on the Shift-Reduce algorithm, 2) a graph-based semi-supervised dependency parsing system using subtrees from auto-parsed data, and 3) a transition-based semi-supervised dependency parsing system using semantic classes from HowNet. Through multiple evaluation experiments, we analyze the characteristics and difficulties of Chinese dependency parsing, explore the effect of raw corpus size on semi-supervised methods, and investigate the significance of HowNet semantic knowledge for Chinese dependency parsing.

2. We propose and implement a novel character-level joint model for Chinese word segmentation, POS tagging and dependency parsing.
We transform the traditional word-based dependency tree into a character-based dependency tree using the internal structure of words, and implement a character-level joint model for the three tasks based on the transition framework. In this joint model, the basic processing unit of all three tasks is the Chinese character, which makes the design of the processing framework and the implementation of the joint model more concise, smooth and reasonable. Our joint model performs better than the pipeline models on all tasks, and the improvements on POS tagging and dependency parsing are the more significant ones (0.86% and 1.79%, respectively).

3. We propose and implement a novel semi-supervised joint model for Chinese word segmentation, POS tagging and dependency parsing. The joint model exploits n-gram features and dependency subtree features from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieves accuracies of 98.31%, 94.84% and 81.71% for Chinese word segmentation, POS tagging and dependency parsing, respectively. Our model outperforms the pure joint model of the three tasks by 0.79%, 0.91% and 2.16%, and outperforms the pipeline model of the three tasks by 0.92%, 1.77% and 3.95%, respectively. In particular, the F-measures for word segmentation and POS tagging are the best among the results reported to date.
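
The Shift-Reduce (transition-based) framework named in the contributions above can be illustrated with a minimal sketch. The abstract does not specify the transition inventory used in the thesis, so the code below assumes the standard arc-standard system with three actions (SHIFT, LEFT-ARC, RIGHT-ARC). The function name `parse`, the action constants and the example sentence are hypothetical and chosen only for illustration. The action sequence is supplied by hand here; a real parser would predict each action with a trained classifier.

```python
from typing import List, Tuple

# Hypothetical action names for an arc-standard system (an assumption;
# the thesis abstract does not list its actual transitions).
SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def parse(words: List[str], actions: List[str]) -> List[Tuple[int, int]]:
    """Apply arc-standard transitions and return (head, dependent) arcs.

    Tokens are numbered from 1; index 0 is an artificial ROOT node.
    """
    stack = [0]                              # the stack starts with ROOT
    buffer = list(range(1, len(words) + 1))  # tokens not yet processed
    arcs: List[Tuple[int, int]] = []

    for action in actions:
        if action == SHIFT:
            if not buffer:
                raise ValueError("SHIFT with an empty buffer")
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC:
            # The second-from-top item becomes a dependent of the top item.
            # ROOT (index 0) may never become a dependent.
            if len(stack) < 2 or stack[-2] == 0:
                raise ValueError("LEFT-ARC is not applicable")
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == RIGHT_ARC:
            # The top item becomes a dependent of the item below it.
            if len(stack) < 2:
                raise ValueError("RIGHT-ARC is not applicable")
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


# Illustrative sentence: 我 (I) 喜欢 (like) 苹果 (apples).
words = ["我", "喜欢", "苹果"]
actions = [SHIFT, SHIFT, LEFT_ARC, SHIFT, RIGHT_ARC, RIGHT_ARC]
print(parse(words, actions))  # prints [(2, 1), (2, 3), (0, 2)]
```

In the printed arcs, token 2 (喜欢) heads tokens 1 and 3, and ROOT heads token 2. In a character-level joint model of the kind described above, the items on the stack and buffer would be characters rather than words, and further actions would be needed for segmentation and POS tagging. Those extensions are not shown here.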
Keywords/Search Tags: Dependency parsing, Joint model, Semi-supervised learning, Chinese word segmentation and POS tagging, Natural language processing