
Research And Implementation On Dependency Parsing Based On Chinese Treebank

Posted on: 2015-01-06 | Degree: Master | Type: Thesis
Country: China | Candidate: R Xiao | Full Text: PDF
GTID: 2268330431954223 | Subject: Computer software and theory
Abstract/Summary:
Parsing based on treebanks and machine learning is a central issue in current natural language processing. In syntactic analysis, an annotated corpus usually serves both as the resource from which a parser acquires linguistic knowledge and as the standard against which dependency parsing results are evaluated. Researchers tend to improve dependency parser performance by improving algorithms, features, and machine learning methods. The accuracy of dependency parsing can also be increased from the side of linguistic annotation.

This thesis adopts the Dependency Treebank of Harbin Institute of Technology and modifies it to improve parsing performance. MaltParser is used as the experimental tool for dependency parsing, with LIBSVM for training. Because the training set must be in CoNLL format, the treebank is first converted from its XML format to CoNLL using Java and DOM4J. Secondly, the verb class is subdivided in two separate ways, according to the Standard of POS Tag of Chinese Academy of Sciences and the Standard of POS Tag of Contemporary Chinese for CIP. Thirdly, to observe the effect of noun incorporation on dependency parsing, eight noun classes, such as person name and geographical name, are merged into one class. Finally, the part of speech of the punctuation mark “、” is relabeled as conjunction, because it generally acts as a coordinating conjunction.

The original treebank and the modified ones are each used as training and test sets. The results show that the treebank with verbs subdivided according to the Standard of POS Tag of Contemporary Chinese for CIP performs best; used as the training and test set, it yields higher dependency accuracy. When “、” is additionally labeled as conjunction on that treebank, dependency accuracy increases further. Noun incorporation has little effect on dependency parsing, but it reduces machine learning time under the same conditions.
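The two tag-level modifications described above (noun incorporation and relabeling “、” as conjunction) can be illustrated as a rewrite of the POS columns of CoNLL-format lines. The following is only a minimal sketch under stated assumptions, not the thesis's code: the class `PosRemapper` and its methods are hypothetical, the eight noun tags and the conjunction tag `c` are assumed LTP-style names (the abstract does not list the tags actually used), and the input is assumed to be tab-separated CoNLL-X lines with the word form in column 2 and the POS tags in columns 4 and 5.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not from the thesis): rewrites POS tags in CoNLL-X lines.
class PosRemapper {
    // ASSUMPTION: LTP-style noun tags; the abstract only says "eight noun classes".
    static final Set<String> NOUN_TAGS = new HashSet<>(
            Arrays.asList("n", "nd", "nh", "ni", "nl", "ns", "nt", "nz"));
    static final String MERGED_NOUN = "n";        // ASSUMPTION: name of the merged class
    static final String CONJUNCTION = "c";        // ASSUMPTION: conjunction tag
    static final String ENUM_COMMA = "\u3001";    // the punctuation mark "、"

    // Returns the new tag for one token; each modification can be switched on separately,
    // mirroring the separate experiments described in the abstract.
    static String remapTag(String form, String tag, boolean mergeNouns, boolean relabelComma) {
        if (relabelComma && form.equals(ENUM_COMMA)) {
            return CONJUNCTION;
        }
        if (mergeNouns && NOUN_TAGS.contains(tag)) {
            return MERGED_NOUN;
        }
        return tag;
    }

    // Rewrites columns 4 (CPOSTAG) and 5 (POSTAG) of one tab-separated CoNLL-X line.
    // Blank lines (sentence separators) and malformed lines are returned unchanged.
    static String remapLine(String line, boolean mergeNouns, boolean relabelComma) {
        String[] cols = line.split("\t", -1);
        if (cols.length < 5) {
            return line;
        }
        cols[3] = remapTag(cols[1], cols[3], mergeNouns, relabelComma);
        cols[4] = remapTag(cols[1], cols[4], mergeNouns, relabelComma);
        return String.join("\t", cols);
    }

    public static void main(String[] args) {
        // Illustrative tokens only; head indices and relation labels are made up.
        String[] sample = {
            "1\t\u5317\u4EAC\t_\tns\tns\t_\t3\tCOO\t_\t_",   // "Beijing", geographical name
            "2\t\u3001\t_\twp\twp\t_\t3\tWP\t_\t_",           // the mark "、"
            "",
        };
        for (String line : sample) {
            System.out.println(remapLine(line, true, true));
        }
    }
}
```

Keeping the two switches independent makes it possible to produce each modified treebank from the same converted CoNLL file before training a parser on it.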
Keywords/Search Tags: dependency parsing, Chinese treebank, verb subdivision, noun incorporation