
Research On Chinese Syntactic Parsing Based On Lexicalized Statistical Model

Posted on: 2007-02-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H L Cao
Full Text: PDF
GTID: 1118360185968054
Subject: Computer application technology
Abstract/Summary:
Automatic natural language parsing is a fundamental problem for many natural language processing tasks. The task of parsing is to design software that can automatically identify the syntactic components of a sentence. The performance of many practical applications, such as machine translation and information extraction, would improve if the correct syntactic structure were available. Language is also the carrier of human thought, so research on parsing helps reveal the nature of language. This research is therefore of theoretical as well as philosophical significance.

Compared with other languages such as English, automatic parsing of Chinese has its own difficulties. Current automatic Chinese parsing technology cannot yet meet the requirements of practical applications. This dissertation starts from the basic problem of ambiguity resolution in automatic Chinese parsing and builds toward an integrated statistical model of Chinese parsing. In detail, this dissertation presents the following research:

1. Chinese part-of-speech tagging is the basis of Chinese information processing. We propose a method based on bilexical co-occurrences to tag Chinese text. The standard hidden Markov model assumes that the transition between states (parts of speech) is independent of the observation (word) sequence, and that the generation of a new observation is independent of the other observations. Chinese text does not in fact satisfy these assumptions. Building on the hidden Markov model, our method also considers the effect of the context words on the part-of-speech decision, which improves the disambiguation ability of the model. We evaluate the proposed model on the China Daily corpus. The tagging accuracy is 99.09% on the closed test set and 96.37% on the open test set.

2. The development of the Penn Chinese Treebank spurred research on Chinese parsing. We present the first results of applying the well-known head-driven model to the newly available CTB5.0. Compared with previous work on CTB, we achieve more promising results and narrow the performance gap between Chinese parsing and English parsing. We evaluate the parser on the...
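To make the two independence assumptions of the standard hidden Markov model in item 1 concrete, the following is a minimal sketch of first-order HMM tagging with Viterbi decoding. It shows only the baseline model, not the bilexical extension proposed in the dissertation, whose details the abstract does not give. The tag set, the English toy words, and every probability below are invented for illustration.

```python
import math

# Toy parameters, invented for illustration only (not taken from the dissertation).
TAGS = ["N", "V"]
START = {"N": 0.7, "V": 0.3}
TRANS = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
EMIT = {
    "N": {"fish": 0.5, "swim": 0.1, "water": 0.4},
    "V": {"fish": 0.3, "swim": 0.6, "water": 0.1},
}


def viterbi(words):
    """Return the most probable tag sequence under a first-order HMM."""
    # score[t] is the best log-probability of any tag path ending in tag t.
    score = {t: math.log(START[t]) + math.log(EMIT[t][words[0]]) for t in TAGS}
    backpointers = []
    for word in words[1:]:
        new_score, pointers = {}, {}
        for t in TAGS:
            # Assumption 1: the transition depends only on the previous tag,
            # never on the surrounding words.
            prev = max(TAGS, key=lambda p: score[p] + math.log(TRANS[p][t]))
            # Assumption 2: the word depends only on its own tag,
            # never on neighbouring words.
            new_score[t] = (
                score[prev] + math.log(TRANS[prev][t]) + math.log(EMIT[t][word])
            )
            pointers[t] = prev
        score = new_score
        backpointers.append(pointers)
    # Recover the best path by following the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: score[t])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]
```

With these toy numbers, `viterbi(["fish", "swim", "water"])` returns `['N', 'V', 'N']`. Neither factor in the score looks at a neighbouring word, which is the limitation the abstract says the context-sensitive model is meant to address.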
Keywords/Search Tags:Chinese parsing, Lexicalized statistical model, PCFG