Research On Chinese Syntactic Parsing Based On Lexicalized Statistical Model

Posted on:2007-02-21

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H L Cao

GTID:1118360185968054

Subject:Computer application technology

Abstract/Summary:

Automatic natural language parsing is fundamental to many natural language processing tasks. The goal of parsing is to build software that automatically identifies the syntactic components of a sentence. The performance of many practical applications, such as machine translation and information extraction, would improve if the correct syntactic structure were available. Moreover, language is the carrier of human thought, and research on parsing helps reveal the nature of language. This research is therefore of theoretical as well as philosophical significance.

Compared with other languages such as English, automatic parsing of Chinese has its own difficulties. Current automatic Chinese parsing technology cannot yet satisfy the requirements of practical applications. This dissertation starts from the basic problem of ambiguity resolution in automatic Chinese parsing and works toward an integrated statistical model of Chinese parsing. Specifically, it makes the following contributions:

1. Chinese part-of-speech tagging is the basis of Chinese information processing. We propose a tagging method based on bilexical co-occurrences. The standard hidden Markov model assumes that the transition between states (parts of speech) is independent of the observation (word) sequence and that the generation of each observation is independent of the other observations. Chinese text does not satisfy these assumptions. Building on the hidden Markov model, our method also takes into account the effect of context words on the part-of-speech decision, which improves the model's disambiguation ability. We evaluate the proposed model on the China Daily corpus. The tagging accuracy is 99.09% on the closed test set and 96.37% on the open test set.

2. The development of the Penn Chinese Treebank (CTB) spurred research on Chinese parsing. We present the first result of applying the well-known head-driven model to the newly available CTB5.0. Compared with previous work on CTB, we achieve more promising results and narrow the performance gap between Chinese parsing and English parsing. We evaluate the parser on the...
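
The standard hidden Markov model that item 1 takes as its starting point can be illustrated with a minimal Viterbi decoder. This is only a sketch of the baseline, under stated assumptions: the function name `viterbi`, the two-tag set, the toy sentence, and all probabilities are hypothetical and are not taken from the dissertation. The bilexical co-occurrence extension is not specified in the abstract, so it is not reproduced here.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence under a first-order HMM.

    Scores are kept in log space; probabilities missing from the
    tables are replaced by a small floor value (an assumption made
    here to avoid log(0), not a smoothing method from the source).
    """
    def lp(p):
        return math.log(max(p, floor))

    # best[i][t] = (log-score of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{t: (lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emit = lp(emit_p[t].get(words[i], 0.0))
            # The transition depends only on the previous tag and the
            # emission only on the current tag: the HMM independence
            # assumptions described in the abstract.
            prev, score = max(
                ((p, best[i - 1][p][0] + lp(trans_p[p].get(t, 0.0)) + emit)
                 for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score, prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Hypothetical toy model: NN = noun, VV = verb.
tags = ["NN", "VV"]
start_p = {"NN": 0.6, "VV": 0.4}
trans_p = {"NN": {"NN": 0.3, "VV": 0.7}, "VV": {"NN": 0.8, "VV": 0.2}}
emit_p = {
    "NN": {"学生": 0.4, "研究": 0.2, "语言": 0.4},
    "VV": {"研究": 0.9, "语言": 0.1},
}
words = ["学生", "研究", "语言"]  # "students study language"

print(viterbi(words, tags, start_p, trans_p, emit_p))
```

With these toy numbers the ambiguous word 研究 (noun or verb) is tagged as a verb, giving `['NN', 'VV', 'NN']`. A model that also conditions on neighbouring words, as item 1 describes, would add context-dependent terms to the score computed inside the inner loop.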

Keywords/Search Tags:

Chinese parsing, Lexicalized statistical model, PCFG

Related items

1	Chinese Syntax Parsing And Its Application To Chinese-English Statistical Machine Translation
2	Researches On PCFG-Based Parsing Method For Chinese Language
3	Towards efficient statistical parsing using lexicalized grammatical information
4	The Study On Data Augmentation In Chinese Parsing
5	Semantic Structure Identification Based On PCFG-HDSM Model
6	Research On Reranking Technology For Chinese Syntactic Parsing
7	Research On Chinese Shallow Parsing Based On Statistical Language Model
8	A Study Of Chinese-Vietnamese Statistical Machine Translation Methods That Combines Language Differences
9	Research Of Chinese Sentence Skeleton Parsing Base On Statistical Model
10	Semantic parsing using lexicalized well-founded grammars