Research And Implement On Chinese Dependency Parsing

Posted on: 2010-12-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W L Yao
Full Text: PDF
GTID: 1118330332464974
Subject: Detection and processing of marine information
Abstract/Summary:
Parsing is one of the central problems in natural language processing. Its task is to derive the syntactic structure of a sentence according to a given grammar. Better parsing would substantially benefit applications such as machine translation, information retrieval, information extraction, text classification, and automatic summarization.

Parsing depends on grammatical theory, and dependency grammar is attracting growing attention from researchers. Dependency parsing technology continues to develop and already achieves good results for English and other Indo-European languages. For Chinese, however, dependency grammar has not been studied thoroughly, and Chinese dependency parsers do not yet perform well. This thesis therefore investigates methods for Chinese dependency parsing based on statistical theory and the distinctive characteristics of Chinese grammar, using deterministic dependency parsing to address several problems in Chinese parsing. The main contributions and innovations of this thesis are as follows:

1. A method for dependency parsing of long Chinese sentences is proposed and implemented. The method targets the difficulty of analyzing long, complex Chinese sentences. Following a layered (graded) parsing approach, the thesis divides a sentence into two sub-sentences to reduce its complexity and improve dependency accuracy. It presents a non-greedy deterministic dependency parser for Chinese that takes long-distance dependencies into account. The sentence is divided into two sub-sentences using information about its root, and an SVM is used to build an efficient root searcher. The root searcher marks the root of the sentence; the two sub-sentences are then parsed separately to obtain sub-trees, and the two sub-trees are merged into a complete tree. Experiments show that the root searcher is highly accurate and that the proposed parser significantly improves dependency accuracy and root accuracy. Reducing sentence complexity in this way yields a clear gain in parsing accuracy.

2. A method called Two-Stage Parsing for sub-sentences is proposed and implemented. It addresses the early-reduce problem caused by the deterministic algorithm. In Chinese, this problem occurs mainly in sentences containing verbs or prepositions whose children are on the right side. Two-stage parsing attends not only to verbs but also to prepositions, which readily cause early-reduce errors in Chinese. It handles the VP early-reduce problem efficiently through effective features, two-stage parsing, and feature reuse, and it also weakens the greediness of the deterministic algorithm. In addition, a bidirectional parsing method for sub-sentences is proposed and implemented. Based on the projectivity of Chinese and the properties of sentences after segmentation, this method parses sub-sentences in both the forward and backward directions. Experiments show that two-stage parsing achieves higher dependency accuracy and that bidirectional parsing of sub-sentences also improves parsing accuracy.

3. A parsing method based on automatic identification of the right boundary of prepositional phrases is proposed and implemented. Drawing on the characteristics and behavior of prepositional phrases, the method identifies their right boundaries automatically. It targets a weakness of deterministic parsing algorithms, namely their difficulty in identifying long-distance dependencies, and reduces the long-distance dependency errors in prepositional phrases that are caused by early decisions. Experiments show that the method is effective for long-distance dependency parsing of Chinese prepositional phrases.
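
The divide, parse, and merge idea in contribution 1 can be illustrated with a minimal Python sketch. It is not the thesis's implementation. Everything named here (`split_parse_merge`, `toy_find_root`, `toy_parse_span`, the POS tags) is hypothetical, the SVM root searcher is replaced by a trivial rule, and the sub-sentence parser is a toy. The sketch also assumes that the root word is excluded from both sub-sentences and that each sub-tree root attaches directly to the sentence root; the abstract does not specify these details.

```python
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, str]   # (word, POS tag); the tag set is an assumption
ROOT = -1                 # head value used for a (sub-)tree root


def split_parse_merge(
    tokens: Sequence[Token],
    find_root: Callable[[Sequence[Token]], int],
    parse_span: Callable[[Sequence[Token]], List[int]],
) -> List[int]:
    """Return one head index per token; the sentence root gets ROOT."""
    r = find_root(tokens)                      # step 1: locate the root word
    heads = [ROOT] * len(tokens)
    # step 2: parse the sub-sentences left and right of the root separately
    for offset, span in ((0, tokens[:r]), (r + 1, tokens[r + 1:])):
        if not span:
            continue
        local_heads = parse_span(span)         # indices local to the span
        # step 3: merge, shifting local indices and attaching each
        # sub-tree root to the sentence root (an assumption of this sketch)
        for i, h in enumerate(local_heads):
            heads[offset + i] = r if h == ROOT else offset + h
    return heads


def toy_find_root(tokens: Sequence[Token]) -> int:
    """Stand-in for the SVM root searcher: pick the first verb."""
    for i, (_, pos) in enumerate(tokens):
        if pos == "V":
            return i
    return len(tokens) - 1


def toy_parse_span(span: Sequence[Token]) -> List[int]:
    """Stand-in sub-sentence parser: each word depends on the next one."""
    return [i + 1 for i in range(len(span) - 1)] + [ROOT]


# "I like red apples": 我 / 喜欢 / 红 / 苹果
sentence = [("我", "PN"), ("喜欢", "V"), ("红", "ADJ"), ("苹果", "N")]
heads = split_parse_merge(sentence, toy_find_root, toy_parse_span)
print(heads)  # [1, -1, 3, 1]
```

In this toy run the verb 喜欢 is chosen as root, 我 and 苹果 attach to it, and 红 attaches to 苹果. Because each sub-sentence is parsed on its own, the sub-parser never has to decide on an arc that crosses the root, which is one plausible reading of how the split reduces sentence complexity.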
Keywords/Search Tags: Dependency Parsing, Deterministic, Long-distance Dependency, Two-Stage Parsing, Prepositional Phrase