
Research on Chinese Phrase Structure Syntactic Parsing Based on CVG

Posted on: 2016-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Li
GTID: 2308330461450974
Subject: Computer application technology
Abstract:
The basic task of syntactic parsing is to determine the syntactic structure of a sentence. Natural language is highly complex, so syntactic structures are often ambiguous, and resolving these ambiguities requires a great deal of information. There are two main approaches to syntactic analysis: phrase structure parsing and dependency parsing. This thesis studies phrase structure parsing. The Compositional Vector Grammar (CVG) model can capture the information that parsing requires. This thesis introduces the CVG model and improves it so that it performs better on Chinese parsing. CVG combines a probabilistic context-free grammar (PCFG) with a syntactically untied recursive neural network (SU-RNN). The PCFG predicts the structure and generates candidate trees. The SU-RNN captures fine-grained syntactic and compositional-semantic information about phrases and words. This information is used to re-rank the candidate trees, which helps in cases where a syntactic ambiguity can only be resolved with semantic information. The main work of this thesis is as follows:
1) Applying CVG to Chinese syntactic analysis. The thesis introduces the CVG model and its key techniques. With the Stanford Parser, a PCFG model is trained on CTB8.0 as the base model, and a CVG model is then trained on top of it. The thesis reports the performance of the CVG model on Chinese parsing and compares it with the PCFG model.
2) Improvements to the CVG model, each addressing one of its problems:
a) Polysemy. POS information is integrated by training each word together with its part-of-speech tag as a single unit, so that a word used with different parts of speech receives separate vectors.
b) Unknown words. These are divided into two categories, each with its own solution. Words that never occur in the base corpus are represented by a structure vector. Some words occur in the base corpus but never with the current part of speech. For these, the corresponding structure vector is looked up, and the score of the subtree rooted at the parent of the POS node is penalized. If no corresponding vector is found, the zero vector is used instead.
c) Binarization nodes. Binarizing the trees creates new nodes that are hard to distinguish from the original nodes. A temporary flag is added to the type of each newly created node to tell them apart.
d) Redundant node scoring. In the CVG model, the parameters of the node score function depend on the types of the child nodes. This thesis makes them depend on the type of the current node instead, which removes redundancy from the score computation.
3) Experiments. The improvements are applied to the CVG model and evaluated on CTB8.0. They are added one at a time to verify the effectiveness of each of the four measures. The experiments show that the improvements raise the parser's performance, with a 0.92 percent increase over the baseline CVG model on the development set. A comparison of the error types in the output trees further confirms that the improvements are effective and lays the groundwork for further improving syntactic analysis.
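The re-ranking step can be pictured with a small sketch of the standard CVG scoring rule (Socher et al., ACL 2013), which this thesis builds on. Take a binary node whose children have vectors b and c and categories (B, C). Its vector is p = tanh(W^(B,C) [b; c]), and its score is (v^(B,C))·p + log P(A → B C), where the last term comes from the PCFG. A candidate tree's score is the sum of its node scores, and the highest-scoring PCFG candidate is kept.

Everything below is an illustrative assumption rather than the thesis's code: the names, the one-dimensional toy vectors, and the hand-set (untrained) parameters. The `score_by_parent` flag is only our reading of improvement d), which keys v by the current node's category A instead of by (B, C). W stays keyed by the children in both modes.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class Node:
    label: str                       # POS tag for leaves, phrase category otherwise
    left: Optional["Node"] = None    # None for leaves (trees are assumed binarized)
    right: Optional["Node"] = None
    vector: Optional[Vector] = None  # word vector, set on leaves only


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class CVGScorer:
    """Scores binarized trees with PCFG log-probabilities plus an SU-RNN term."""

    def __init__(self, W: Dict[Tuple[str, str], Matrix], v: Dict[object, Vector],
                 rule_logprob: Dict[Tuple[str, str, str], float],
                 score_by_parent: bool = False):
        self.W = W                        # (B, C) -> d x 2d composition matrix
        self.v = v                        # (B, C) -> d vector, or A -> d vector if score_by_parent
        self.rule_logprob = rule_logprob  # (A, B, C) -> log P(A -> B C) from the PCFG
        self.score_by_parent = score_by_parent  # hypothetical switch for improvement d)

    def score(self, node: Node) -> Tuple[Vector, float]:
        """Return (vector, summed node scores) for the subtree rooted at node."""
        if node.left is None:
            return node.vector, 0.0       # leaf: contributes its vector, no score
        b, score_b = self.score(node.left)
        c, score_c = self.score(node.right)
        children = (node.left.label, node.right.label)
        p = [math.tanh(z) for z in matvec(self.W[children], b + c)]  # p = tanh(W[b; c])
        key = node.label if self.score_by_parent else children
        s = dot(self.v[key], p) + self.rule_logprob[(node.label,) + children]
        return p, score_b + score_c + s


def rerank(scorer: CVGScorer, candidates: List[Node]) -> Node:
    """Keep the PCFG candidate tree with the highest CVG score."""
    return max(candidates, key=lambda tree: scorer.score(tree)[1])


# Toy example: three nouns, two bracketings, hand-set numbers (not trained).
def leaf(x: float) -> Node:
    return Node("NN", vector=[x])


t1 = Node("NP", Node("NP", leaf(1.0), leaf(1.0)), leaf(1.0))  # ((w1 w2) w3)
t2 = Node("NP", leaf(1.0), Node("NP", leaf(1.0), leaf(1.0)))  # (w1 (w2 w3))

W = {("NN", "NN"): [[0.5, 0.5]], ("NP", "NN"): [[1.0, 0.0]], ("NN", "NP"): [[0.0, 1.0]]}
v = {("NN", "NN"): [0.0], ("NP", "NN"): [-2.0], ("NN", "NP"): [2.0]}
logp = {("NP", "NN", "NN"): math.log(0.5),
        ("NP", "NP", "NN"): math.log(0.6),   # the PCFG alone prefers t1 ...
        ("NP", "NN", "NP"): math.log(0.4)}

best = rerank(CVGScorer(W, v, logp), [t1, t2])  # ... but the vector term picks t2
```

With d = 1 the arithmetic is easy to follow. t1 scores about -2.49 and t2 about -0.33, so re-ranking overturns the PCFG's preference. Setting every entry of v to zero reduces the score to the PCFG log-probability and recovers the plain PCFG ranking.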
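Improvements a) and b) concern how a word is mapped to its vector. The sketch below is one possible reading of them, not the thesis's implementation.

- Vectors are keyed by (word, POS) pairs, so that 研究 as a verb (VV) and as a noun (NN) get separate vectors.
- A word never seen in the base corpus falls back to a "structure vector".
- A known word seen with a new POS also falls back to a structure vector, but it carries a penalty. The caller would subtract that penalty from the score of the subtree rooted at the POS node's parent.
- When no structure vector exists, the zero vector is used.

The abstract does not say what a structure vector is keyed on. `word_signature` (digits / Latin letters / word length) is therefore a hypothetical stand-in. The penalty value, the zero-vector fallback for the first category, and all names are also assumptions.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def word_signature(word: str) -> str:
    """Hypothetical coarse 'structure' class of a word (an assumption, not the thesis's)."""
    if word.isdigit():
        return "<DIGITS>"
    if word.isascii() and word.isalpha():
        return "<LATIN>"
    return f"<LEN{min(len(word), 4)}>"


class WordPosEmbeddings:
    def __init__(self, dim: int,
                 word_pos_vectors: Dict[Tuple[str, str], Vector],
                 structure_vectors: Dict[Tuple[str, str], Vector],
                 penalty: float = 1.0):
        self.dim = dim
        self.word_pos_vectors = word_pos_vectors    # (word, POS) -> vector: improvement a)
        self.structure_vectors = structure_vectors  # (signature, POS) -> vector
        self.known_words = {word for word, _ in word_pos_vectors}
        self.penalty = penalty                      # hypothetical penalty size

    def lookup(self, word: str, pos: str) -> Tuple[Vector, float]:
        """Return (vector, penalty) for `word` tagged as `pos`."""
        if (word, pos) in self.word_pos_vectors:
            return self.word_pos_vectors[(word, pos)], 0.0
        structure: Optional[Vector] = self.structure_vectors.get((word_signature(word), pos))
        vector = structure if structure is not None else [0.0] * self.dim
        if word not in self.known_words:
            return vector, 0.0           # category 1: word unseen in the base corpus
        return vector, self.penalty      # category 2: known word, unseen POS


emb = WordPosEmbeddings(
    dim=2,
    word_pos_vectors={("研究", "VV"): [0.2, 0.7], ("研究", "NN"): [0.9, -0.1]},
    structure_vectors={("<DIGITS>", "CD"): [0.5, 0.5], ("<LEN2>", "JJ"): [0.1, 0.1]},
)
print(emb.lookup("研究", "NN"))  # ([0.9, -0.1], 0.0): the noun's own vector, not the verb's
print(emb.lookup("2016", "CD"))  # ([0.5, 0.5], 0.0): unseen word -> structure vector
print(emb.lookup("研究", "AD"))  # ([0.0, 0.0], 1.0): known word, unseen POS, no structure vector
```

Returning the penalty instead of applying it keeps the lookup independent of the tree scorer. A CVG-style scorer would subtract it when scoring the parent of the POS node.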
Keywords: Phrase Structure Parsing, Re-ranking, Compositional Vector Grammar, Recursive Neural Network, SU-RNN