Research On Identifying Chinese Coordinate Structures

Posted on: 2017-05-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Y C Zhou
GTID: 2308330485462281 | Subject: Computer Science and Technology
Abstract/Summary:
Syntactic parsing is one of the fundamental tasks of Natural Language Processing (NLP), and many other NLP tasks rely on it. In recent years, however, improving parsing quality has become increasingly difficult. One reason is that natural language sentences contain many coordinate structures. A coordinate structure is a complex, frequently occurring type of syntactic structure that links together two or more elements, known as conjuncts or conjoins. Because the boundary of each conjunct is hard to identify, resolving coordinate structures is one of the most crucial problems in Chinese parsing.
Identifying coordinate structures is difficult for several reasons. First, there is no clear definition of coordinate structures, which leaves the problem only partially specified. Second, coordinate structures follow many different patterns: coordination can hold between two or more words, two or more phrases, or two or more clauses. Third, coordinate structures can take different shapes; for example, they can be flat or nested. Finally, coordinate structures are hard to model. The traditional feature templates used in parsing cannot describe coordinate structures correctly, and with these templates a parser identifies only a few of them.
If coordinate structures can be identified in advance, this information can be fed into a parser to improve its performance. This work aims to identify coordinate structures, treating identification as a task independent of parsing. The thesis focuses on resolving coordinate structures in Chinese, and our work covers the following aspects:
1. We clearly define coordinate structures based on the bracketing guidelines of the Chinese Treebank (CTB).
According to these definitions, we design extraction rules to extract coordinate structures and build a standard data set. We also propose a new context-free grammar designed specifically for Chinese coordinate structures. It covers not only all nested structures but also all the special cases found in Chinese. With this grammar, a coordinate structure can be represented as a tree, so traditional parsing techniques can be applied to the problem.
2. Building on Shift-Reduce search, we add constraints derived from the grammar to the parsing process to reduce the search space. We then design a new embedding-based feature template to evaluate coordinate structures. Experimental results show that the new feature templates greatly improve the identification of coordinate structures.
3. We also propose a new two-step search framework for resolving coordinate structures, in which each step models coordinate structures from a different aspect. In the first step, we modify the traditional recurrent neural network to describe the validity of phrases and, using Shift-Reduce search, use this network to search for coordination trees. In the second step, we propose a new neural architecture that describes both the validity and the similarity of phrases, and we use it to choose the best coordinate structures from the coordination trees.
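Item 1 builds the standard data set by extracting coordinate structures from CTB with rules based on its bracketing guidelines. Those rules are not listed in this abstract, so what follows is only a hypothetical simplification. It treats a phrase as coordinate when it has at least two non-separator children and at least one separator child, where a separator is assumed to be a CC node or a PU node holding the enumeration comma 、. The names `parse`, `is_separator`, `text`, and `extract` are illustrative. Real CTB files (outer wrapper brackets, empty categories) would need more handling.

```python
from typing import List, Optional, Tuple, Union

Node = Tuple[str, list]  # (label, children); a word is a plain str child


def parse(s: str) -> Node:
    """Parse one bracketed tree, e.g. '(NP (NN 老师) (CC 和) (NN 学生))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label, pos = tokens[pos + 1], pos + 2
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def is_separator(child: Node) -> bool:
    # Assumed rule: a conjunction (CC) or an enumeration comma (PU 、) separates conjuncts.
    label, kids = child
    return label == "CC" or (label == "PU" and kids == ["、"])


def text(t: Union[str, Node]) -> str:
    """Concatenate the words under a node (Chinese text has no spaces)."""
    return t if isinstance(t, str) else "".join(text(c) for c in t[1])


def extract(t: Node, out: Optional[List[Tuple[str, List[str]]]] = None):
    """Collect (label, conjunct strings) for every phrase matching the rule."""
    out = [] if out is None else out
    label, kids = t
    phrases = [k for k in kids if not isinstance(k, str)]
    conjuncts = [k for k in phrases if not is_separator(k)]
    # Coordinate if there are >= 2 conjuncts and at least one separator child.
    if len(conjuncts) >= 2 and len(conjuncts) < len(phrases):
        out.append((label, [text(c) for c in conjuncts]))
    for k in phrases:
        extract(k, out)
    return out


# 老师、学生和家长参加会议 ("teachers, students and parents attended the meeting")
sent = parse("(IP (NP (NN 老师) (PU 、) (NN 学生) (CC 和) (NN 家长)) (VP (VV 参加) (NP (NN 会议))))")
print(extract(sent))  # [('NP', ['老师', '学生', '家长'])]
```

Because `extract` recurses into every phrase, a nested coordination such as `(NP (NP 老师 和 学生) 与 (NP 家长))` yields both the outer and the inner structure.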
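Items 1 and 2 represent coordination as a tree under a dedicated grammar and use that grammar to constrain Shift-Reduce search. The thesis's grammar is not reproduced in this abstract, so this sketch assumes a toy rule, COORD -> X (SEP X)+. It also assumes the input is already chunked into conjunct candidates (X) and separators (SEP). The chunking, the action names, and the functions `legal_actions`, `apply_action`, and `all_trees` are all hypothetical. The sketch enumerates every derivation the constraints allow, which shows how the grammar prunes the search space. A real parser would instead score actions (for example with embedding-based features) and keep a beam.

```python
from typing import List, Tuple, Union

# Toy grammar (an assumption, not the thesis's grammar):
#   COORD -> X (SEP X)+      X -> word | COORD
# Input is assumed to be pre-chunked into conjunct candidates ("X")
# and separators ("SEP") such as 和 or 、.
Tree = Union[str, tuple]
Item = Tuple[str, Tree]  # (category, subtree)


def legal_actions(stack: List[Item], buffer: List[Item]) -> List[str]:
    """Return only the actions the toy grammar permits in this state."""
    actions = []
    if buffer:
        top = stack[-1][0] if stack else None
        nxt = buffer[0][0]
        # A separator must follow a conjunct; two conjuncts may not be adjacent.
        if (nxt == "SEP" and top == "X") or (nxt == "X" and top != "X"):
            actions.append("SHIFT")
    # Count the conjuncts in the longest "X (SEP X)*" suffix of the stack.
    n, i = 0, len(stack) - 1
    while i >= 0 and stack[i][0] == "X":
        n += 1
        if i >= 1 and stack[i - 1][0] == "SEP":
            i -= 2
        else:
            break
    # REDUCE-k merges the top k conjuncts and their separators into one X.
    actions.extend(f"REDUCE-{k}" for k in range(2, n + 1))
    return actions


def apply_action(stack: List[Item], buffer: List[Item], action: str):
    if action == "SHIFT":
        return stack + [buffer[0]], buffer[1:]
    k = int(action.split("-")[1])
    cut = len(stack) - (2 * k - 1)
    conjuncts = tuple(tree for cat, tree in stack[cut:] if cat == "X")
    return stack[:cut] + [("X", conjuncts)], buffer


def all_trees(stack: List[Item], buffer: List[Item]) -> List[Tree]:
    """Follow every legal action; each finished derivation yields one tree."""
    if not buffer and len(stack) == 1:
        return [stack[0][1]]
    trees: List[Tree] = []
    for action in legal_actions(stack, buffer):
        trees.extend(all_trees(*apply_action(stack, buffer, action)))
    return trees


# 老师、学生和家长 ("teachers, students and parents")
tokens = [("X", "老师"), ("SEP", "、"), ("X", "学生"), ("SEP", "和"), ("X", "家长")]
for tree in all_trees([], tokens):
    print(tree)
```

For three conjuncts, the constrained search admits exactly three readings: flat, left-nested, and right-nested. States that violate the grammar, such as a separator on top of the stack at the end of the input, are never expanded.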
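Item 3 separates a validity step from a combined validity-and-similarity step. The abstract does not give the network details, so the following is only a toy sketch of that idea. It assumes an Elman-style recurrent encoder with hand-set, untrained weights and two-dimensional word vectors invented for the example. The scoring formula validity(left) + validity(right) + cosine(h_left, h_right) and all function names are hypothetical. Given the position of a conjunction, the sketch enumerates candidate left and right conjuncts and keeps the highest-scoring pair.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]

# Toy 2-d word vectors invented for this example:
# dimension 0 is "noun-like", dimension 1 is "verb-like".
EMB = {"购买": [0.0, 1.0], "苹果": [1.0, 0.1], "香蕉": [0.9, 0.2]}


def encode(words: Sequence[str], recur: float = 0.5) -> Vec:
    """Elman-style RNN, h_t = tanh(x_t + recur * h_{t-1}); returns the last state."""
    h = [0.0, 0.0]
    for w in words:
        h = [math.tanh(x + recur * hp) for x, hp in zip(EMB[w], h)]
    return h


def validity(words: Sequence[str], v: Sequence[float] = (1.0, -1.0)) -> float:
    """Step 1 (toy): sigmoid(v . h_T), read as 'how phrase-like is this span'."""
    z = sum(vi * hi for vi, hi in zip(v, encode(words)))
    return 1.0 / (1.0 + math.exp(-z))


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def score(left: Sequence[str], right: Sequence[str]) -> float:
    """Step 2 (toy): validity of both conjuncts plus the similarity of their encodings."""
    return validity(left) + validity(right) + cosine(encode(left), encode(right))


def best_conjuncts(words: List[str], conj: int, max_len: int = 3) -> Tuple[List[str], List[str]]:
    """Try every left span ending just before `conj` and right span starting after it."""
    candidates = [
        (words[i:conj], words[conj + 1:j])
        for i in range(max(0, conj - max_len), conj)
        for j in range(conj + 2, min(len(words), conj + 1 + max_len) + 1)
    ]
    return max(candidates, key=lambda c: score(*c))


sentence = ["购买", "苹果", "和", "香蕉"]  # "buy apples and bananas"
print(best_conjuncts(sentence, conj=2))  # (['苹果'], ['香蕉'])
```

Here both signals prefer 苹果 over 购买苹果 as the left conjunct of 和. The shorter span ends in a noun-like state, which gives it higher validity under the chosen `v`, and its encoding lies closer to that of 香蕉. In a real system the weights would be learned, and the candidates would come from the coordination trees produced in the first step.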
Keywords/Search Tags: Coordinate Structures, Syntactic Parsing, Neural Network, Validity of Phrases, Similarity of Phrases