
Research On Chunking Based Full Parsing

Posted on: 2013-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: H B Ding
Full Text: PDF
GTID: 2298330467974654
Subject: Computer software and theory
Abstract/Summary:
Syntactic parsing plays an important role in Natural Language Processing (NLP). Its task is to determine how a sentence is constructed and how its parts relate to one another: given a sentence, a parser analyzes which phrases make up the sentence and how those phrases are built from words. Syntactic parsing is a basic NLP task, and many other NLP applications depend on it, such as Semantic Role Labeling (SRL), Statistical Machine Translation (SMT), and Information Extraction (IE).

Many state-of-the-art parsers now achieve high accuracy; the Berkeley Parser, for example, reaches 91% on the English Treebank. However, many of these parsers are slow, and a slow parser is not practical. In this dissertation, we study a fast parsing technique, chunking based parsing, which has a great advantage in parsing speed. We also study several methods for improving this parser.

The content of this dissertation is as follows:

1. Study and build a chunking based parser. Chunking based parsing applies chunking techniques to phrase-based full parsing. It is composed of two sub-models: base-level chunking and up-level chunking. Base-level chunking takes the words and their Part-of-Speech tags as input and outputs base-level chunks. Up-level chunking takes the result of base-level chunking as input and merges these chunks into sub-trees, then repeats the chunking and merging process until it obtains one full parse tree.

2. Research on improving Part-of-Speech tagging. Part-of-Speech tagging is also a basic NLP task. It aims to assign the correct Part-of-Speech tag to each input word. As a preprocessing step for syntactic parsing, Part-of-Speech tagging has a direct impact on parsing quality, so one direct way to improve syntactic parsing is to first improve Part-of-Speech tagging. We first study the impact of syntactic features on Part-of-Speech tagging, and then use data transformed from another human-annotated corpus as additional training data for our Part-of-Speech tagging system. The experiments show that both methods improve our Part-of-Speech tagging system, to different degrees.

3. Research on improving chunking based full parsing. We propose three methods to improve chunking based parsing. The first uses richer features, the second adopts a semi-supervised method, and the third generates n-best parses and searches for the best parse among them. Our experiments show that the latter two methods significantly improve the performance of our parser.

The contributions of our work are as follows. We build a fast chunking based parser. We are also the first to apply this parsing model to Chinese, where it achieves state-of-the-art performance. In addition, we propose two methods to improve Part-of-Speech tagging and three methods to improve the performance of our parser. The experiments show that most of the proposed methods are very effective, and our work lays a good foundation for the practical use of our parsing models.
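The iterative chunk-and-merge procedure described in item 1 can be illustrated with a minimal sketch. This is not the thesis's implementation: the keywords suggest a conditional random field chunker, whereas the sketch below substitutes a hypothetical hand-written rule table (`TOY_RULES`) so that it stays self-contained. The function names `chunk_pass`, `parse`, and `to_bracket`, and the example sentence, are likewise assumptions made for illustration. The first pass plays the role of base-level chunking over Part-of-Speech tags; later passes play the role of up-level chunking over the labels produced so far.

```python
from typing import Dict, List, Tuple

# A node is (label, body): body is a word (str) for a leaf,
# or a list of child nodes for a merged chunk.
Node = Tuple[str, object]

# Hypothetical toy rules standing in for a trained chunking model:
# a sequence of adjacent labels is merged into one phrase label.
TOY_RULES: Dict[Tuple[str, ...], str] = {
    ("DT", "NN"): "NP",
    ("VBD", "NP"): "VP",
    ("IN", "NP"): "PP",
    ("NP", "PP"): "NP",
    ("NP", "VP"): "S",
}

def chunk_pass(nodes: List[Node], rules: Dict[Tuple[str, ...], str]) -> List[Node]:
    """One left-to-right chunking pass: greedily merge the longest matching span."""
    max_len = max(len(key) for key in rules)
    out: List[Node] = []
    i = 0
    while i < len(nodes):
        for n in range(min(max_len, len(nodes) - i), 0, -1):
            key = tuple(node[0] for node in nodes[i:i + n])
            if key in rules:
                out.append((rules[key], list(nodes[i:i + n])))
                i += n
                break
        else:
            # No rule matched at this position: pass the node through unchanged.
            out.append(nodes[i])
            i += 1
    return out

def parse(tagged: List[Tuple[str, str]],
          rules: Dict[Tuple[str, ...], str],
          max_iters: int = 50) -> List[Node]:
    """Repeat chunking and merging until one tree remains or nothing changes."""
    nodes: List[Node] = [(pos, word) for word, pos in tagged]
    for _ in range(max_iters):
        if len(nodes) == 1:
            break
        merged = chunk_pass(nodes, rules)
        if merged == nodes:  # no further merges are possible
            break
        nodes = merged
    return nodes

def to_bracket(node: Node) -> str:
    """Render a node as a bracketed tree string."""
    label, body = node
    if isinstance(body, str):
        return f"({label} {body})"
    return f"({label} " + " ".join(to_bracket(child) for child in body) + ")"

sentence = [("The", "DT"), ("dog", "NN"), ("chased", "VBD"), ("a", "DT"), ("cat", "NN")]
result = parse(sentence, TOY_RULES)
print(" ".join(to_bracket(node) for node in result))
```

Each pass is a single linear scan over the current node sequence, which is the intuition behind the speed advantage claimed for this family of parsers. If the rule table cannot reduce the input to one node, `parse` returns the remaining partial sub-trees instead of a full tree.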
Keywords/Search Tags: natural language processing, chunking, parsing, part-of-speech tagging, conditional random field