
Active Learning For Chinese Dependency Treebank Building

Posted on: 2012-09-30  Degree: Master  Type: Thesis
Country: China  Candidate: X Chen  Full Text: PDF
GTID: 2218330362450442  Subject: Computer Science and Technology
Abstract/Summary: PDF Full Text Request
Dependency parsing lies at the core of Natural Language Processing. Parsing results can be used directly in search engines to analyze user queries and recognize the important terms. Solving this problem effectively would, on the one hand, allow the correctness and effectiveness of the dependency framework to be verified and, on the other hand, support higher-level applications such as Information Extraction, Question Answering, and Machine Translation.

Dependency parsing currently relies mainly on supervised machine learning, which requires a large annotated treebank to train a statistical dependency parser. However, acquiring such a treebank is time-consuming, tedious, and expensive.

This thesis presents two methods that use active learning to reduce the annotation effort when building a treebank. Method 1, Clustering, applies a clustering algorithm to discard redundant instances. Method 2, Low Confidence First, selects the most uncertain samples for annotation instead of blindly annotating the whole training corpus. Two confidence measures are used: uncertainty-based sampling and query-by-committee.

The experiments show that Method 2 is more effective than Method 1. On the one hand, with the same amount of training samples, active learning raises parsing accuracy by about 0.8 percentage points. On the other hand, to reach roughly the same parsing accuracy, only about 70% of the samples need to be annotated compared with the usual random selection method. Method 2 can also be applied to domain adaptation for dependency parsing.

Because of its low efficiency, dependency parsing has so far been used mainly in academic research. To overcome this drawback, this thesis implements a parallelized graph-based dependency parser, which raises the parsing speed from 5.5 to 66.7 sentences per second, with a peak of 100 sentences per second, making dependency parsing more practical for real-world use.
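The two selection criteria of the Low Confidence First method can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the confidence scores and dependency labels are hypothetical stand-ins for a parser's per-sentence confidence and a committee of parsers' predictions.

```python
import math

def select_low_confidence(confidences, k):
    """Low Confidence First: return indices of the k least-confident
    samples (lowest parser confidence first) to send for annotation."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return order[:k]

def vote_entropy(committee_votes):
    """Query-by-committee disagreement for one sample: entropy of the
    label distribution over a committee's predictions. Higher entropy
    means more disagreement, so the sample is more worth annotating."""
    n = len(committee_votes)
    counts = {}
    for vote in committee_votes:
        counts[vote] = counts.get(vote, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical per-sentence confidences from a trained parser.
confidences = [0.91, 0.42, 0.77, 0.30, 0.88]
picked = select_low_confidence(confidences, 2)  # the 2 most uncertain

# A committee that disagrees scores higher than a unanimous one.
disagree = vote_entropy(["SBV", "VOB", "SBV"])
agree = vote_entropy(["SBV", "SBV", "SBV"])
```

Both criteria rank the unlabeled pool and hand only the top of the ranking to annotators, which is how the method reaches comparable accuracy with roughly 70% of the annotation effort.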
Keywords/Search Tags:Dependency Parsing, Dependency Treebank, Active Learning, Confidence, Query-by-committee