Semi-supervised Learning On Chinese Dependency Parsing

Posted on: 2013-10-14  Degree: Master  Type: Thesis
Country: China  Candidate: Z J Wu  Full Text: PDF
GTID: 2268330392969054  Subject: Computer Science and Technology
Abstract/Summary:
A dependency is a relation between two phrases in a sentence in which one dominates the other. On a computer, dependency relations are conveniently represented as a directed dependency graph. Dependency parsing has become an important part of syntactic parsing because its structure is simple, intuitive, and easy to understand. Syntactic parsing has two goals: to determine the structure of a sentence, and to find the relations among its components. The main purpose of dependency parsing is to identify the syntactic structure of a sentence by analyzing the dependency relations between phrases.
With the rapid development of computer science and technology, large-scale corpora are now easy to collect. Some languages, such as English, already have large corpora, which makes it possible to process them with statistical methods. However, annotating such corpora with part-of-speech tags and dependency relations takes a great deal of time and money. The existing Chinese dependency corpora are very small, and the lack of a uniform annotation standard leads to significant differences among them.
This paper achieves better results by designing a tri-training-based algorithm for semi-supervised learning. The experiments combine the limited annotated Chinese dependency corpus with a large amount of unlabeled text.
The experimental data is taken from the CoNLL-2009 Chinese evaluation corpus, which contains a total of 22,276 sentences. We use three classifiers to implement the improved tri-training algorithm. The three classifiers are trained with two different dependency parsers, MSTParser and MaltParser. The original tri-training algorithm is cumbersome and its iterative process is very time-consuming, so this paper modifies it to reduce the training time. The paper also combines the feature vectors of word form and lemma and adds some third-order feature vectors when training MSTParser and MaltParser. The experimental results show that using a large amount of unlabeled data clearly improves the performance of the classifiers.
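The agreement step at the heart of generic tri-training can be sketched as follows. This is a minimal illustration, not the modified algorithm of this paper, whose details the abstract does not give. It assumes that a parse is a tuple of head indices (0 marks the root) and that two parsers "agree" only when their head tuples for a sentence are identical. The three toy parsers and the name `select_agreed` are hypothetical stand-ins for trained MSTParser and MaltParser models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sentence = Tuple[str, ...]   # tokens of one sentence
Heads = Tuple[int, ...]      # heads[i] is the 1-based head of token i+1; 0 means root
Parser = Callable[[Sentence], Heads]


def select_agreed(
    parsers: Sequence[Parser], unlabeled: Sequence[Sentence]
) -> Dict[int, List[Tuple[Sentence, Heads]]]:
    """For each parser i, collect the sentences on which the other two agree.

    In tri-training, those automatically labeled sentences are added to the
    training data of parser i before it is retrained in the next round.
    """
    if len(parsers) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    new_data: Dict[int, List[Tuple[Sentence, Heads]]] = {0: [], 1: [], 2: []}
    for sent in unlabeled:
        preds = [parse(sent) for parse in parsers]
        for i in range(3):
            j, k = [x for x in range(3) if x != i]
            if preds[j] == preds[k]:          # assumed criterion: identical trees
                new_data[i].append((sent, preds[j]))
    return new_data


# Hypothetical toy parsers, used only to make the sketch runnable.
def attach_left(sent: Sentence) -> Heads:
    # First token is the root; every other token attaches to its left neighbor.
    return tuple(range(len(sent)))


def attach_first(sent: Sentence) -> Heads:
    # First token is the root; every other token attaches to the first token.
    return tuple(0 if i == 0 else 1 for i in range(len(sent)))


def attach_right(sent: Sentence) -> Heads:
    # Last token is the root; every other token attaches to its right neighbor.
    n = len(sent)
    return tuple(0 if i == n - 1 else i + 2 for i in range(n))


unlabeled = [("wo", "ai"), ("wo", "ai", "ni")]
picked = select_agreed([attach_left, attach_first, attach_right], unlabeled)
```

Here the first two toy parsers agree only on the two-token sentence, so only the third parser receives a new training example. A real system would retrain each parser on its enlarged training set and repeat until no parser changes. Full tri-training also checks an error-rate condition before accepting new labels, which this sketch omits.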
Keywords/Search Tags:dependency parsing, dependency tree, corpus, semi-supervised learning, classifier