Semi-supervised Learning On Chinese Dependency Parsing

Posted on:2013-10-14

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Wu

Full Text:PDF

GTID:2268330392969054

Subject:Computer Science and Technology

Abstract/Summary:


A dependency is a relation between two phrases in a sentence in which one dominates the other. Dependency relations are conveniently represented in a computer as a directed dependency graph. Dependency parsing has become an important part of syntactic parsing because its structure is simple, intuitive, and easy to understand. Syntactic parsing has two goals: to determine the structure of a sentence, and to find the relationships among its components. The main purpose of dependency parsing is to identify the syntactic structure of a sentence by analyzing the dependency relations between phrases.

With the rapid development of computer science and technology, large-scale corpora are now easy to collect, and some languages, such as English, already have large corpora. This makes it possible to process such corpora with statistical methods. However, annotating a corpus with part-of-speech tags and dependency relations takes a great deal of time and money. The existing Chinese dependency corpora are very small, and the lack of a uniform annotation specification leads to significant differences among them.

This paper achieves better results by designing a tri-training-based algorithm as a semi-supervised learning method. The experiments combine the limited Chinese dependency corpus with a large amount of unlabeled data.

The experimental data are taken from the CoNLL-2009 Chinese evaluation corpus, which contains a total of 22,276 sentences. We use three types of classifier to implement the improved tri-training algorithm; they are trained with two different dependency parsers, MSTParser and MaltParser. The original tri-training algorithm is cumbersome, and its iteration process is very time-consuming, so this paper modifies it to reduce training time. The paper also combines feature vectors of word form and lemma, and adds some third-order feature vectors, when training MSTParser and MaltParser. The experimental results show that using a large amount of unlabeled data clearly improves the performance of the classifiers.
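
The agreement step at the heart of tri-training can be illustrated with a minimal Python sketch. It is not the modified algorithm of the thesis, whose details the abstract does not give. It shows only the generic idea: when two of three parsers produce the same tree for an unlabeled sentence, that tree becomes pseudo-labeled training data for the third. The function `tri_training_round`, the three toy parsers, and the whole-tree agreement criterion are all assumptions made for illustration; in practice the parsers would be trained MSTParser and MaltParser models, and each parser would be retrained on its new data after every round.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = Tuple[str, ...]
# heads[i] is the 1-based index of the head of token i+1; 0 marks the root.
Heads = Tuple[int, ...]
Parser = Callable[[Sentence], Heads]
Labeled = Tuple[Sentence, Heads]


def tri_training_round(parsers: Sequence[Parser],
                       unlabeled: Sequence[Sentence]) -> List[List[Labeled]]:
    """For each parser k, collect the sentences on which the other two
    parsers predict identical head sequences (assumed agreement rule)."""
    if len(parsers) != 3:
        raise ValueError("tri-training needs exactly three parsers")
    new_data: List[List[Labeled]] = [[], [], []]
    for sent in unlabeled:
        preds = [parse(sent) for parse in parsers]
        for k in range(3):
            i, j = [m for m in range(3) if m != k]
            if preds[i] == preds[j]:
                # Parsers i and j agree: pseudo-label the sentence for k.
                new_data[k].append((sent, preds[i]))
    return new_data


# Hypothetical stand-in parsers, used only to make the sketch runnable.
def left_chain(sent: Sentence) -> Heads:
    """Each token depends on the previous one; the first token is the root."""
    return tuple(range(len(sent)))


def right_chain(sent: Sentence) -> Heads:
    """Each token depends on the next one; the last token is the root."""
    return tuple(list(range(2, len(sent) + 1)) + [0])


def first_word_root(sent: Sentence) -> Heads:
    """The first token is the root; every other token depends on it."""
    return (0,) + (1,) * (len(sent) - 1)


unlabeled = [("我", "爱"), ("a", "b", "c")]
augmented = tri_training_round([left_chain, right_chain, first_word_root],
                               unlabeled)
print([len(d) for d in augmented])  # prints [0, 1, 0]
```

Here only the two-token sentence is pseudo-labeled, and only for the second parser, because the first and third parsers agree on it while no pair agrees on the three-token sentence.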

Keywords/Search Tags:

dependency parsing, dependency tree, corpus, semi-supervised learning, classifier


Related items

1	Research On Technology Of Chinese Dependency Parsing
2	Research On Semi-supervised Domain Adaptation For Chinese Dependency Parsing
3	Multiple Feature-sets Algorithm Of Dependency Parsing
4	Transition-based Dependency Parser Combining With Self-training
5	Research On Treebank Conversion And Application Of Dependency Parsing
6	Research On Mongolian Dependency Parsing Based On The Conversion Of Chinese-Mongolian Dependency Parsing Tree
7	Exploiting Lexical Relations For Semi-supervised Dependency Parsing
8	Research And Implement On Chinese Dependency Parsing
9	Research On Dependency Parsing Of Tibetan Language Based On Deep Learning
10	Word Sense Disambiguation Research Based On Dependency Parsing