Chinese Real Text Semantic Role Labeling

Posted on: 2008-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: L J Chen
Full Text: PDF
GTID: 2205360215454451
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Semantic role labeling (SRL) is one formulation of the problem of interpreting text, with applications such as machine translation. It requires automatically identifying the semantic arguments that fill the roles of a sentence's predicates. Although SRL systems for English have used many statistical methods, studies on Chinese are still lacking. Chinese SRL has its own characteristics, so the labeling norms and methods used for English should not simply be copied.

This dissertation attempts to label semantic roles in a corpus of real Chinese text, using the Tsinghua Chinese Treebank (TCT). This is a new attempt at SRL, and it raises many difficulties that earlier systems did not face:
(1) In PropBank, sentences are often selected according to the framesets of verbs. This is favourable to SRL, but it does not give a true picture of real text. The sentences in TCT stay close to the original text rather than being selected. Having many clauses is a distinctive characteristic of Chinese; real text reflects this, which makes labeling more difficult.
(2) We use small-scale training and test data, which also makes labeling harder. To confirm the validity of the system, we use 5-fold cross-validation to compensate for the lack of training data.
(3) Unlike in earlier systems, many verb types that occur in our test data are absent from the training data, so the verb framesets, which are also important for labeling, are scarce.
(4) We need to label nominalizations, which other systems do not consider.
(5) TCT has no empty categories, so we need to decide which constituents correspond to them.

To address these difficulties, we use a new method and build a system suited to labeling semantic roles in a small-scale corpus of real text:
(1) The system combines the statistical method with the rule method. By locating the weak links of the rules, we choose subsets of the training and test data and use a decision tree to improve the precision of SRL. We first use rule attributes, together with other attributes, to label semantic roles with a decision tree. Our results are better than those obtained with probabilistic attributes, which shows that rule attributes can improve the flexibility of the system.
(2) We label roles in four steps: identify roles by rules; identify roles by decision tree; classify roles by rules; classify roles by decision tree.
(3) We classify the predicate-argument relations by making the most of the one-to-one correspondence between syntax and semantics, and every class has its own strategy for labeling roles.
(4) Syntactic transformations and changes in normal word order have a great effect on SRL. We discuss their types and write rules to identify the core arguments.
(5) Resources for SRL are scarce, so we use a thesaurus lexicon to classify verbs and nouns, and we build verb lexicons and noun lexicons for classifying roles. A preposition lexicon is built to identify the roles marked by prepositions.

This method reflects the characteristics of SRL and obtains good results in semantic role labeling. Although we use a small-scale corpus of real text, the F-score of the system is 82.0%, which is comparable to other methods.
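
The four labeling steps described above (identify by rules, identify by decision tree, classify by rules, classify by decision tree) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes that a rule may abstain by returning None, in which case a learned model decides; this is one possible reading of applying the decision tree where the rules are weak. The feature names, role names, toy rules, and the stand-in "model" functions (plain functions used in place of trained decision trees) are all hypothetical. The F-score helper uses the standard definition, the harmonic mean of precision and recall.

```python
from typing import Callable, Optional, Sequence

Constituent = dict  # hypothetical feature dictionary for one parse-tree constituent
Rule = Callable[[Constituent], Optional[object]]  # returns None when the rule abstains
Model = Callable[[Constituent], object]           # stand-in for a trained decision tree


def label_constituent(c: Constituent,
                      identify_rule: Rule, identify_model: Model,
                      classify_rule: Rule, classify_model: Model) -> Optional[str]:
    """Return a role label for c, or None if c is not an argument."""
    # Steps 1 and 2: identification. The rule decides first; the model
    # handles only the cases the rule leaves open.
    is_argument = identify_rule(c)
    if is_argument is None:
        is_argument = identify_model(c)
    if not is_argument:
        return None
    # Steps 3 and 4: classification, with the same rule-then-model order.
    role = classify_rule(c)
    if role is None:
        role = classify_model(c)
    return role


def f_score(gold: Sequence[Optional[str]], predicted: Sequence[Optional[str]]) -> float:
    """Harmonic mean of precision and recall over labeled arguments."""
    correct = sum(1 for g, p in zip(gold, predicted) if p is not None and g == p)
    if correct == 0:
        return 0.0
    precision = correct / sum(1 for p in predicted if p is not None)
    recall = correct / sum(1 for g in gold if g is not None)
    return 2 * precision * recall / (precision + recall)


# Toy rules and stand-in models (all hypothetical, for illustration only).
def toy_identify_rule(c):
    if c["phrase"] == "punct":
        return False                      # punctuation is never an argument
    if c["phrase"] == "np" and c["position"] == "before_verb":
        return True                       # preverbal noun phrase
    return None                           # abstain: leave it to the model


def toy_identify_model(c):
    return c["phrase"] in {"np", "pp"}


def toy_classify_rule(c):
    # "zai" stands for a locative preposition looked up in a preposition lexicon.
    if c["phrase"] == "pp" and c.get("prep") == "zai":
        return "location"
    return None


def toy_classify_model(c):
    return "agent" if c["position"] == "before_verb" else "patient"


constituents = [
    {"phrase": "np", "position": "before_verb"},
    {"phrase": "pp", "position": "before_verb", "prep": "zai"},
    {"phrase": "np", "position": "after_verb"},
    {"phrase": "punct", "position": "after_verb"},
]
predicted = [
    label_constituent(c, toy_identify_rule, toy_identify_model,
                      toy_classify_rule, toy_classify_model)
    for c in constituents
]
gold = ["agent", "location", "theme", None]
score = f_score(gold, predicted)
```

In this toy run two of the three predicted roles match the gold labels, so precision and recall are both 2/3. In a real system the two model functions would be decision trees trained on each cross-validation fold, and the rules would draw on the verb, noun, and preposition lexicons.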
Keywords/Search Tags: Natural Language Processing, Semantic Role Labeling, Syntax Structure, Rule Method, Decision Tree