
Research On The Identification Of Verb-Object Collocations For Chinese

Posted on: 2009-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: X D Jia
GTID: 2178360242484716
Subject: Computer application technology
Abstract/Summary:
Syntactic parsing is an important and difficult task in natural language processing (NLP). Parsing based on dependency grammar identifies the head verbs of a sentence and the words that collocate with them, then analyses the dependencies between them to build a dependency tree. In SVO languages the verb-object structure is very common: as the core component of a sentence, it holds a dominant position and outlines the shape of the whole sentence. Once the verb-object component of a sentence has been identified, adverbials can be found to the left of the verb, and the subject further to the left; modifiers of the object can be found to its left, and other components to its right. Identifying the verb-object structure therefore lays the foundation for parsing a sentence.

This thesis studies corpus-based identification of Chinese verb-object collocations. First, building on a study of statistical methods for identifying verb-object collocations, and to correct the identification errors those methods make, it introduces linguistic knowledge and presents a method that combines semantic and part-of-speech (POS) constraints with the statistical method. Second, based on the structural features of verb-object collocations, it recasts their identification as a sequence tagging problem and solves it with Conditional Random Fields (CRF), a statistical model widely regarded as highly effective for sequence tagging. Both methods are used to identify Chinese verb-object collocations in an 18-million-scale corpus.
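The abstract does not give the tag set or the feature templates used with the CRF. The Python sketch below shows, under stated assumptions, one way to recast verb-object identification as sequence tagging. Each token receives a label from a hypothetical scheme (`V` for the collocation verb, `OBJ` for the object head, `X` for everything else) and a feature dict built from the word and the POS tags in a +/-2 window. Predicted labels are then decoded back into verb-object pairs. The label names, window size, PKU-style POS tags and the helper names (`label_sentence`, `token_features`, `to_crf_instance`, `decode_pairs`) are all illustrative. The per-token feature dicts follow the list-of-dicts input shape accepted by toolkits such as sklearn-crfsuite, but no training step is shown, and the thesis does not say which CRF implementation it used.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label scheme (not taken from the thesis):
#   V   = verb of a verb-object collocation
#   OBJ = head word of its object
#   X   = any other token
VERB, OBJ, OTHER = "V", "OBJ", "X"


def label_sentence(n_tokens: int, pairs: List[Tuple[int, int]]) -> List[str]:
    """Turn gold (verb_index, object_head_index) pairs into one label per token."""
    labels = [OTHER] * n_tokens
    for v, o in pairs:
        labels[v] = VERB
        labels[o] = OBJ
    return labels


def token_features(tokens: List[str], tags: List[str], i: int,
                   window: int = 2) -> Dict[str, str]:
    """Context features for token i: the word, its POS, and POS tags within +/-window."""
    feats = {"word": tokens[i], "pos": tags[i]}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        # Positions outside the sentence get a padding symbol.
        feats[f"pos[{off:+d}]"] = tags[j] if 0 <= j < len(tags) else "<pad>"
    return feats


def to_crf_instance(tokens: List[str], tags: List[str],
                    pairs: List[Tuple[int, int]]):
    """One training instance: a feature dict per token plus the label sequence."""
    X = [token_features(tokens, tags, i) for i in range(len(tokens))]
    y = label_sentence(len(tokens), pairs)
    return X, y


def decode_pairs(labels: List[str]) -> List[Tuple[int, int]]:
    """Recover (verb, object) pairs from a predicted label sequence:
    each V is paired with the first OBJ that follows it before the next V."""
    pairs: List[Tuple[int, int]] = []
    pending: Optional[int] = None
    for i, lab in enumerate(labels):
        if lab == VERB:
            pending = i
        elif lab == OBJ and pending is not None:
            pairs.append((pending, i))
            pending = None
    return pairs


if __name__ == "__main__":
    # "我们 学习 新 知识" ("we study new knowledge"); PKU-style POS tags assumed.
    tokens = ["我们", "学习", "新", "知识"]
    tags = ["r", "v", "a", "n"]
    X, y = to_crf_instance(tokens, tags, [(1, 3)])
    print(y)                # ['X', 'V', 'X', 'OBJ']
    print(decode_pairs(y))  # [(1, 3)]
```

The decoding rule is deliberately simple: it cannot attach two objects to one verb, which a richer tag scheme (for example, separate tags for coordinated objects) would be needed to handle.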
The results indicate that the method combining semantic and POS constraints with the statistical method outperforms the purely statistical method in precision, recall and F score, and that the CRF-based method integrates contextual information well, reaching a precision of 90.78%, a recall of 86.18% and an F score of 88.42%.
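Taking 90.78% as precision and 86.18% as recall, the balanced F measure F = 2PR/(P+R), the harmonic mean of the two, reproduces the reported F score of 88.42%. The short Python check below assumes the usual definitions (precision = correct collocations among those extracted; recall = correct collocations among those in the gold standard), which the abstract does not state explicitly; the count-based helper `prf` is hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(n_correct: int, n_extracted: int, n_gold: int):
    """Precision, recall and F from raw counts (hypothetical helper)."""
    p = n_correct / n_extracted if n_extracted else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    return p, r, f1(p, r)


# CRF figures from the abstract, as percentages: P = 90.78, R = 86.18.
print(round(f1(90.78, 86.18), 2))  # 88.42
```

Because the harmonic mean is pulled toward the smaller of the two values, the F score sits closer to recall than to precision here.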
Keywords/Search Tags: Natural Language Processing, Verb-Object Collocation, Semantic and POS Constraints, Conditional Random Fields