Research On Construction Technology Of Chinese Verb-Object Collocation Bank

Posted on: 2012-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Liu
Full Text: PDF
GTID: 2248330371958247
Subject: Computer application technology
Abstract/Summary:
A collocation is a word combination whose members stand in a definite grammatical and semantic relation, and it plays an important role in syntactic parsing, machine translation, and related tasks. Syntactic parsing alone, however, cannot satisfy the requirements of semantic retrieval and other deep natural language processing tasks; semantic analysis is also necessary. This thesis combines collocations with semantic knowledge and extracts collocations automatically from a large-scale real corpus to build a collocation bank annotated with semantic relation tags, providing a richer knowledge resource for natural language processing.
The thesis concentrates on Chinese verb-object collocations and addresses three aspects: identification of verb-object collocations, automatic labeling of their semantic relations, and construction of the verb-object collocation bank.
Firstly, it proposes a method for identifying verb-object collocations based on a new cascade algorithm of Conditional Random Fields, combined with a new sequence labeling form, "ONIY". Test results are compared across two part-of-speech tag sets: the best F-Score is 90.65% with the Tsinghua University Treebank tag set and 82.00% with the Peking University standard. The experiments show that the proposed algorithm effectively improves the accuracy of collocation identification and has a positive effect on multiply nested collocations.
Secondly, it establishes 20 semantic relation frames and converts the automatic labeling of the verb-object relation into a sequence labeling problem solved with Conditional Random Fields. The sequence tag set is "OBIE | x", and the features include words, parts of speech, target words, the distance between target words and collocation words, and HowNet-based word senses and their combinations. The optimum feature template is then chosen using an orthogonal experiment strategy. An open test on the 20 frames shows that the labeling performs well.
Finally, it extracts a table of common verbs from HowNet, People's Daily and South Weekends, and builds a preliminary collocation bank annotated with semantic relations. The collocation bank can serve as a basic resource for natural language processing and can be applied in fields such as machine translation and information retrieval.
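As an illustration of how relation labeling can be cast as sequence labeling, the sketch below encodes one labeled span with tags of the form "OBIE | x". The abstract does not define the letters, so the reading used here is an assumption: O = outside, B = begin, I = inside, E = end, with x standing for a semantic relation label. The function names, the example sentence and the label "patient" are hypothetical, and the sketch covers only tag encoding and decoding, not the Conditional Random Fields model or its features.

```python
from typing import List, Tuple


def obie_tags(n_tokens: int, span: Tuple[int, int], relation: str) -> List[str]:
    """Encode one labeled span (inclusive token indices) as OBIE|x tags.

    Assumed reading of the tag set: O = outside, B = begin, I = inside,
    E = end; `relation` plays the role of x. A single-token span is
    tagged B only (also an assumption).
    """
    start, end = span
    if not (0 <= start <= end < n_tokens):
        raise ValueError("span out of range")
    tags = ["O"] * n_tokens
    tags[start] = f"B|{relation}"
    for i in range(start + 1, end):
        tags[i] = f"I|{relation}"
    if end > start:
        tags[end] = f"E|{relation}"
    return tags


def decode_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Recover (start, end, relation) triples from a tag sequence."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        pos, _, rel = tag.partition("|")
        if pos == "B":
            if start is not None:          # previous span had no E tag
                spans.append((start, i - 1, label))
            start, label = i, rel
        elif pos == "E" and start is not None:
            spans.append((start, i, label))
            start, label = None, None
        elif pos == "O" and start is not None:
            spans.append((start, i - 1, label))
            start, label = None, None
    if start is not None:                  # span runs to the end of the sentence
        spans.append((start, len(tags) - 1, label))
    return spans


# Hypothetical segmented sentence: the verb is "吃", the object is "两 碗 米饭".
tokens = ["他", "吃", "了", "两", "碗", "米饭"]
tags = obie_tags(len(tokens), (3, 5), "patient")
for token, tag in zip(tokens, tags):
    print(token, tag)
```

A sequence labeler trained on such tags predicts one tag per token, and a decoder like `decode_spans` turns its output back into labeled spans.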
Keywords/Search Tags:Verb–Object Collocation, Semantic Relation, New Cascade Algorithm, Conditional Random Fields, HowNet