
Automatic Extraction Approach Research On Chinese Verb Lexical Collocations

Posted on: 2007-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: J L Yang
Full Text: PDF
GTID: 2178360185450912
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In general, parsing is a significant and difficult problem in NLP, and it is regarded as a crucial step toward language understanding. Parsing oriented to dependency grammar retrieves the head verbs and their collocations in a sentence and analyses the dependencies between them in order to build a dependency grammar tree. In dependency grammar, the verb is considered the center of the sentence and governs its other constituents. The study of verb combination frames can therefore offer a better foundation for Chinese parsing and processing. In this thesis, corpus-based approaches to the automatic extraction of Chinese verb lexical collocations are researched and tested experimentally.

(1) Because the tagging of head verbs and their collocations plays an important role in collocation extraction, part-of-speech (POS) tagging of multi-category words is addressed first, as foundational work. A method for automatically acquiring correction rules for the POS tagging of multi-category words is presented. It is based on incomplete decision tables and uses Rough Sets reduction theory based on attribute significance. As an auxiliary tool, the method can be used to correct the output of POS tagging software and so improve tagging precision, which in turn helps in obtaining high-quality corpora.

(2) Given high-quality corpora, approaches to the automatic extraction of Chinese verb collocations oriented to dependency grammar are researched. Drawing on the significant and effective existing methods, four word association measures (mutual information, the cosine coefficient, the chi-square test, and the likelihood ratio) and three word structure distribution measures (deviation, spread, and entropy) are summarized, and each is compared and analysed. Furthermore, a hybrid collocation extraction method based on mutual information and entropy is proposed. When the new method is used to extract verb-noun and verb-verb collocations from corpora, it performs well on high-frequency collocations.

(3) For the first time, the maximum entropy model is introduced to extract Chinese verb collocations, and Chinese verb-verb collocations in particular. In the model, the parts of speech in the context and the strength of association between head verbs and their collocations are selected to construct candidate composite feature templates. Using reduction techniques from Rough Sets theory, simple composite feature templates are acquired, and the maximum entropy models are trained on them. A series of experiments indicates that the model is feasible for Chinese verb-verb collocations.

Finally, prospects for future research on Chinese verb lexical collocations are discussed.
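The abstract does not say how the hybrid method in item (2) combines mutual information with entropy, so the following Python sketch is only one plausible reading, not the thesis's algorithm. It assumes a POS-tagged corpus given as lists of (word, tag) pairs with "v" marking verbs and "n" marking nouns, scores each head verb and collocate pair by pointwise mutual information over windowed co-occurrences, and uses the entropy of the collocate's relative position as the distribution measure (low entropy meaning a fixed position). The function names, the window size, the tag labels, and the thresholds are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def score_verb_collocations(sentences, window=3, collocate_tags=("n", "v")):
    """Return {(head_verb, collocate): (pmi, positional_entropy)}.

    Assumption: `sentences` is a list of sentences, each a list of
    (word, tag) pairs, where the tag "v" marks a verb.
    """
    pair_counts = Counter()               # co-occurrence count per pair
    offset_counts = defaultdict(Counter)  # pair -> counts per relative offset
    head_totals = Counter()               # pair events per head verb
    collocate_totals = Counter()          # pair events per collocate

    for sent in sentences:
        for i, (head, tag) in enumerate(sent):
            if tag != "v":
                continue
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or not 0 <= j < len(sent):
                    continue
                word, word_tag = sent[j]
                if word_tag not in collocate_tags:
                    continue
                pair_counts[(head, word)] += 1
                offset_counts[(head, word)][offset] += 1
                head_totals[head] += 1
                collocate_totals[word] += 1

    n_pairs = sum(pair_counts.values())
    scores = {}
    for (head, word), count in pair_counts.items():
        # Pointwise mutual information over the windowed pair events.
        p_pair = count / n_pairs
        p_head = head_totals[head] / n_pairs
        p_word = collocate_totals[word] / n_pairs
        pmi = math.log2(p_pair / (p_head * p_word))
        # Entropy (in bits) of the collocate's position relative to the head.
        entropy = sum(
            -(c / count) * math.log2(c / count)
            for c in offset_counts[(head, word)].values()
        )
        scores[(head, word)] = (pmi, entropy)
    return scores


def select_collocations(scores, min_pmi=1.0, max_entropy=1.0):
    """Hypothetical hybrid filter: strong association and stable position."""
    return sorted(
        pair
        for pair, (pmi, entropy) in scores.items()
        if pmi >= min_pmi and entropy <= max_entropy
    )
```

Calling `score_verb_collocations(corpus)` and then `select_collocations(scores)` keeps only pairs that pass both thresholds. Requiring both conditions is a design choice of this sketch; a weighted combination of the two scores would be an equally reasonable reading of "hybrid".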
Keywords/Search Tags: collocation, rough set, entropy, maximum entropy model