
The Frame Disambiguation Of Automatic Identification Of Chinese Frame

Posted on: 2012-03-23  Degree: Master  Type: Thesis
Country: China  Candidate: Y H Gao  Full Text: PDF
GTID: 2218330368989680  Subject: Probability theory and mathematical statistics
Abstract/Summary:
The goal of natural language understanding is to enable communication between people and computers. However, polysemy is widespread in natural language, and disambiguation is the current bottleneck of natural language processing. In particular, the first step is to resolve the meaning of the words in a sentence. Further research on automatic semantic role labeling based on Chinese FrameNet will support the automatic construction of large-scale semantic corpora and improve natural language processing technologies such as Chinese information retrieval, Chinese question answering, and Chinese information extraction. In this thesis, the task of frame disambiguation is defined as assigning to a given target word in a sentence the appropriate frame from those present in the current Chinese FrameNet (CFN). The method offers a new approach to building large-scale semantic corpora automatically and supports automatic semantic parsing. It should effectively promote information retrieval, question answering, text classification, machine translation, and the development of natural language understanding technology. Because Chinese FrameNet still needs to be expanded, and to meet the needs of semantic analysis, the thesis divides frame disambiguation into three subtasks: Lexical Unit Identification, Chinese Unknown Frame Detection, and Chinese Frame Disambiguation. This thesis focuses on Chinese Frame Disambiguation, which is modeled as context-based frame classification using a maximum entropy model.
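As an illustration of context-based frame classification with a maximum entropy model, the following minimal sketch trains a log-linear classifier on window features around a target word. It is not the thesis implementation: the function names, the frame labels FRAME_A and FRAME_B, the toy sentences, the window size, and the plain gradient-ascent training are all assumptions made for this example.

```python
import math
from collections import defaultdict


def window_features(tokens, target_index, size=2):
    """Target word plus bag-of-words features from a window around it."""
    feats = {"target=" + tokens[target_index]}
    lo = max(0, target_index - size)
    hi = min(len(tokens), target_index + size + 1)
    for i in range(lo, hi):
        if i != target_index:
            feats.add("bow=" + tokens[i])
    return feats


def distribution(weights, feats, frames):
    """p(frame | feats) under a log-linear (maximum entropy) model."""
    raw = {f: sum(weights[(feat, f)] for feat in feats) for f in frames}
    top = max(raw.values())  # subtract the max for numerical stability
    exp = {f: math.exp(v - top) for f, v in raw.items()}
    z = sum(exp.values())
    return {f: e / z for f, e in exp.items()}


def train(examples, frames, epochs=200, rate=0.5):
    """Gradient ascent on the conditional log-likelihood (an assumption;
    the thesis does not state which optimizer it used)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            p = distribution(weights, feats, frames)
            for feat in feats:
                for f in frames:
                    # observed feature count minus expected feature count
                    grad[(feat, f)] += (1.0 if f == gold else 0.0) - p[f]
        for key, g in grad.items():
            weights[key] += rate * g / len(examples)
    return weights


def predict(weights, feats, frames):
    p = distribution(weights, feats, frames)
    return max(frames, key=lambda f: p[f])


# Hypothetical toy data: (tokens, index of target word, frame label).
FRAMES = ["FRAME_A", "FRAME_B"]
TRAIN = [
    (["他", "打", "篮球"], 1, "FRAME_A"),
    (["她", "打", "排球"], 1, "FRAME_A"),
    (["他", "打", "电话"], 1, "FRAME_B"),
    (["她", "打", "手机"], 1, "FRAME_B"),
]
examples = [(window_features(t, i), y) for t, i, y in TRAIN]
weights = train(examples, FRAMES)
print(predict(weights, window_features(["我", "打", "篮球"], 1), FRAMES))
```

The same target word receives different frames depending on its context words, which is the behavior the classification formulation relies on. Richer features, such as part of speech or dependency labels, would be added as further strings in the feature set.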
We select 2,077 annotated sentences covering 88 lexical units from Chinese FrameNet and split them into training and test sets using 3-fold cross-validation. The features include bag of words (BOW), the current word, part of speech, base chunks, and labels in the dependency syntax tree; sliding windows of adjustable size are also used. On the test set, the word-based model achieves 62.50% accuracy, which serves as the baseline. Adding the BOW feature to the baseline yields 68.37% accuracy, reported as 3.95% higher. The experimental results demonstrate that the BOW feature contributes most to model performance. In the base-chunk-based model, which takes its information from the base-chunk parser developed at Tsinghua University, accuracy is 64.42%. The results show that the base-chunk feature has less effect on model performance. In addition, using three dependency parsers (Stanford, HIT, and Mate), the dependency-syntax-tree-based model achieves 66.44% accuracy. The results show that dependency-parser features have some effect but do not improve model performance significantly, mainly because automatic parsers perform poorly on an open-domain test corpus. Finally, building on the word features, the best model, which integrates the BOW feature and Mate dependency syntax information, achieves the best result: 69.28% accuracy.
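The evaluation protocol can be sketched as follows. This is a minimal illustration of 3-fold cross-validated accuracy, assuming an interleaved fold assignment and accuracy pooled over all folds; the thesis does not state how folds were formed or how accuracy was aggregated. The helper names and the majority-label stand-in classifier are hypothetical.

```python
from collections import Counter


def k_fold_indices(n, k=3):
    """Yield (train_idx, test_idx); item i goes to test fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def cross_validated_accuracy(examples, fit, predict, k=3):
    """Fraction of examples labeled correctly when each is tested once."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        model = fit([examples[i] for i in train_idx])
        correct += sum(
            predict(model, examples[i][0]) == examples[i][1] for i in test_idx
        )
    return correct / len(examples)


# Stand-in classifier: always predict the most frequent training label.
def fit_majority(train):
    return Counter(label for _, label in train).most_common(1)[0][0]


def predict_majority(model, features):
    return model


# Hypothetical toy data: (features, frame label); features are unused here.
labels = ["A", "A", "A", "B", "A", "A", "A", "B", "A"]
data = [(None, y) for y in labels]
accuracy = cross_validated_accuracy(data, fit_majority, predict_majority)
print(f"{accuracy:.4f}")
```

Any classifier that exposes a fit function and a predict function with these shapes can be substituted for the majority-label stand-in.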
Keywords/Search Tags:Chinese FrameNet, Frame Disambiguation, Maximum Entropy Model, Chinese Base Chunk