
Research On Recognition Technology Of Unknown Lexical Units For Chinese FrameNet

Posted on: 2014-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chen
Full Text: PDF
GTID: 2268330401962544
Subject: Computer application technology
Abstract/Summary:
Lexical semantics plays a key role in most NLP applications. Unfortunately, poor coverage is the main limitation of all semantic resources, including Chinese FrameNet. The low coverage of Chinese FrameNet leaves many unknown lexical units in real texts and seriously restricts frame semantic analysis for Chinese. To identify new lexical units for Chinese FrameNet, this thesis studies the tasks of target word identification and frame selection with the help of the word senses in Tongyici Cilin, a well-known large dictionary of Chinese synonyms. The research content and contributions fall into the following two parts.

For target word identification of unknown lexical units, this thesis proposes two methods.
(1) The first method is based on an extended lexical database. The extended lexical database is obtained by mapping between Tongyici Cilin and Chinese FrameNet, and the word sense information of Tongyici Cilin is then used to identify new lexical units of Chinese FrameNet. The results show that the extended lexical database has higher recall than the original lexical database, and that the word sense information yields higher precision.
(2) The second method is based on the Maximum Entropy model. The experimental features include words, parts of speech, and word senses. The best result for target word identification of totally unknown lexical units reaches 90.95% in real text.

For frame selection of unknown lexical units, this thesis also proposes two methods.
(1) The first is the Average Semantic Similarity method, which rests on the idea that lexical units belonging to the same frame are highly similar to one another. The TOP-4 accuracy of this method is 78.61%.
(2) The second is a Maximum Entropy method whose feature selection combines static features and dynamic features. Experimental results show that the best accuracy of the ME-based method is 87.29% on the same test set and 75% on real news text. The word sense feature is the best static feature, and the dependency syntactic feature is the best dynamic feature.

Together, these methods identify unknown lexical units effectively, and they allow the new lexical units to be added to the database and the new sentences to be added to the sentence database. The main contribution of the thesis is that it is the first study of the unknown lexical unit recognition problem in Chinese FrameNet that draws on the word senses of Tongyici Cilin. The thesis also puts forward semantic-level features for unknown lexical unit recognition, which provide an important basis for feature selection in further work on identifying new lexical units.
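The Average Semantic Similarity idea for frame selection can be illustrated with a minimal sketch. It is not the thesis's implementation: the sense codes, the frame inventory, the prefix-based `similarity` function, and the `rank_frames` helper are all hypothetical, and the sketch assumes that TOP-4 means the correct frame appears among the four highest-ranked candidates. An unknown word is scored against each frame by averaging its similarity to that frame's known lexical units, and the frames are then ranked by that average.

```python
from statistics import mean

# Hypothetical toy data: word -> Cilin-style hierarchical sense code.
# The codes are invented for illustration and are not real Tongyici Cilin entries.
SENSE_CODES = {
    "买": "He03A01",
    "购买": "He03A01",
    "采购": "He03A02",
    "卖": "He03B01",
    "出售": "He03B01",
    "跑": "Fb01A01",
    "走": "Fb01A02",
}

# Hypothetical frame inventory: frame name -> known lexical units.
FRAMES = {
    "Commerce_buy": ["买", "购买"],
    "Commerce_sell": ["卖", "出售"],
    "Self_motion": ["跑", "走"],
}

# Prefix lengths that close each of the five levels of the code hierarchy.
LEVEL_ENDS = (1, 2, 4, 5, 7)


def similarity(word_a, word_b):
    """Assumed similarity: fraction of hierarchy levels the two codes share."""
    code_a = SENSE_CODES.get(word_a)
    code_b = SENSE_CODES.get(word_b)
    if code_a is None or code_b is None:
        return 0.0
    shared = 0
    for end in LEVEL_ENDS:
        if code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared / len(LEVEL_ENDS)


def rank_frames(word, frames, top_k=4):
    """Rank frames by the word's average similarity to each frame's lexical units."""
    scores = {
        frame: mean(similarity(word, lu) for lu in lus)
        for frame, lus in frames.items()
        if lus
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Treat "采购" as an unknown lexical unit and list its candidate frames.
    for frame, score in rank_frames("采购", FRAMES):
        print(frame, round(score, 2))
```

A real system would replace the toy `similarity` with a measure computed over the actual Tongyici Cilin hierarchy and would have to handle words with several senses, which this sketch ignores.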
Keywords/Search Tags: Chinese FrameNet, Unknown Lexical Units, Word Sense Information, Target Word Identification, Frame Selection