Design and Implementation of a WSD System Based on Chinese Real Text

Posted on: 2004-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
GTID: 2168360095453777
Subject: Computer application technology
Abstract/Summary:
Word Sense Disambiguation (WSD) plays an important role in Natural Language Processing (NLP). Research on WSD has both theoretical and practical significance for Natural Language Understanding (NLU), and it has become both an active research topic and a difficult problem. WSD is an intermediate task: it is not an end in itself but a prerequisite for other NLP tasks, and it benefits machine translation (MT), information retrieval (IR), syntactic analysis, and similar applications. The main work of this dissertation is to study how to acquire the knowledge that supports WSD from different language resources and to build a WSD system for Chinese real text. The research is organized as follows.

WSD is in essence a process of acquiring WSD knowledge. Making full use of existing dictionary resources avoids the work of manually tagging word senses. The dissertation extracts knowledge usable for WSD from HowNet and builds several knowledge bases: collocations, a list of dynamic roles, relations between substance and attribute, and sememe relations, which are derived from an analysis of the relations among the sememes that define concepts.

A good WSD system combines different kinds of knowledge. We build a WSD model that draws on diverse information, including syntactic tags, word frequencies, collocations, semantic context, role-related expectations, and selected references. The model comprises a POS filter, a partial filter, and a collocation base. We design and implement the system on the basis of this model.

Evaluating a WSD system is itself an important problem. In the dissertation we use news corpora from China Daily covering politics, sports, agriculture, and science. The corpora are POS-tagged using the system of Shanxi University. The experimental results show that the model is effective for WSD, with precision reaching about 80%.

The dissertation attempts to disambiguate all the content words in a text and builds a WSD system that combines different kinds of language knowledge. The results are better than those of the traditional system.
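As a rough illustration of how POS filtering, collocation evidence, and word frequency might be combined, the following is a minimal Python sketch. It is not the dissertation's implementation: the `Sense` class, the `disambiguate` and `precision` functions, the toy sense inventory for the word "打", and the frequency counts are all hypothetical, and none of the data comes from HowNet. Precision is assumed here to mean correct decisions divided by attempted decisions.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """A hypothetical sense entry; none of this data comes from HowNet."""
    sense_id: str
    pos: str                      # part-of-speech tag the sense belongs to
    frequency: int = 0            # invented corpus count, used to break ties
    collocates: set = field(default_factory=set)


def disambiguate(pos_tag, context, senses):
    """Pick a sense by POS filtering, then collocation overlap, then frequency."""
    if not senses:
        return None
    # Step 1: POS filter. Keep only senses whose POS matches the tagged word.
    candidates = [s for s in senses if s.pos == pos_tag]
    if not candidates:
        candidates = list(senses)  # fall back if the tag matches no sense
    # Step 2: score by the number of known collocates found in the context,
    # using word frequency as the tie-breaker.
    context_words = set(context)
    return max(
        candidates,
        key=lambda s: (len(s.collocates & context_words), s.frequency),
    )


def precision(predicted, gold):
    """Correct decisions divided by attempted (non-None) decisions."""
    attempted = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    if not attempted:
        return 0.0
    return sum(p == g for p, g in attempted) / len(attempted)


# Toy inventory for the ambiguous word "打" (all values are illustrative).
SENSES = [
    Sense("hit", "v", frequency=50, collocates={"人", "球"}),
    Sense("buy", "v", frequency=10, collocates={"酱油", "票"}),
    Sense("dozen", "q", frequency=5, collocates={"铅笔"}),
]

if __name__ == "__main__":
    chosen = disambiguate("v", ["他", "去", "酱油"], SENSES)
    print(chosen.sense_id)
```

In this sketch the POS filter runs first because it is the cheapest evidence and removes candidates outright, while collocations and frequency only rank the senses that remain.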
Keywords/Search Tags: NLP (Natural Language Processing), WSD (Word Sense Disambiguation), ambiguous words, real text, HowNet, relations