Research on a Conditional Random Fields Based Method for Chinese Word Sense Disambiguation

Posted on: 2008-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: X L Miao
Full Text: PDF
GTID: 2178360212483667
Subject: Computer application technology
Abstract/Summary:
Word sense disambiguation (WSD) has long been regarded as a central and difficult problem in natural language processing, with theoretical and practical significance for machine translation, information retrieval, syntactic parsing, document classification, and related fields.
This thesis surveys WSD research in China and abroad and analyzes current WSD algorithms and related techniques. At the present stage, unsupervised WSD methods save much manual tagging work, but their relatively low accuracy falls short of practical needs, so supervised methods remain the main approach in practical applications.
This thesis proposes and begins constructing a HowNet-based Chinese Sense Instance Corpus (CSIC) to serve as a knowledge source for the WSD task. Construction of the CSIC is still in progress, but several important pieces are already complete: the design of the main framework, the tagging specification, the tagging platform, the normalization of the corpus, and the evaluation module. This work lays a good foundation for large-scale word sense tagging.
A tagging platform was developed to increase the speed and quality of CSIC construction. The platform improves the efficiency of manual tagging, helps maintain consistency, and checks for errors in the corpus. It also makes it convenient to run different WSD experiments and provides an evaluation module.
Conditional Random Fields (CRFs), a recently introduced conditional probabilistic model for labeling and segmenting sequential data, are a statistical machine learning model. In natural language processing, CRFs are usually applied to word segmentation, POS tagging, and shallow semantic parsing. This thesis introduces CRFs to the WSD task: a CRF-based experiment was designed to learn WSD knowledge from the CSIC automatically. The experimental results show that CRFs perform well in an open test setting.
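To make the sequence-labeling view of WSD concrete, here is a minimal sketch of decoding with a linear-chain CRF. It is an illustration only: the abstract does not state the feature templates, label set, or toolkit used in the thesis, so the context-window features, the labels PLAY, CALL and O, and all weights below are hypothetical and hand-set rather than learned from the CSIC.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label set: two senses of the ambiguous word 打, plus O for other words.
LABELS = ["O", "PLAY", "CALL"]

# Hypothetical hand-set weights; a real CRF would learn these from a tagged corpus.
EMIT: Dict[Tuple[str, str], float] = {
    ("w0=打", "PLAY"): 1.0,
    ("w0=打", "CALL"): 1.0,
    ("w+1=篮球", "PLAY"): 2.0,   # next word "basketball" favors the PLAY sense
    ("w+1=电话", "CALL"): 2.0,   # next word "telephone" favors the CALL sense
    ("w0=他", "O"): 1.0,
    ("w0=篮球", "O"): 1.0,
    ("w0=电话", "O"): 1.0,
}
TRANS: Dict[Tuple[str, str], float] = {}  # label-to-label weights, left at zero here


def features(tokens: List[str], t: int) -> List[str]:
    """Context-window features for position t (assumed templates)."""
    feats = [f"w0={tokens[t]}"]
    if t > 0:
        feats.append(f"w-1={tokens[t - 1]}")
    if t + 1 < len(tokens):
        feats.append(f"w+1={tokens[t + 1]}")
    return feats


def viterbi(tokens: List[str], labels: List[str],
            emit_w: Dict[Tuple[str, str], float],
            trans_w: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain score."""
    n = len(tokens)
    if n == 0:
        return []
    # best[t][y]: best score of any label sequence ending with label y at position t
    best: List[Dict[str, float]] = [{} for _ in range(n)]
    back: List[Dict[str, Optional[str]]] = [{} for _ in range(n)]
    for t in range(n):
        feats = features(tokens, t)
        for y in labels:
            emit = sum(emit_w.get((f, y), 0.0) for f in feats)
            if t == 0:
                best[t][y], back[t][y] = emit, None
            else:
                prev, score = max(
                    ((p, best[t - 1][p] + trans_w.get((p, y), 0.0)) for p in labels),
                    key=lambda pair: pair[1],
                )
                best[t][y], back[t][y] = score + emit, prev
    # Follow back-pointers from the best final label.
    y = max(labels, key=lambda lab: best[n - 1][lab])
    path = [y]
    for t in range(n - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    path.reverse()
    return path


print(viterbi(["他", "打", "篮球"], LABELS, EMIT, TRANS))
print(viterbi(["他", "打", "电话"], LABELS, EMIT, TRANS))
```

Under these toy weights the decoder assigns 打 the PLAY label before 篮球 and the CALL label before 电话. In a trained CRF the same decoding step would run over weights estimated from sense-tagged sentences.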
Keywords/Search Tags: Word Sense Disambiguation, WSD, Chinese Sense Instance Corpus, CSIC, HowNet, Conditional Random Fields