Chinese Lexical Analysis Method Based On Morpheme Studies

Posted on: 2012-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
GTID: 2218330368994009
Subject: Computer application technology
Abstract/Summary:
As a key problem in Chinese information processing, Chinese lexical analysis comprises three main tasks: word segmentation, part-of-speech tagging, and word sense disambiguation. Although Chinese lexical analysis has made great progress in recent years, it still faces major challenges on large-scale open text, especially in handling unknown words. How to mine and represent morphological features effectively, and how to identify unknown words and predict their properties, are therefore both major difficulties and a current research focus. Using large-scale training corpora within a machine learning framework, this thesis studies morpheme-based methods for Chinese lexical analysis, focusing on the recognition of Chinese unknown words and the prediction of their meaning. Specifically, the work covers three aspects.

First, taking morphemes as the basic units of word formation, the thesis studies how different tag sets and different window sizes affect Chinese word segmentation with a Conditional Random Fields (CRF) model. Experimental results on the SIGHAN Bakeoff 2005 data show that introducing morphemes improves the ability to recognize unknown words.

Second, to address unknown word prediction in Chinese part-of-speech tagging, the thesis implements a morpheme-based part-of-speech tagging system built on a maximum entropy model, which extracts and combines word-internal lexical features. Experiments on the SIGHAN Bakeoff 2007 data show that the morpheme-based method has a clear advantage in predicting the part of speech of unknown words.

Finally, the thesis proposes a Naive Bayes model based on central morphemes and uses it to study meaning prediction for Chinese unknown words. Experiments show that this model can, to some degree, solve the meaning prediction problem for Chinese unknown words.
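To make the "tag sets" and "window sizes" in the first aspect concrete, the following is a minimal sketch of how a segmented sentence can be turned into a sequence-labeling problem of the kind a CRF is trained on. It is not the thesis's implementation: the four-tag B/M/E/S scheme, the treatment of each character as one morpheme token, the `<PAD>` symbol, and the function names `words_to_tags`, `window_features`, and `tags_to_words` are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Convert segmented words into (token, tag) pairs.

    Assumed tag set: B (begin), M (middle), E (end), S (single-token word).
    Assumption: every character is treated as one morpheme token.
    """
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            pairs.extend((ch, "M") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs


def window_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Collect unigram context features from positions i-window .. i+window.

    Positions outside the sentence are filled with a hypothetical <PAD> symbol.
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"c[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


def tags_to_words(pairs: List[Tuple[str, str]]) -> List[str]:
    """Rebuild words from (token, tag) pairs by closing a word at S or E."""
    words: List[str] = []
    current = ""
    for token, tag in pairs:
        current += token
        if tag in ("S", "E"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without a closing tag
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["我们", "喜欢", "计算机"]
    pairs = words_to_tags(segmented)
    tokens = [tok for tok, _ in pairs]
    print(pairs)
    print(window_features(tokens, 0, window=1))
    print(tags_to_words(pairs))
```

Changing the tag set means changing the labels emitted by `words_to_tags`, and changing the window size means changing the `window` argument; these are the two variables the first aspect compares. An unknown word is segmented correctly whenever the predicted tags around it are right, even if the word never appeared in training.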
Keywords: Chinese lexical analysis, word segmentation, part-of-speech tagging, sense tagging, unknown word