Chinese Lexical Analysis Method Based On Morpheme Studies

Posted on:2012-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:Q Wang

Full Text:PDF

GTID:2218330368994009

Subject:Computer application technology

Abstract/Summary:

Chinese lexical analysis is a key problem in Chinese information processing and comprises three main tasks: word segmentation, part-of-speech (POS) tagging, and sense disambiguation. Although the field has made great progress in recent years, it still faces major challenges on large-scale open text, especially in handling unknown words. How to mine and represent morphological features effectively, and how to identify unknown words and predict their properties, are therefore both major difficulties and a research focus for Chinese lexical analysis. Using large-scale training corpora within a machine learning framework, this thesis studies methods for Chinese morphological analysis, focusing on the recognition and sense prediction of Chinese unknown words. The work covers three aspects.

First, taking morphemes as the basic units of word formation, the thesis studies how different tag sets and window sizes affect Chinese word segmentation with a Conditional Random Fields (CRF) model. Experimental results on the SIGHAN Bakeoff 2005 data show that introducing morphemes improves unknown word recognition.

Second, to address unknown word prediction in Chinese POS tagging, the thesis implements a morpheme-based POS tagging system built on a maximum entropy model, which extracts and combines word-internal lexical features. Experiments on the SIGHAN Bakeoff 2007 data show that morpheme-based POS tagging has a clear advantage in predicting the POS of unknown words.

Finally, the thesis proposes a Naive Bayes model based on central morphemes and uses it to study sense prediction for Chinese unknown words. Experiments show that this model can, to some degree, solve the sense prediction problem for Chinese unknown words.
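As an illustration of the segmentation setup mentioned in the abstract, the sketch below shows how a segmented sentence can be turned into per-unit position tags and how context-window features can be extracted for a sequence labeler such as a CRF. It is a minimal sketch under stated assumptions, not the thesis's implementation: the 4-tag B/M/E/S scheme, the treatment of each character as one morpheme, the feature names, and all function names are hypothetical choices made here. The abstract does not say which tag sets or window sizes were compared.

```python
def words_to_tags(words):
    """Map a segmented sentence to per-character B/M/E/S tags.

    Assumption: each character is treated as one morpheme-like unit.
    B = begin, M = middle, E = end of a multi-character word; S = single.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def window_features(chars, i, window=2):
    """Collect the characters within +/- `window` positions of index i.

    Positions outside the sentence are filled with a padding symbol.
    The feature naming ("c[-1]", "c[0]", ...) is illustrative only.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"c[{offset}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    sentence = ["中文", "分词", "很", "难"]
    chars = list("".join(sentence))
    tags = words_to_tags(sentence)
    print(list(zip(chars, tags)))
    print(window_features(chars, 0, window=1))
    print(tags_to_words(chars, tags))
```

Changing `window` or replacing the B/M/E/S scheme with a smaller or larger tag set is the kind of variation the first aspect of the thesis evaluates. In a real system the feature dictionaries would be passed to a CRF toolkit for training.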

Keywords/Search Tags:

Chinese lexical analysis, word segmentation, part-of-speech tagging, sense tagging, unknown word

Related items

1	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
2	Research And Implementation Of Chinese Lexical Analysis Technology
3	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
4	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
5	Chinese Word Found Its Part Of Speech Tagging
6	Word Segmentation And Pos Tagging In Chinese
7	Research On Lexical Analysis Based On Neural Networks
8	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
9	BiLSTM And CNN Based Joint Model For Chinese Word Segmentation And Part-of-speech Tagging
10	Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging