Uyghur Stem Segmentation And POS Tagging Based On Corpora

Posted on:2007-04-22

Degree:Master

Type:Thesis

Country:China

Candidate:P Chen

Full Text:PDF

GTID:2178360185966270

Subject:Computer application technology

Abstract/Summary:

Constructing high-quality tagged corpora is fundamental to Uyghur natural language processing. Applications such as machine translation (MT), information retrieval (IR), and web text mining currently need more corpora of higher quality, and automatic stem segmentation and part-of-speech (POS) tagging are the basic steps in building such corpora.

This thesis addresses stem segmentation by combining the Bidirectional Matching algorithm with the Omni-word Segmentation algorithm. Compared with the Maximum Matching algorithm, the combined method improves the precision of stem segmentation. An improved binary-seek-by-character dictionary query mechanism is also applied to Uyghur stem segmentation to improve efficiency.

The thesis then explores POS tagging methods and analyzes the merits and demerits of rule-based and statistical approaches. Uyghur POS tagging is treated probabilistically, using a unigram Hidden Markov Model (HMM). The model parameters are estimated with the Relative Frequency Training (RFT) method, data sparseness is handled with a back-off smoothing algorithm, and the Viterbi algorithm assigns the POS tags within each sentence. The unigram HMM combined with the Viterbi algorithm proves effective for Uyghur POS tagging.
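The tagging pipeline described in the abstract (relative-frequency estimation, back-off smoothing, Viterbi decoding) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the exact model, so the sketch assumes a standard first-order HMM, a simplified fixed-weight back-off (the `ALPHA` constant is hypothetical), and a toy corpus of English placeholder tokens in place of Uyghur stems.

```python
import math
from collections import Counter, defaultdict

# Toy tagged corpus with placeholder tokens (hypothetical data,
# not taken from the thesis corpus).
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
ALPHA = 0.4  # assumed fixed back-off weight, chosen only for illustration


def train(corpus):
    """Count tags, tag transitions and emissions, and return scoring functions."""
    tag_count = Counter()
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
    total = sum(tag_count.values())

    def log_trans(prev, tag):
        seen = trans[prev][tag]
        if seen:  # relative frequency of the observed transition
            return math.log(seen / sum(trans[prev].values()))
        # back off to the scaled relative frequency of the tag alone
        return math.log(ALPHA * tag_count[tag] / total)

    def log_emit(tag, word):
        seen = emit[tag][word]
        if seen:  # relative frequency of the observed emission
            return math.log(seen / tag_count[tag])
        # back off to a small scaled floor for unseen (tag, word) pairs
        return math.log(ALPHA / (tag_count[tag] + len(vocab)))

    return list(tag_count), log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the highest-scoring tag sequence for a list of words."""
    if not words:
        return []
    # Each column maps tag -> (best log score ending in tag, backpointer).
    columns = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None)
                for t in tags}]
    for word in words[1:]:
        last = columns[-1]
        column = {}
        for t in tags:
            score, prev = max((last[p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        columns.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


TAGS, LOG_TRANS, LOG_EMIT = train(CORPUS)
print(viterbi(["the", "dog", "sleeps"], TAGS, LOG_TRANS, LOG_EMIT))
# → ['DET', 'NOUN', 'VERB']
```

Scores are kept in log space so that long sentences do not underflow. The fixed-weight back-off does not yield normalized probabilities; it is used here only because it is short, and a properly discounted scheme such as Katz back-off would be the usual choice in practice.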

Keywords/Search Tags:

Stem segmentation, Bidirectional Matching algorithm, Omni-word Segmentation algorithm, POS tagging, unigram HMM

Related items

1	Research Of Chinese Word Segmentation Based On Mechanical Matching And Character Tagging
2	The Research And Implementation Of The Chinese Word Segmentation System Combining Omni-segmentation With Statistic
3	The Research And Implementation Of The Chinese Word Segmentation System Combining Omni-Segmentation With Statistic
4	Design And Implementation Of Efficient Chinese Word Segmentation And POS Tagging System Based On Perceptron Algorithm
5	Research And Implementation Of Chinese Word Segmentation Based On Character Tagging Method
6	The Research Of Chinese Word Segmentation Based On CRF
7	Study On Disambiguation Algorithm For Chinese Word Segmentation
8	Word Segmentation And POS Tagging In Chinese
9	Research And Implementation Of Chinese Word Segmentation Algorithm
10	Research On Lexical Analysis Based On Neural Networks