Chinese Part-of-Speech Tagging Based On Ameliorated Hidden Markov Model

Posted on:2008-12-25

Degree:Master

Type:Thesis

Country:China

Candidate:M Wang

GTID:2178360242469492

Subject:Computer software and theory

Abstract/Summary:

Chinese Part-of-Speech Tagging is fundamental to many Chinese Information Processing tasks. The task is to design software that automatically identifies the part of speech of each word in a sentence. On the one hand, the performance of many practical applications, such as information extraction, information retrieval, and machine translation, improves when correct part-of-speech tags are available. On the other hand, tagging is an indispensable processing component in Chinese lexical analysis systems, Chinese syntactic analysis systems, and similar tools. Research on it is therefore of both theoretical and practical importance.

Approaches to Part-of-Speech Tagging include rule-based and statistical techniques. Because statistical techniques require no hand-written rules of natural language and achieve high accuracy, statistical language models have gradually become a hot research topic. Owing to its good performance, the Hidden Markov Model (HMM), one such statistical model, has become a recent trend in Part-of-Speech Tagging.

We propose a method of Chinese Part-of-Speech Tagging based on an ameliorated Hidden Markov Model, which brings more contextual information into the model to describe language phenomena. The results of the ameliorated model are satisfactory. The main work of this thesis consists of four parts:

1. Although HMMs perform well, a traditional HMM assumes that the probability of a word depends only on its own tag. Contrary to this output assumption, we assume that the probability of a word depends not only on its own tag but also on the next tag when estimating the word's output probability. This brings more grammatical context into the HMM.

2. Two key factors can be used to evaluate the performance of a statistical Part-of-Speech Tagging model. We review several prevalent smoothing algorithms in detail and adopt a stable exponential smoothing algorithm, based on the linear interpolation algorithm, to address the sparse-data problem.

3. To make effective use of the parameters trained from the ameliorated Hidden Markov Model, we adapt the Viterbi algorithm to the new parameters.

4. Because a corpus cannot supply statistics for every word, handling new words is another key problem in statistical language models. In this thesis, we propose a concrete method for handling new words.

We conducted tests on a 50,000-word corpus drawn from the People's Daily. The experimental results show a recall of 96.20% and a precision of 95.09%, which indicates that applying the ameliorated Hidden Markov Model to Chinese Part-of-Speech Tagging is effective and feasible.
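The abstract does not give the thesis's equations, so the following is only a minimal sketch of the idea in parts 1 to 3, not the author's implementation. It assumes the output probability takes the form P(w_i | t_i, t_{i+1}), that sparse counts are handled by a fixed-weight linear interpolation with P(w_i | t_i) (a stand-in for the thesis's exponential smoothing, whose details are not stated), and that the Viterbi recursion emits each word only once its next tag is known. The class name `NextTagHMM`, the boundary markers, the constants `lam` and `alpha`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


class NextTagHMM:
    """Toy HMM whose emission P(w_i | t_i, t_{i+1}) also sees the next tag."""

    def __init__(self, tagged_sentences, lam=0.7, alpha=0.01):
        self.lam, self.alpha = lam, alpha  # assumed smoothing constants
        self.trans = Counter()  # (tag, next_tag) counts
        self.emit2 = Counter()  # (tag, next_tag, word) counts
        self.emit1 = Counter()  # (tag, word) counts
        self.ctx = Counter()    # how often each tag appears as left context
        vocab = set()
        for sent in tagged_sentences:
            tags = [START] + [t for _, t in sent] + [END]
            for prev, cur in zip(tags, tags[1:]):
                self.trans[prev, cur] += 1
                self.ctx[prev] += 1
            # tags[2:] lines each word up with the tag that follows it.
            for (word, tag), nxt in zip(sent, tags[2:]):
                self.emit2[tag, nxt, word] += 1
                self.emit1[tag, word] += 1
                vocab.add(word)
        self.tags = sorted(t for t in self.ctx if t != START)
        self.vocab_size = len(vocab)

    def log_trans(self, prev, cur):
        # Add-alpha estimate of P(cur | prev); END counts as one outcome.
        num = self.trans[prev, cur] + self.alpha
        den = self.ctx[prev] + self.alpha * (len(self.tags) + 1)
        return math.log(num / den)

    def log_emit(self, word, tag, nxt):
        # Linear interpolation of P(word | tag, nxt) with P(word | tag).
        pair = self.trans[tag, nxt]
        p2 = self.emit2[tag, nxt, word] / pair if pair else 0.0
        p1 = (self.emit1[tag, word] + self.alpha) / (
            self.ctx[tag] + self.alpha * (self.vocab_size + 1))
        return math.log(self.lam * p2 + (1 - self.lam) * p1)

    def tag(self, words):
        if not words:
            return []
        # delta[t]: best log-score of a path ending in tag t at the current
        # word, covering all transitions so far and the emissions of all
        # earlier words (a word is scored once its next tag is chosen).
        delta = {t: self.log_trans(START, t) for t in self.tags}
        backptrs = []
        for prev_word in words[:-1]:
            new_delta, ptr = {}, {}
            for cur in self.tags:
                scores = {p: delta[p] + self.log_trans(p, cur)
                          + self.log_emit(prev_word, p, cur)
                          for p in self.tags}
                ptr[cur] = max(scores, key=scores.get)
                new_delta[cur] = scores[ptr[cur]]
            delta = new_delta
            backptrs.append(ptr)
        # Close the path: the last word is emitted against the END marker.
        last = max(self.tags, key=lambda t: delta[t] + self.log_trans(t, END)
                   + self.log_emit(words[-1], t, END))
        path = [last]
        for ptr in reversed(backptrs):
            path.append(ptr[path[-1]])
        return path[::-1]


# Hypothetical three-sentence training corpus (r: pronoun, v: verb, ns: place).
corpus = [
    [("我", "r"), ("爱", "v"), ("北京", "ns")],
    [("他", "r"), ("爱", "v"), ("学习", "v")],
    [("我", "r"), ("学习", "v")],
]
model = NextTagHMM(corpus)
print(model.tag(["我", "爱", "北京"]))  # ['r', 'v', 'ns']
```

Folding the emission of word i into the step from tag i to tag i+1 keeps the search first-order, so decoding still costs on the order of (sentence length) x (number of tags) squared, the same as a standard Viterbi pass. Unknown words receive only the small `alpha` mass here; the thesis's own new-word method is not described in the abstract and is not reproduced.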

Keywords/Search Tags:

Chinese Information Processing, Chinese Part-of-Speech Tagging, Hidden Markov Model, Smoothing Algorithm

Related items

1	Statistics-based Chinese Pos Tagging Method
2	Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging
3	HMM-based Chinese Part-of-Speech Tagging And Improvement
4	Research On Improvements Of Chinese Part-of-Speech Tagging System Based On Statistical Model
5	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
6	Research On Kirghiz Basic Part-of-Speech Tagging Based On HMM
7	Study Of Kazak Part-of-Speech Tagging Based Upon HMM
8	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
9	Research And Implementation Of Chinese Lexical Analysis Technology
10	Research On Improved BP-HMM And Its Application In Chinese POS Tagging