
Research On Hidden Markov Model For Chinese Natural Language Processing

Posted on: 2004-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: B Chen
Full Text: PDF
GTID: 2168360095956762
Subject: Computer software and theory
Abstract/Summary:
Compared with more formal domains such as programming languages, Natural Language Processing (NLP) faces greater difficulty in acquiring and applying knowledge. In early NLP research, nearly all of the required knowledge, such as translation lexicons and various grammars, was compiled by linguists. But natural language is a product of social development, and its regularities cannot simply be collected by experts; manually gathered knowledge tends to vary in form, to be arbitrary, and to be costly to produce. The growth of the Internet and the abundance of digital text now make it feasible to learn NLP knowledge with statistical methods, which require no prior knowledge, adapt more readily, and cost less. Statistical approaches have developed rapidly in recent years and have achieved notable success in applications such as speech recognition and OCR.

This thesis studies Chinese statistical language modeling based on the Hidden Markov trigram model. The main topics are monolingual corpus collection, model selection, training, smoothing, and compression. An object-oriented toolkit for Chinese statistical language modeling is presented, and the original trigram model is extended to capture longer-distance dependencies. The contributions are as follows:

First, considering the characteristics of Chinese, the thesis re-examines techniques for corpus collection, model training, smoothing, and compression that were originally developed for Western-language modeling, analyzes their properties and their effects on a Chinese trigram model, and searches experimentally for the best combination of techniques.

Second, after examining long-distance dependency phenomena in modern Chinese, the thesis proposes an improved model, LP-Trigram, which adds a class of long-distance dependencies to the trigram model. To accommodate the change in model structure, the Viterbi algorithm used for search in the ordinary HMM trigram is also extended. The new model incorporates long-distance dependencies and resolves some ambiguities while keeping the size and speed of the original trigram largely unchanged.

Third, the performance of LP-Trigram is evaluated on a Pinyin-to-Hanzi conversion system. Experiments show that LP-Trigram corrects some of the conversion errors made by the traditional trigram model, demonstrating that long-distance dependencies can be properly expressed within the HMM trigram framework.

Finally, the thesis summarizes its work and points out directions for future research.
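The abstract describes the LP-Trigram model only at a high level. As background, the following is a minimal sketch of the baseline it builds on: Viterbi search over a trigram language model for Pinyin-to-Hanzi conversion. The lexicon, probabilities, and smoothing floor below are toy placeholders assumed for illustration, not the thesis's actual data, toolkit, or LP-Trigram implementation.

import math

# Candidate Hanzi for each Pinyin syllable (toy lexicon, assumed for illustration).
lexicon = {
    "zhong": ["中", "种", "重"],
    "guo":   ["国", "过", "果"],
}

def trigram_logp(w1, w2, w3):
    """Smoothed trigram log-probability log P(w3 | w1, w2).
    A real system would back off to bigram/unigram estimates trained on a corpus;
    here one path gets a bonus and everything else a flat smoothing floor."""
    if (w1, w2, w3) == ("<s>", "中", "国"):
        return math.log(0.5)
    return math.log(1e-4)

def viterbi_convert(pinyin_syllables):
    """Return the most probable Hanzi sequence under the trigram model.
    States are pairs (w_{i-1}, w_i) so the full trigram context is available."""
    # Each beam maps state (prev, cur) -> (log-prob, backpointer state).
    beams = [{("<s>", "<s>"): (0.0, None)}]
    for syl in pinyin_syllables:
        next_beam = {}
        for (w1, w2), (score, _) in beams[-1].items():
            for w3 in lexicon.get(syl, []):
                cand = score + trigram_logp(w1, w2, w3)
                state = (w2, w3)
                if state not in next_beam or cand > next_beam[state][0]:
                    next_beam[state] = (cand, (w1, w2))
        beams.append(next_beam)
    # Trace back from the best final state.
    state = max(beams[-1], key=lambda s: beams[-1][s][0])
    output = []
    for beam in reversed(beams[1:]):
        output.append(state[1])
        state = beam[state][1]
    return "".join(reversed(output))

print(viterbi_convert(["zhong", "guo"]))  # with these toy scores: 中国

The thesis's LP-Trigram would additionally condition on selected long-distance predecessors and extend this search accordingly; that extension is not reproduced here.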
Keywords/Search Tags:Statistical Natural Language Processing, Hidden Markov Model, Long Dependency, LP-Trigram