Research On Improved BP-HMM And Its Application In Chinese POS Tagging

Posted on:2015-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:W L Deng

Full Text:PDF

GTID:2298330467988887

Subject:Communication and Information System

Abstract/Summary:

Part of speech is the most fundamental attribute of a word. It supplies the knowledge needed for syntactic and grammatical analysis, and it provides decision information for other natural language tasks. Part-of-speech (POS) tagging is the process of labeling each word with its part of speech, and it has been studied extensively as a basic task in natural language processing (NLP), because its results directly affect the accuracy of subsequent NLP tasks. Current approaches to POS tagging are mainly statistical models and rule-based methods. Among statistical models, the hidden Markov model (HMM) is the most commonly used. Because of some grammatical characteristics specific to Chinese, HMM-based POS tagging often runs into data sparsity, including ambiguity of multi-category (homonymous) words and unknown (out-of-vocabulary) words. In continuing research on the HMM, many scholars have proposed tagging methods that combine neural networks, rule bases, or finite state machines with the traditional HMM. These hybrids have evolved into new tagging methods and point to new directions for improving tagging quality.

Based on a study of the characteristics of Chinese POS tagging, this thesis optimizes Chinese POS tagging in order to improve tagging accuracy. First, after studying how statistical models and traditional neural networks are applied to Chinese POS tagging, and after analyzing the characteristics of the BP (back-propagation) network and the traditional HMM in this task, the thesis combines the advantages of the two into a BP-HMM. The new model integrates context information better and thereby improves tagging accuracy.
Second, because traditional smoothing algorithms cannot meet the data-smoothing needs of the new model, an optimized deleted interpolation algorithm is selected, according to the model's characteristics and properties, to smooth the state transition matrix. The data sparsity of the observation probabilities is also adjusted so that the smoothing algorithm fits the new model. Finally, the unknown-word problem is handled with a new method based on grammar rules: a BP network is trained and a rule base is built to resolve unknown words during POS tagging.

The training corpus is extracted from the Peking University annotated 1998 "People's Daily" corpus. The BP-HMM is trained in Java under Eclipse, building on FudanNLP, the open-source natural language processing system developed at Fudan University. After these steps, the test corpus is tagged with an improved Viterbi algorithm. Experiments show that POS tagging based on the improved BP-HMM achieves better tagging results.
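
The deleted interpolation step named in the abstract can be illustrated with the textbook form of the technique. The sketch below is not the thesis's optimized variant, which the abstract does not specify. It assumes a square matrix of tag-bigram counts and mixes unigram and bigram estimates of the transition probability, so that transitions never seen in training still get a non-zero probability. The class name DeletedInterpolationSketch, its method names, and the toy counts are hypothetical, and giving ties to the unigram estimate is an arbitrary choice.

```java
/**
 * Sketch of textbook deleted interpolation for bigram tag transitions.
 * Not the thesis's optimized variant; all names and counts are hypothetical.
 */
final class DeletedInterpolationSketch {

    /** Row sums: how often each tag occurs as the previous tag. */
    static long[] rowSums(int[][] bigram) {
        long[] sums = new long[bigram.length];
        for (int i = 0; i < bigram.length; i++)
            for (int j = 0; j < bigram[i].length; j++) sums[i] += bigram[i][j];
        return sums;
    }

    /** Column sums: how often each tag occurs as the next tag. */
    static long[] colSums(int[][] bigram) {
        long[] sums = new long[bigram.length];
        for (int i = 0; i < bigram.length; i++)
            for (int j = 0; j < bigram[i].length; j++) sums[j] += bigram[i][j];
        return sums;
    }

    /** Returns {lambdaUnigram, lambdaBigram}; bigram[i][j] counts tag j after tag i. */
    static double[] estimateLambdas(int[][] bigram) {
        long[] prev = rowSums(bigram);
        long[] next = colSums(bigram);
        long total = 0;
        for (long c : prev) total += c;
        double lambdaUni = 0.0, lambdaBi = 0.0;
        for (int i = 0; i < bigram.length; i++) {
            for (int j = 0; j < bigram[i].length; j++) {
                int c = bigram[i][j];
                if (c == 0) continue;
                // Leave one occurrence out, then see which estimate predicts it better.
                double bi = prev[i] > 1 ? (c - 1.0) / (prev[i] - 1) : 0.0;
                double uni = total > 1 ? (next[j] - 1.0) / (total - 1) : 0.0;
                if (bi > uni) lambdaBi += c; else lambdaUni += c; // ties go to unigram
            }
        }
        double sum = lambdaUni + lambdaBi;
        if (sum == 0.0) return new double[] {0.5, 0.5}; // no data: arbitrary fallback
        return new double[] {lambdaUni / sum, lambdaBi / sum};
    }

    /** Smoothed transition matrix: p[i][j] = l0 * P(j) + l1 * P(j | i). */
    static double[][] smooth(int[][] bigram, double[] lambdas) {
        int n = bigram.length;
        long[] prev = rowSums(bigram);
        long[] next = colSums(bigram);
        long total = 0;
        for (long c : prev) total += c;
        double[][] p = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double uni = total > 0 ? (double) next[j] / total : 1.0 / n;
                // A context never seen in training falls back to the unigram estimate.
                p[i][j] = prev[i] == 0
                        ? uni
                        : lambdas[0] * uni + lambdas[1] * bigram[i][j] / prev[i];
            }
        }
        return p;
    }

    public static void main(String[] args) {
        int[][] counts = {{0, 8, 2}, {5, 0, 5}, {3, 0, 0}}; // toy tag-bigram counts
        double[] lambdas = estimateLambdas(counts);
        double[][] p = smooth(counts, lambdas);
        System.out.println("lambdas = " + java.util.Arrays.toString(lambdas));
        for (double[] row : p) System.out.println(java.util.Arrays.toString(row));
    }
}
```

Each row of the smoothed matrix still sums to 1, and a transition with a zero training count inherits a small probability from the unigram term. A three-way mix that also includes trigram estimates follows the same pattern.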

Keywords/Search Tags:

Negative feedback network, part-of-speech tagging, Hidden Markov Model, multi-category word processing

Related items

1	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
2	Application Of Hidden Markov Model In Part-of-Speech Tagging
3	The Research Of Part-of-speech Tagging Based On Hidden Markov Model
4	Research On Laodian Participle And Part-of-speech Tagging Method
5	Study On Disambiguation Algorithm For Chinese Word Segmentation
6	HMM-based Chinese Part-of-Speech Tagging And Improvement
7	Research And Implementation Of Chinese Lexical Analysis Technology
8	Research On Kirghiz Basic Part-of-Speech Tagging Based On HMM
9	Hidden Markov Model Parameters Estimation For Part-of-Speech Tagging
10	Chinese Word Found Its Part Of Speech Tagging