
Research On Improved BP-HMM And Its Application In Chinese POS Tagging

Posted on: 2015-03-17  Degree: Master  Type: Thesis
Country: China  Candidate: W L Deng
GTID: 2298330467988887  Subject: Communication and Information System
Abstract/Summary:
Part of speech is the most fundamental attribute of a word. It not only provides the knowledge base for syntactic and grammatical analysis, but also supplies decision information for part-of-speech (POS) tagging and other natural language tasks. POS tagging is the process of labeling each word with its part of speech, and it has been studied extensively as a basic task in the field of Natural Language Processing. The quality of POS tagging directly affects the accuracy of subsequent natural language tasks. Current POS tagging methods are based either on statistical models or on rules. Among the statistical models, the Hidden Markov Model (HMM) is the most commonly used. Because of some unique grammatical characteristics of Chinese, HMM-based POS tagging often runs into problems of sparse data, including homonym ambiguity and unlisted words. In the course of continued research on the HMM, many scholars have proposed tagging methods that combine neural networks, rule bases, and finite state machines with the traditional HMM. These have evolved into new POS tagging methods and point out new directions for improving tagging performance.

Based on a study of the characteristics of Chinese POS tagging, this thesis optimizes Chinese POS tagging in order to improve tagging accuracy. First, after studying how statistical models and traditional neural networks are applied to Chinese POS tagging, and analyzing the characteristics of the BP network and the traditional Hidden Markov Model in POS tagging, a BP-HMM is constructed that combines the advantages of both. The new model integrates context information better and thereby improves the accuracy of POS tagging.
Secondly, because traditional smoothing algorithms cannot meet the data-smoothing needs of the new model, an optimized deleted interpolation algorithm is selected according to the model's characteristics and properties to smooth its state transition matrix, and the data sparsity of the observation probabilities is adjusted so that the smoothing algorithm suits the new model. Finally, the unknown word problem is handled with a new method based on grammar rules: a BP network is trained and a rule base is established to resolve unknown words in POS tagging.

The training corpus is extracted from the 1998 "People's Daily" annotated corpus of Peking University. The BP-HMM is trained in Java under Eclipse, building on FudanNLP, the open source natural language processing system produced by Fudan University. After a series of operations, the improved Viterbi algorithm is used to tag the test corpus. Experiments show that POS tagging based on the improved BP-HMM achieves better tagging results.
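
To make the smoothing and decoding steps concrete, the following is a minimal sketch of a plain bigram HMM tagger with textbook deleted interpolation on the state transition matrix and standard Viterbi decoding. It is not the optimized interpolation, the improved Viterbi algorithm, or the BP-HMM described in the thesis, and it is written in Python rather than the Java used there. The function names, the toy English corpus, and the `FLOOR` constant used for unseen events are all assumptions made for illustration.

```python
import math
from collections import Counter

FLOOR = 1e-6  # hypothetical floor probability for unseen events (an assumption)


def train(tagged_sentences):
    """Collect counts from sentences given as lists of (word, tag) tuples."""
    uni, bi, ctx, emit, start = (Counter() for _ in range(5))
    for sent in tagged_sentences:
        tags = [tag for _, tag in sent]
        start[tags[0]] += 1
        uni.update(tags)
        emit.update(sent)
        for prev, cur in zip(tags, tags[1:]):
            bi[(prev, cur)] += 1
            ctx[prev] += 1  # how often prev is followed by another tag
    return uni, bi, ctx, emit, start


def interpolation_weights(uni, bi, ctx):
    """Deleted interpolation: returns (lambda_unigram, lambda_bigram)."""
    total = sum(uni.values())
    lam_uni = lam_bi = 0.0
    for (prev, cur), count in bi.items():
        # "Delete" one occurrence of the bigram, then see which estimate wins.
        bi_est = (count - 1) / (ctx[prev] - 1) if ctx[prev] > 1 else 0.0
        uni_est = (uni[cur] - 1) / (total - 1) if total > 1 else 0.0
        if bi_est > uni_est:
            lam_bi += count
        else:
            lam_uni += count
    norm = lam_uni + lam_bi
    return (lam_uni / norm, lam_bi / norm) if norm else (0.5, 0.5)


def transition(prev, cur, uni, bi, ctx, lambdas):
    """Smoothed P(cur | prev) as a mix of unigram and bigram estimates."""
    p_uni = uni[cur] / sum(uni.values())
    if ctx[prev] == 0:  # prev never seen as a context: back off to unigram
        return p_uni
    lam_uni, lam_bi = lambdas
    return lam_uni * p_uni + lam_bi * bi[(prev, cur)] / ctx[prev]


def viterbi(words, counts, lambdas):
    """Return the most probable tag sequence for words (log-space Viterbi)."""
    uni, bi, ctx, emit, start = counts
    tags = list(uni)
    n_sent = sum(start.values())

    def log_emit(tag, word):
        return math.log(max(emit[(word, tag)] / uni[tag], FLOOR))

    def log_trans(prev, cur):
        return math.log(max(transition(prev, cur, uni, bi, ctx, lambdas), FLOOR))

    score = {t: math.log(max(start[t] / n_sent, FLOOR)) + log_emit(t, words[0])
             for t in tags}
    backpointers = []
    for word in words[1:]:
        new_score, pointers = {}, {}
        for cur in tags:
            best = max(tags, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = score[best] + log_trans(best, cur) + log_emit(cur, word)
            pointers[cur] = best
        backpointers.append(pointers)
        score = new_score
    path = [max(tags, key=score.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy corpus (hypothetical data, only to exercise the functions).
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VB")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB")],
    [("a", "DT"), ("big", "JJ"), ("dog", "NN"), ("sleeps", "VB")],
]
counts = train(corpus)
lambdas = interpolation_weights(*counts[:3])
print(viterbi(["the", "cat", "barks"], counts, lambdas))
```

The interpolation weights are estimated by removing each observed tag bigram once and crediting whichever estimate (bigram or unigram) still predicts it better, which keeps rare transitions from being assigned zero probability. Unknown words are handled here only by the floor probability; the thesis instead resolves them with a BP network and a rule base, which this sketch does not attempt to reproduce.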
Keywords/Search Tags: Negative feedback network, part of speech tagging, Hidden Markov Model, multi-category word processing