
Research on Punctuation Insertion Technology for Spoken-Language Machine Translation

Posted on: 2010-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wu
GTID: 2208360275998716
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the integration of the global economy, research on and application of spoken-language machine translation are receiving growing attention. Spoken dialogue has two distinctive features: it carries no punctuation, so its sentences have no clear boundaries, and it contains many ill-formed utterances. As a result, machine translation of such dialogue performs poorly. Spoken dialogue therefore needs pre-processing before translation, and punctuation insertion is an important part of that pre-processing. After analyzing several statistical language models, this thesis proposes a punctuation insertion algorithm based on maximum entropy (the Adding Punctuation Algorithm). The main work is as follows:

1. Construct a template mechanism that automatically extracts linguistic features from the corpus, exploiting the flexibility of maximum entropy models in feature selection. Several template sets for the punctuation insertion problem are studied in depth, and an effective template set is established through experiments.
2. Study the IIS (Improved Iterative Scaling) parameter estimation algorithm and, on this basis, implement IIS with Gaussian prior smoothing, which effectively avoids over-fitting during training. The algorithm efficiently estimates the weight of each linguistic feature in the model.
3. Study and implement the Single-node Classifying Decoding Method, which uses the trained model to label the input sequence, quickly and efficiently obtaining the globally optimal label sequence and thereby completing the punctuation insertion task. In the open test, punctuation insertion reaches an F-value of 87.08%, which demonstrates the effectiveness of the Adding Punctuation Algorithm.
4. Integrate the Adding Punctuation Algorithm into a machine translation system. The test corpus is translated both directly and after punctuation insertion, and both outputs are scored with an automatic machine translation evaluation tool. The BLEU score rises from 0.2257 to 0.2465, showing that punctuation insertion improves the quality of the machine translation output.
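The abstract names the components of the method but not their details, so the following is only a minimal Python sketch under stated assumptions, not the thesis's implementation. It assumes that punctuation insertion is cast as labelling each word with the mark that follows it (none, comma, period, or question mark), that features are indicator pairs (template string, label) produced by four hypothetical templates over neighbouring words, and that "single-node" decoding means classifying each position independently. Training uses IIS with a Gaussian prior of variance sigma2: for each feature f, the update d solves sum over (x, y) of p~(x) p(y|x) f(x, y) exp(d f#(x, y)) + (lambda_f + d) / sigma2 = E~[f], where f#(x, y) counts the active features; here it is solved by Newton's method. The function names, the template set, the matching rule in `punctuation_f1`, and the toy `TRAIN` data are all hypothetical illustrations.

```python
import math
from collections import defaultdict

# Hypothetical label set: the mark (if any) inserted after each word.
LABELS = ["", ",", ".", "?"]


def templates(words, i):
    """Hypothetical template set: previous, current, and next word plus the
    current/next bigram. Padding lets every template fire at every position."""
    def w(k):
        j = i + k
        return words[j] if 0 <= j < len(words) else "<pad>"
    return [f"w-1={w(-1)}", f"w0={w(0)}", f"w1={w(1)}", f"w0w1={w(0)}|{w(1)}"]


def predict_proba(lam, ctx):
    """p(y | x) = exp(sum_i lam_i f_i(x, y)) / Z(x), with f_i = (context string, label)."""
    scores = {y: sum(lam.get((c, y), 0.0) for c in ctx) for y in LABELS}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {y: math.exp(s - top) / z for y, s in scores.items()}


def train_iis(sentences, sigma2=1.0, iterations=50, newton_steps=20):
    """IIS with a Gaussian prior N(0, sigma2) on every weight.
    sentences: list of (words, labels), labels[i] being the mark after words[i]."""
    events = [templates(ws, i) for ws, _ in sentences for i in range(len(ws))]
    gold = [y for _, ys in sentences for y in ys]
    n = len(events)
    emp = defaultdict(float)  # empirical expectation of each feature
    for ctx, y in zip(events, gold):
        for c in ctx:
            emp[(c, y)] += 1.0 / n
    lam = {f: 0.0 for f in emp}  # keep only features observed in training
    for _ in range(iterations):
        # mass[f][k]: model expectation of f restricted to (x, y) with f#(x, y) == k
        mass = defaultdict(lambda: defaultdict(float))
        for ctx in events:
            probs = predict_proba(lam, ctx)
            for y in LABELS:
                active = [(c, y) for c in ctx if (c, y) in lam]
                for f in active:
                    mass[f][len(active)] += probs[y] / n
        for f in lam:
            # Newton's method on
            # g(d) = sum_k mass_k e^{d k} + (lam_f + d) / sigma2 - emp_f = 0
            d = 0.0
            for _ in range(newton_steps):
                g = sum(a * math.exp(d * k) for k, a in mass[f].items())
                g += (lam[f] + d) / sigma2 - emp[f]
                dg = sum(a * k * math.exp(d * k) for k, a in mass[f].items())
                dg += 1.0 / sigma2
                step = g / dg
                d -= step
                if abs(step) < 1e-10:
                    break
            lam[f] += d  # every d was solved against the same model, as IIS requires
    return lam


def decode(lam, words):
    """Single-node decoding: classify each position independently. No feature
    looks at neighbouring labels, so the per-position argmax is also the most
    probable label sequence under this model."""
    return [max(LABELS, key=predict_proba(lam, templates(words, i)).get)
            for i in range(len(words))]


def punctuation_f1(gold, pred):
    """Hypothetical F-value: a predicted mark is correct when the gold label at
    the same position is the same non-empty mark."""
    correct = sum(1 for g, p in zip(gold, pred) if p and p == g)
    if correct == 0:
        return 0.0
    precision = correct / sum(1 for p in pred if p)
    recall = correct / sum(1 for g in gold if g)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up training data for illustration only.
TRAIN = [
    (["yes", "i", "want", "a", "room"], [",", "", "", "", "."]),
    (["ok", "how", "much", "is", "it"], [",", "", "", "", "?"]),
    (["yes", "how", "much", "is", "a", "room"], [",", "", "", "", "", "?"]),
]

if __name__ == "__main__":
    weights = train_iis(TRAIN)
    words = ["yes", "i", "want", "a", "room"]
    print(" ".join(w + p for w, p in zip(words, decode(weights, words))))
```

Running the script prints the toy sentence with marks attached to words. Keeping the templates free of label-history features is what makes independent per-position classification equivalent to finding the globally best label sequence; adding such features would call for a sequence decoder such as Viterbi search instead.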
Keywords/Search Tags: maximum entropy, adding punctuation, machine translation, IIS algorithm