English Pronunciation Dictionary Construction Method Based on Non-parallel Corpus

Posted on: 2022-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wang
GTID: 2505306509454754
Subject: Software engineering
Abstract/Summary:
As research on English speech recognition and speech synthesis has deepened, methods for constructing English pronunciation dictionaries, which serve as an important link between the acoustic model and the language model, have made great progress. However, existing construction methods still suffer from a series of problems: they rely on parallel speech-text corpora, the data are difficult to collect, and expert annotation is costly. More effective methods are therefore needed to improve the efficiency of pronunciation dictionary construction. To address these problems, this thesis studies a method for constructing English pronunciation dictionaries based on non-parallel corpora.

First, this thesis studies the traditional sequence-to-sequence Grapheme-to-Phoneme (G2P) model and replaces it with an Encoder-Decoder + LSTM deep neural network built on long short-term memory networks. In addition, the thesis proposes a Phoneme-to-Grapheme (P2G) model that is trained simultaneously with the G2P model and serves as its contrast model. The P2G model adopts an Encoder-Decoder + Attention deep neural network based on the attention mechanism. Compared with the traditional sequence-to-sequence method, the hybrid model reduces the word error rate (WER) and phoneme error rate (PER) by 7.1% and 2.3%, respectively.

Second, existing pronunciation dictionary construction methods suffer from low model-training efficiency, small scale, and the high labor cost of expert correction. This thesis proposes an active learning module that addresses these problems in the post-processing step after model training. By computing the two-tuple matching degree between the outputs of the G2P and P2G models and then extracting the discriminative samples with a low matching degree, the most representative samples are obtained for expert correction. This keeps manual involvement to a minimum.
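The abstract does not spell out how the error rates or the two-tuple matching degree are computed. The Python sketch below shows one plausible reading under stated assumptions, not the thesis's actual code. WER is taken as the share of words whose predicted pronunciation is not exactly correct, and PER as total phoneme edit distance over total reference phonemes (common G2P conventions). The matching degree is a hypothetical round-trip score: a word is passed through G2P, the predicted phonemes through P2G, and the recovered spelling is compared with the original by normalized edit distance. The lookup tables `G2P_TABLE` and `P2G_TABLE` are toy stand-ins for the trained Encoder-Decoder models, and all function names are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Phonemes = List[str]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def error_rates(refs: List[Phonemes], hyps: List[Phonemes]) -> Tuple[float, float]:
    """Return (WER, PER) for G2P predictions against reference pronunciations.

    WER: share of words whose predicted phoneme sequence is not an exact match.
    PER: total phoneme edit distance divided by total reference phonemes.
    """
    wrong_words = sum(r != h for r, h in zip(refs, hyps))
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    return wrong_words / len(refs), edits / total_phonemes


def matching_degree(word: str,
                    g2p: Callable[[str], Phonemes],
                    p2g: Callable[[Phonemes], str]) -> float:
    """Hypothetical round-trip score in [0, 1]: word -> G2P -> P2G -> spelling.

    1.0 means P2G recovers the original spelling exactly.
    """
    recovered = p2g(g2p(word))
    longest = max(len(word), len(recovered), 1)
    return 1.0 - edit_distance(word, recovered) / longest


def select_for_review(words: List[str],
                      g2p: Callable[[str], Phonemes],
                      p2g: Callable[[Phonemes], str],
                      k: int) -> List[str]:
    """Return the k words with the lowest matching degree (ties keep input order)."""
    return sorted(words, key=lambda w: matching_degree(w, g2p, p2g))[:k]


# Toy lookup tables standing in for the trained G2P and P2G networks (illustrative only).
G2P_TABLE: Dict[str, Phonemes] = {
    "cat": ["K", "AE", "T"],
    "knight": ["N", "AY", "T"],
    "colonel": ["K", "ER", "N", "AH", "L"],
}
P2G_TABLE: Dict[Tuple[str, ...], str] = {
    ("K", "AE", "T"): "cat",
    ("N", "AY", "T"): "night",
    ("K", "ER", "N", "AH", "L"): "kernel",
}


def toy_g2p(word: str) -> Phonemes:
    return G2P_TABLE.get(word, [])


def toy_p2g(phonemes: Phonemes) -> str:
    return P2G_TABLE.get(tuple(phonemes), "")


if __name__ == "__main__":
    words = ["cat", "knight", "colonel"]
    for w in words:
        print(w, round(matching_degree(w, toy_g2p, toy_p2g), 3))
    print(select_for_review(words, toy_g2p, toy_p2g, k=2))
```

Running it prints scores of 1.0, 0.833, and 0.429 and selects `['colonel', 'knight']` for expert review. Under this hypothetical definition, homophones such as "colonel"/"kernel" also score low even when the predicted pronunciation is correct, so the selected words are candidates for review rather than confirmed errors.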
Keywords/Search Tags: Pronunciation dictionary, Grapheme to Phoneme, Phoneme to Grapheme, Sequence to Sequence