Research On Speech Recognition Using Voice Conversion Approach

Posted on: 2006-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: F Huang
Full Text: PDF
GTID: 2168360155974037
Subject: Communication and Information System
Abstract/Summary:
Voice conversion is a new branch of speech signal processing. It aims to modify a source speaker's speech signal so that it sounds as if it were uttered by another speaker, while keeping the linguistic content unchanged. Voice conversion gives strong impetus to research in speech analysis, speech coding, text-to-speech synthesis, speaker identification, and related areas.

The Sinusoidal+Noise (S+N) acoustic representation model is well known for making parameter modification easy and for producing high-quality synthesis. Characteristics of the analyzed speech can be easily transformed by rescaling the parameter sets. This thesis therefore uses the S+N model as its tool for voice conversion research. Using statistical methods, we extract speaker characteristics from the parameter sets of the S+N representation and, on that basis, propose a new voice conversion method. To apply the new conversion method, we combine it with speech recognition. As a core innovation of this thesis, we introduce a new concept, voice-conversion-based speech recognition, to state-of-the-art speech systems.

The key problem in voice conversion is extracting speaker characteristics from speech signals in a form that can be synthesized. Through a large number of fundamental experiments within the S+N modeling framework, we extract two meaningful speaker characteristics: the Frequency parameter Probability Distribution (FPD) and the Amplitude-weighted Frequency parameter Probability Distribution (awFPD). Using a Gaussian mixture model (GMM) representation, we describe these distributions as two vector sets, the Statistical EigenVoice (SEV) and the Weighted Statistical EigenVoice (wSEV).

Based on the SEV and wSEV vector sets, we present a new voice conversion method. In this approach, the frequency scale and the spectral amplitude scale are converted using SEV and wSEV mapping, respectively. Because training is convenient, the method is also applicable to cross-language voice conversion.
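As a rough illustration of the two statistics named above, the following sketch estimates an FPD and an awFPD from sinusoidal analysis frames. It is an assumption-laden simplification, not the thesis's procedure: the abstract describes a GMM representation, whereas this sketch substitutes plain normalized histograms, and the function name `frequency_distributions`, the `(frequency_hz, amplitude)` frame layout, and the `bin_width_hz` parameter are all hypothetical.

```python
from collections import defaultdict


def frequency_distributions(frames, bin_width_hz=100.0):
    """Estimate histogram versions of FPD and awFPD (illustrative only).

    frames: iterable of frames, each a list of (frequency_hz, amplitude)
            pairs, standing in for the sinusoidal part of an S+N analysis.
    Returns two dicts mapping the lower edge of each frequency bin to a
    probability: (fpd, awfpd).
    """
    counts = defaultdict(float)    # unweighted: every sinusoid counts once
    weighted = defaultdict(float)  # weighted: every sinusoid counts |amplitude|

    for frame in frames:
        for freq, amp in frame:
            bin_index = int(freq // bin_width_hz)
            counts[bin_index] += 1.0
            weighted[bin_index] += abs(amp)

    def normalize(hist):
        # Turn raw bin totals into a probability distribution.
        total = sum(hist.values())
        if total == 0:
            return {}
        return {b * bin_width_hz: v / total for b, v in sorted(hist.items())}

    return normalize(counts), normalize(weighted)


if __name__ == "__main__":
    toy_frames = [
        [(120.0, 1.0), (250.0, 3.0)],
        [(130.0, 1.0), (260.0, 3.0)],
    ]
    fpd, awfpd = frequency_distributions(toy_frames)
    print("FPD:  ", fpd)
    print("awFPD:", awfpd)
```

In this toy input both frequency bins hold the same number of sinusoids, so the unweighted distribution is flat, while the amplitude-weighted one shifts probability toward the louder components. A GMM fitted to such data, as the abstract describes, would summarize the same distributions with a small set of means, variances, and weights.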
Experimental results show that the SEV/wSEV-based approach outperforms the traditional LPC method in synthesis quality.

In speech recognition, the disadvantage of a speaker-independent recognizer is that collecting a large quantity of training data takes time, which may be impractical for some applications. Although a speaker-dependent recognizer uses speaker adaptation to avoid this disadvantage, it becomes inefficient when the number of adaptation parameters is large.

To overcome these shortcomings, we introduce voice conversion into speech recognition as a means of speaker adaptation. In our voice-conversion-based speech recognition system, speech signals are preprocessed by the voice conversion model before they are recognized. The preprocessing step maps speech signals onto the training set by means of SEV/wSEV mapping, so that the adapted signals appear to come from the training set. Furthermore, we propose the idea of iterative recognition: recognition results are fed back to supervise further speaker adaptation. The error rate is reduced without modifying any parameters of the HMMs. Under our experimental conditions, the new approach outperforms MLLR adaptation, improving the correct recognition rate by 2.5% when 4 s of adaptation data is available.
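The iterative recognition idea can be pictured with a deliberately tiny toy, sketched below. Everything in it is an assumption made for illustration: the recognizer is a fixed nearest-template classifier on one-dimensional features rather than an HMM, the "voice conversion" front end is a single scale factor rather than SEV/wSEV mapping, and the function names are hypothetical. It shares only the structure described in the abstract: the recognizer's parameters never change, and recognition results are fed back to refine the front-end mapping.

```python
def recognize(feature, templates):
    """Fixed recognizer: return the label of the nearest template."""
    return min(templates, key=lambda label: abs(templates[label] - feature))


def estimate_scale(features, labels, templates):
    """Least-squares scale s minimizing sum((s * x - template)^2)."""
    numerator = sum(x * templates[label] for x, label in zip(features, labels))
    denominator = sum(x * x for x in features)
    return numerator / denominator if denominator else 1.0


def iterative_recognition(features, templates, iterations=3):
    """Alternate between recognizing and re-estimating the front-end scale.

    The templates (standing in for the recognizer's models) are never
    modified; only the conversion applied to the input changes.
    """
    scale = 1.0  # start with no conversion
    labels = []
    for _ in range(iterations):
        converted = [scale * x for x in features]           # convert the input
        labels = [recognize(x, templates) for x in converted]
        scale = estimate_scale(features, labels, templates)  # feedback step
    return labels, scale


if __name__ == "__main__":
    templates = {"a": 1.0, "b": 2.0, "c": 3.0}
    # A "new speaker" whose features are the templates stretched by 1.3.
    features = [1.3, 2.6, 3.9, 1.3]
    labels, scale = iterative_recognition(features, templates)
    print(labels, round(scale, 4))
```

On this toy data the first pass, run without any conversion, mislabels the second item; the scale estimated from that imperfect pass is already good enough for the second pass to label every item correctly, after which the estimate settles near 1/1.3.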
Keywords/Search Tags:Voice Conversion, Speaker Characteristics, Speech Recognition, Statistical Eigen Voice