Research On Speech Recognition Using Voice Conversion Approach

Posted on:2006-08-22

Degree:Master

Type:Thesis

Country:China

Candidate:F Huang

Full Text:PDF

GTID:2168360155974037

Subject:Communication and Information System

Abstract/Summary:

Voice conversion is a new branch of speech signal processing. It aims to modify a source speaker's speech signal so that it sounds as if it were uttered by another speaker, while keeping the linguistic content unchanged. Voice conversion gives considerable impetus to research in speech analysis, speech coding, text-to-speech synthesis, and speaker identification.

It is well known that the Sinusoidal+Noise (S+N) acoustic representation model has remarkable advantages in parameter modification and yields high-quality synthesis. Characteristics of the analyzed speech can be easily transformed by rescaling the parameter sets. This dissertation therefore uses the S+N model as a tool for voice conversion research. Using statistical methods, we extract speaker characteristics from the parameter sets of the S+N representation and, on that basis, propose a new voice conversion method. To apply the new conversion method, we combine it with the study of speech recognition. As a core innovation of this work, we introduce a new concept, voice-conversion-based speech recognition, to state-of-the-art speech systems.

The key point in voice conversion research is to extract synthesizable speaker characteristics from speech signals. Through a large number of fundamental experiments within the framework of S+N modeling, we extract meaningful speaker characteristics called the Frequency parameter Probability Distribution (FPD) and the Amplitude-weighted Frequency parameter Probability Distribution (awFPD). Using a GMM representation, we describe these distributions as two vector sets, the Statistical EigenVoice (SEV) and the Weighted Statistical EigenVoice (wSEV).

Based on the SEV and wSEV vector sets, we present a new voice conversion method. In this approach, the frequency scale and the spectral amplitude scale are converted using SEV and wSEV mapping, respectively. Because training is simple, the method is applicable to cross-language voice conversion. Experimental results show that the SEV/wSEV-based approach outperforms the traditional LPC method in terms of synthesis quality.

In the field of speech recognition, the disadvantage of a speaker-independent recognizer is that it takes time to collect a large quantity of training data, which may be impractical for some applications. Although a speaker-dependent recognizer adopts speaker adaptation techniques to overcome this disadvantage, it becomes inefficient when the number of adaptation parameters is large.

To overcome these shortcomings, we introduce voice conversion into speech recognition research as a means of speaker adaptation. In our voice-conversion-based speech recognition system, speech signals are preprocessed by the voice conversion model before being recognized. The preprocessing step maps speech signals to the training set by means of SEV/wSEV mapping, so that the signals are adapted as if they came from the training set. Furthermore, we propose the idea of iterative recognition: feedback from the recognition results can supervise subsequent speaker adaptation. The error rate is reduced without any modification of the HMM parameters. Under our experimental conditions, the new approach outperforms MLLR adaptation, improving the correct rate by 2.5% when 4 s of adaptation data is available.
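
As a rough illustration of the two speaker statistics named in the abstract, the sketch below estimates a frequency distribution and an amplitude-weighted frequency distribution from sinusoidal parameters. The abstract does not say how FPD and awFPD are estimated, so this is only an assumption: it uses plain normalized histograms (not the GMM representation used in the thesis), and the function name `frequency_distributions`, the `(frequency, amplitude)` input layout, and the 100 Hz bin width are all hypothetical.

```python
from collections import defaultdict


def frequency_distributions(partials, bin_width_hz=100.0):
    """Return (fpd, awfpd) as dicts mapping bin start in Hz to probability.

    partials: iterable of (frequency_hz, amplitude) pairs, assumed to be the
    sinusoidal components pooled over all analysis frames of one speaker.
    Histogram estimation is an assumption made for illustration only.
    """
    counts = defaultdict(float)   # occurrences per frequency bin
    weights = defaultdict(float)  # summed amplitude per frequency bin

    for freq, amp in partials:
        bin_start = int(freq // bin_width_hz) * bin_width_hz
        counts[bin_start] += 1.0
        weights[bin_start] += abs(amp)

    total_count = sum(counts.values())
    total_weight = sum(weights.values())
    if total_count == 0 or total_weight == 0:
        raise ValueError("need at least one partial with non-zero amplitude")

    # Normalize so that each distribution sums to 1.
    fpd = {b: c / total_count for b, c in sorted(counts.items())}
    awfpd = {b: w / total_weight for b, w in sorted(weights.items())}
    return fpd, awfpd


# Toy input: two weak partials near 100 Hz and one strong partial near 200 Hz.
fpd, awfpd = frequency_distributions([(120.0, 1.0), (130.0, 3.0), (250.0, 4.0)])
```

In this toy input the unweighted distribution favors the 100 Hz bin, because two of the three partials fall there, while the amplitude-weighted one splits evenly between the two bins. That difference is the kind of information an amplitude-weighted statistic adds.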

Keywords/Search Tags:

Voice Conversion, Speaker Characteristics, Speech Recognition, Statistical Eigen Voice

Related items

1	Voice Conversion Using Structured Gaussian Mixture Model In Eigen Space
2	Research Of Denoising And Enhancement In Speaker Voice Recognition
3	The Study Of Speaker Voice Conversion Technology
4	Nonparallel-Corpus-Based Multi Speaker Voice Conversion
5	Research On Any-to-many Voice Conversion Based On Non-parallel Data
6	Voice Conversion Based On Isolated Speaker Model
7	Study Of Speaker-independent Speech Recognition And Robot Voice Control Based On ANN
8	Voice Conversion Based On AHOcoder And GMM Model
9	A Study On Deep Learning-Based Voice Conversion For Identity Disguise In Voice Communication
10	Non-Parallel Many-to-many Voice Conversion Method Based On Adaptive Trans-StarGAN