Voice Conversion Based On AHOcoder And GMM Model

Posted on: 2019-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: J An
Full Text: PDF
GTID: 2428330566988663
Subject: Engineering
Abstract/Summary:
Voice conversion aims to modify the speaker-specific characteristics of a source speaker's voice so that it sounds like a target speaker, while keeping the linguistic content unchanged. Voice conversion can provide personalized voices for text-to-speech terminals, supply assistive voices for patients in medical settings, and enrich intelligent human-machine interaction. Research on voice conversion also helps to advance other areas of speech signal processing, for example by improving the quality of speech synthesis and reducing the difficulty of speaker identification. Voice conversion therefore has broad application prospects and considerable theoretical research value.

This thesis studies voice conversion based on the AHOcoder model and the Gaussian mixture model (GMM). The main work is as follows.

First, starting from the principle of speech production, the thesis describes the mathematical model of the speech system and the commonly used speech feature parameters, analyzes the influence of these parameters on speech production, and briefly introduces the voice conversion model. AHOcoder is used for speech decomposition, feature parameter extraction, and synthesis: it decomposes the speech signal and extracts the logarithmic fundamental frequency, the Mel-frequency cepstral coefficients (MFCC), and the maximum voiced frequency. After conversion, speech is synthesized from the converted feature parameters.

Second, the thesis focuses on the voice conversion system based on AHOcoder and GMM. To reduce the loss of voice quality caused by GMM, bilinear frequency warping training is added to improve the quality of the converted voice. The experiments show that differences in duration affect the conversion result. The mean-variance method is therefore used to find a duration adjustment factor between the source speech and the target speech, and the duration of the source speech is adjusted in combination with the time-domain overlap-add method. After this adjustment, the converted voice is closer to the target voice.

Finally, to counter the over-smoothing introduced by GMM, spectral envelope compensation is carried out by adding the global variance on top of GMM and bilinear frequency warping training. Simulation experiments show that the spectral envelope compensation method reduces the MFCC spectral distance between the converted speech and the target speech and further improves the quality of the converted speech.
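The two quantities mentioned at the end of the abstract, the MFCC spectral distance and the global variance, can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions used in the thesis, so the sketch assumes the commonly used mel-cepstral distortion in dB (with the 0th coefficient skipped) and the per-dimension variance of the feature trajectory over one utterance. The function names and the toy data are hypothetical and serve only as illustration.

```python
import math


def mel_cepstral_distortion(converted, target, skip_c0=True):
    """Mean mel-cepstral distortion (dB) between two frame-aligned sequences.

    Assumption: this is the common MCD definition, not necessarily the exact
    "MFCC spectral distance" used in the thesis.
    """
    if len(converted) != len(target) or not converted:
        raise ValueError("sequences must be non-empty and frame-aligned")
    start = 1 if skip_c0 else 0          # c0 mostly carries energy
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for c, t in zip(converted, target):
        sq = sum((a - b) ** 2 for a, b in zip(c[start:], t[start:]))
        total += scale * math.sqrt(2.0 * sq)
    return total / len(converted)


def global_variance(frames):
    """Per-dimension variance of a feature trajectory over one utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / n
            for d in range(dims)]


# Toy target trajectory: 4 frames, 3 coefficients (c0, c1, c2), zero mean.
target = [[0.0, 1.0, -1.0],
          [0.0, -1.0, 1.0],
          [0.0, 2.0, 0.0],
          [0.0, -2.0, 0.0]]

# Hypothetical over-smoothed conversion: every frame pulled halfway
# toward the (zero) utterance mean, which shrinks the variance.
smoothed = [[0.5 * x for x in frame] for frame in target]

if __name__ == "__main__":
    print("GV target:  ", global_variance(target))
    print("GV smoothed:", global_variance(smoothed))
    print("MCD smoothed vs target (dB):",
          mel_cepstral_distortion(smoothed, target))
```

In this toy case the smoothed trajectory has a lower global variance than the target and a non-zero distortion. That is the kind of gap a global-variance-based compensation is meant to narrow.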
Keywords/Search Tags:voice conversion, speech decomposition synthesis, GMM, time adjustment, global variance