
Research On High Quality Voice Conversion Algorithm Based On Improved GMM And Frequency Warping

Posted on: 2018-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: L M Cui
Full Text: PDF
GTID: 2348330536979838
Subject: Electronic and communication engineering
Abstract/Summary:
Speech is a signal produced by a speaker. It carries many kinds of natural information, such as semantic content, speaker identity, and emotion, and it is user-friendly and easy to collect. Voice conversion modifies the individuality parameters of a source speaker's speech so that it carries the individuality of a target speaker, while keeping the linguistic content unchanged. As an extension of speaker recognition and speech synthesis, voice conversion has attracted growing attention from researchers in China and abroad in recent years. As the technology has developed, researchers have come to emphasize not only the intelligibility and similarity of the converted speech but also its fluency and naturalness. Voice conversion helps advance other areas of speech signal processing, can serve as an assistive medical tool that improves speech quality, and can enrich data processing and intelligent human-computer interaction, making that interaction more natural, entertaining, and humanized. Research on voice conversion therefore has far-reaching application prospects and considerable theoretical value.

This thesis studies voice conversion technology, and its main contents are as follows. Starting from the principle of speech production, it introduces the mathematical model of the speech system and the common speech feature parameters, and briefly introduces voice conversion models. Feature extraction and synthesis use the AHOcoder model, which extracts log F0, MFCC (mel-cepstral coefficients), and the maximum voiced frequency. The thesis describes in detail the GMM-based voice conversion model with bilinear frequency warping plus amplitude scaling, and analyzes the training of the GMM, the training of the bilinear frequency warping and amplitude scaling, the conversion process, and the related theory. In Matlab experiments, the subjective and objective performance of this model is compared with that of the traditional GMM model and the GMM-bilinear frequency warping model. The results show that it achieves the best conversion performance of the three.

The thesis then focuses on improving the GMM and frequency warping voice conversion model. To address the problems that the number of mixture components in the GMM is fixed and that the resulting classification of speech feature parameters is unreasonable, the iterative self-organizing clustering algorithm is introduced into the clustering stage of the Gaussian mixture model. This algorithm clusters well, yields feature-parameter classes that better reflect speaker individuality, and thereby improves speech quality. It uses squared error as the clustering criterion, uses preset initial parameters to decide when to "merge" and "split" clusters, and adjusts itself toward the optimal number of classes according to the distribution of the data. Compared with the K-Means clustering used in the traditional GMM, it has the advantage of self-organization. Iterative self-organizing clustering is followed by EM iteration and then combined with the subsequent bilinear frequency warping to perform the voice conversion.

In the objective evaluation, the mel-cepstral distortion of the improved model is lower than that of the GMM with bilinear frequency warping plus amplitude scaling model. Across different speech data and different conversion cases, the average mel-cepstral distortion decreases by 1.49%, which indicates lower spectral distortion and greater similarity between the converted speech and the target speech. In the subjective evaluation, the mean opinion score is higher than that of the GMM with bilinear frequency warping plus amplitude scaling model, an increase of 5.13%, which indicates better voice quality. Theoretical analysis and experimental results show that, compared with the traditional methods, the proposed method achieves higher spectral similarity and a higher mean opinion score, and thus performs better in both the similarity and the quality of the synthesized speech.
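For readers unfamiliar with the objective metric cited above, the following is a minimal sketch of how mel-cepstral distortion is commonly computed, together with the kind of relative change the abstract reports. The abstract does not give the exact formula used in the thesis, so this follows one widely used definition. The function names, the choice to exclude the 0th (energy) coefficient, and the assumption that the two frame sequences are already time-aligned are illustrative assumptions, not details from the thesis.

```python
import math


def mel_cepstral_distortion(converted, target, skip_c0=True):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    converted, target: lists of frames, each frame a list of mel-cepstral
    coefficients. Assumes the frames are already time-aligned (hypothetical
    preprocessing, e.g. by dynamic time warping, is not shown here).
    """
    if len(converted) != len(target) or not converted:
        raise ValueError("sequences must be non-empty and time-aligned")

    scale = 10.0 / math.log(10.0)       # converts to a dB-like scale
    start = 1 if skip_c0 else 0         # assumption: drop the energy term c0
    total = 0.0
    for conv_frame, tgt_frame in zip(converted, target):
        squared_error = sum(
            (c - t) ** 2
            for c, t in zip(conv_frame[start:], tgt_frame[start:])
        )
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(converted)


def relative_change_percent(baseline, proposed):
    """Relative change of `proposed` versus `baseline`, in percent.

    A negative value means the proposed system scores lower than the
    baseline, which is the desired direction for distortion.
    """
    return 100.0 * (proposed - baseline) / baseline
```

Under this definition a lower value means the converted spectrum is closer to the target, and a reported reduction such as 1.49% corresponds to `relative_change_percent` returning about -1.49 for the averaged scores of the two systems.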
Keywords/Search Tags:AHOcoder, MFCC, Bilinear Frequency Warping Plus Amplitude Scaling Voice Conversion Model, Iterative Self-Organizing Clustering Algorithm