
Studies On Key Techniques For Voice Conversion

Posted on: 2006-08-20  Degree: Doctor  Type: Dissertation
Country: China  Candidate: B Li
GTID: 1118360155972180  Subject: Information and Communication Engineering
Abstract/Summary:
Voice conversion is a technique that modifies a source speaker's speech so that it is perceived as if a target speaker had spoken it. As a relatively recent branch of speech signal processing, voice conversion has important theoretical and practical applications. Applied to a text-to-speech system, it makes it easy to create voices with a variety of speaker characteristics. Other applications include movie dubbing, very-low-bit-rate speech coding, concealing the speaker's identity in speech communication, and imitating another speaker. Research on voice conversion also has an important influence on voice analysis, speech coding, speech synthesis, speech enhancement, speech recognition, speaker recognition, and related areas. This thesis studies a voice conversion system trained on a small database, based on the two key technologies of voice conversion: pitch-scale modification and spectral envelope conversion. The work is discussed in the following five parts.

First, various methods of pitch-scale modification are studied. In wide-band TD-PSOLA, the length of the analysis window is found to affect the result: a window twice the smaller of the source and target pitch periods gives better results than a window twice the larger. Further investigation was carried out to explain why. When the pitch is modified with the compression-expansion method of FD-PSOLA, the phase is compressed or expanded as well; this changes the rate of phase change and consequently distorts the voice after the pitch period is modified. To solve this problem, this research proposes a linear approximation of the phase, called piece-wise linear phase.
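To make the TD-PSOLA window-length comparison concrete, here is a minimal sketch in Python (NumPy). It is not the thesis implementation: it assumes a strictly periodic input whose pitch period `p_src` (in samples) is known, places analysis marks one period apart instead of detecting pitch epochs, uses Hann windows, and preserves duration by taking the analysis mark nearest each synthesis mark. The function name `td_psola_constant` and its `win_rule` switch are hypothetical; `win_rule="min"` gives the window of twice the smaller pitch period, and `"max"` the twice-the-larger alternative.

```python
import numpy as np


def td_psola_constant(x, p_src, p_tgt, win_rule="min"):
    """Minimal TD-PSOLA pitch shift for a signal with constant pitch.

    Hypothetical helper: assumes x is periodic with a known period p_src
    (samples); p_tgt is the desired period. The analysis window spans
    2*min(p_src, p_tgt) samples when win_rule == "min", else 2*max(...).
    """
    half = min(p_src, p_tgt) if win_rule == "min" else max(p_src, p_tgt)
    win = np.hanning(2 * half + 1)

    # Analysis marks one source period apart (assumed, not detected).
    marks = np.arange(half, len(x) - half, p_src)

    y = np.zeros(len(x))
    t = half  # first synthesis mark
    while t + half < len(x):
        # The analysis mark nearest the synthesis instant keeps the duration.
        k = marks[np.argmin(np.abs(marks - t))]
        # Overlap-add the windowed segment, re-spaced at the target period.
        y[t - half:t + half + 1] += win * x[k - half:k + half + 1]
        t += p_tgt
    return y


# Usage: shorten the period from 100 to 80 samples (raise the pitch).
n = np.arange(4000)
x = sum(np.cos(2 * np.pi * h * n / 100) / h for h in range(1, 6))
y_min = td_psola_constant(x, 100, 80, win_rule="min")  # 160-sample window
y_max = td_psola_constant(x, 100, 80, win_rule="max")  # 200-sample window
```

Both calls produce a signal with an 80-sample period; the sketch only shows where the window-length choice enters, and makes no claim about which setting sounds better on real speech.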
With the piece-wise linear phase, the rate of phase change can be regarded as approximately constant after compression or expansion. Moreover, the center of each analysis frame stays at the same position as in the original method for each waveform period. The modified waveform is closer to the source speech than with the original model, so the pitch scale of the speech can be modified successfully. It is further revealed that FD-PSOLA modification not only changes the rate of phase change but also compresses or expands the harmonics, degrading the quality of the speech signal. A speech analysis-synthesis model, called the pseudo harmonic speech model, is therefore proposed to eliminate this problem. With this model, the pitch harmonics are preserved without compression or expansion when the pitch is modified, and both pitch-scale and time-scale modification of speech can be achieved with high quality.

Second, different methods for representing the spectral envelope are studied. Traditionally, LPC coefficients are obtained by solving the autocorrelation equations of the time-domain speech signal. When the LPC order is low, the corresponding LPC envelope is smooth but imprecise; when the order is high, the LPC envelope of female speech is usually not smooth and is easily affected by the harmonics. LPC coefficients obtained from the spectral envelope overcome these defects. The conversion between LPC coefficients and LSF coefficients is explored, and several methods of converting LPC coefficients to LSF coefficients are obtained. The cepstral envelope obtained from the spectral envelope is more precise than that obtained from the spectrum. Taking advantage of this, MFCC coefficients are calculated from a Mel-scale staircase envelope; this method is simple and stable.
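The route from a spectral envelope to LPC and then LSF coefficients can be sketched as follows. This is an illustration under stated assumptions, not one of the thesis's own conversion methods: the envelope is assumed to be a magnitude sampled on the 0..π grid of a real FFT, the autocorrelation is taken as the inverse DFT of the squared envelope, the LPC coefficients come from the standard Levinson-Durbin recursion with A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the LSFs are found by numerically rooting the symmetric and antisymmetric polynomials P(z) and Q(z). The helper names are hypothetical.

```python
import numpy as np


def lpc_from_envelope(env_mag, order):
    """LPC polynomial [1, a1, ..., ap] from a magnitude envelope on [0, pi].

    Hypothetical helper: env_mag holds N//2 + 1 samples (real-FFT grid).
    """
    power = np.asarray(env_mag, dtype=float) ** 2
    r = np.fft.irfft(power)[: order + 1]  # autocorrelation = IDFT of power
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):  # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a, tol=1e-6):
    """Line spectral frequencies (radians, ascending) via polynomial roots."""
    a_ext = np.concatenate([np.asarray(a, dtype=float), [0.0]])
    p_poly = a_ext + a_ext[::-1]  # P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = a_ext - a_ext[::-1]  # Q(z) = A(z) - z^-(p+1) A(1/z)
    lsf = []
    for poly in (p_poly, q_poly):
        w = np.angle(np.roots(poly))
        # Keep one root of each conjugate pair; drop trivial roots at z = +/-1.
        lsf.extend(w[(w > tol) & (w < np.pi - tol)])
    return np.sort(np.array(lsf))


# Usage: envelope of the one-pole filter 1 / (1 - 0.9 z^-1).
w = np.linspace(0.0, np.pi, 513)
env = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * w))
a = lpc_from_envelope(env, 4)  # close to [1, -0.9, 0, 0, 0]
lsf = lpc_to_lsf(a)            # four ascending angles in (0, pi)
```

Polynomial rooting is the simplest LPC-to-LSF method to show; grid-search or Chebyshev-series approaches are common alternatives when robustness at high orders matters.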
The MFCC-linear spectral envelope represents the spectral envelope precisely at low frequencies.

Third, a bilinear transform function is applied to spectral envelope conversion. Its merits are discussed: it needs few transformation parameters, is stable, and suits the small-vocabulary training of a voice conversion system. The research covers the following aspects: (1) spectral envelope conversion based on the unit impulse response of the system; (2) two proposed methods for obtaining the LPC coefficients of a modified spectrum, by which the speech spectrum can be modified; (3) envelope modification methods based on the LPCC envelope and the cepstral envelope, with three algorithms for calculating the modified LPCC and cepstral envelopes; (4) analysis of three DCT-based algorithms for calculating the conversion coefficient of the bilinear transform function; and (5) a training method for voice conversion.

Fourth, since spectrum tilt is an important personal feature of speech, it is also studied in this project. Two critically damped filter functions are used to compensate for spectrum tilt, and the tilt compensation coefficient is obtained through training. The results show that spectrum tilt compensation makes up for a defect of the bilinear transform function, which cannot adjust the spectral amplitude, and so improves conversion accuracy.

Finally, a voice conversion system trained on a small database is presented. The system combines the methods above and uses only the vowel /a/ for training. Experiments show that the system converts speaker personality successfully and with high quality.
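Spectral envelope conversion with a bilinear transform function can be sketched as a first-order all-pass frequency warping applied to a sampled envelope. Assumptions: the envelope is a positive magnitude sampled uniformly on [0, π], the warping is w' = w + 2 arctan(alpha sin w / (1 - alpha cos w)) with |alpha| < 1 (sign conventions differ between texts), and the converted envelope is obtained by linear interpolation. The thesis computes the conversion coefficient with DCT-based algorithms and training; the `estimate_alpha` grid search below is a plainly swapped-in substitute for illustration, and all function names are hypothetical.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """First-order all-pass (bilinear) frequency warping on [0, pi], |alpha| < 1."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega), 1.0 - alpha * np.cos(omega))


def warp_envelope(env, alpha):
    """Move spectral features from w to bilinear_warp(w, alpha).

    Computes env_out(w) = env(w_src) where bilinear_warp(w_src, alpha) = w,
    by interpolating env against the (monotone) warped frequency axis.
    """
    omega = np.linspace(0.0, np.pi, len(env))
    return np.interp(omega, bilinear_warp(omega, alpha), env)


def estimate_alpha(src_env, tgt_env, grid=None):
    """Hypothetical training step: grid search for the alpha that minimizes
    the mean squared log-spectral distance to the target envelope."""
    if grid is None:
        grid = np.linspace(-0.5, 0.5, 101)
    tgt_log = np.log(tgt_env)
    errs = [np.mean((np.log(warp_envelope(src_env, a)) - tgt_log) ** 2) for a in grid]
    return float(grid[int(np.argmin(errs))])


# Usage: a source envelope with one resonance near 0.5 rad.
w = np.linspace(0.0, np.pi, 513)
src = 1.0 / np.abs((1 - 0.95 * np.exp(1j * (0.5 - w))) * (1 - 0.95 * np.exp(1j * (-0.5 - w))))
tgt = warp_envelope(src, 0.2)         # alpha > 0 moves the resonance upward
alpha_hat = estimate_alpha(src, tgt)  # recovers a value near 0.2
```

A single warping parameter keeps the number of trained parameters small, which matches the small-database setting described above; the warping moves features along the frequency axis but leaves their amplitudes unchanged, which is the gap the spectrum tilt compensation is meant to address.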
Keywords/Search Tags: voice conversion, pitch-scale modification, spectrum tilt, spectral envelope conversion, MFCC