An Algorithm For Voice Conversion With Limited Speech Corpus

Posted on: 2019-12-30; Degree: Master; Type: Thesis
Country: China; Candidate: D Gu; Full Text: PDF
GTID: 2428330572492960; Subject: Electronics and Communications Engineering
Abstract/Summary:
A voice signal carries several kinds of information, such as the speaker's identity, emotional state, and linguistic content. Voice conversion is a technique that replaces the source speaker's identity information with that of a target speaker without changing the linguistic content. It has broad application prospects in spoofing and anti-spoofing, artificial intelligence, restoration of damaged speech, and entertainment-oriented speech interaction. However, two problems restrict its application: a large corpus from both the source and target speakers is needed before conversion, and the voice quality after conversion is poor.

For the case where the target speaker's corpus is limited, this dissertation proposes a voice conversion algorithm based on a unified tensor dictionary (UTD). First, parallel speech from N speakers is selected at random from the speech corpus to form the basis of the tensor dictionary. After multi-series dynamic time warping is applied to the selected speech, N two-dimensional basic dictionaries are generated, and together they constitute the unified tensor dictionary. During the conversion stage, the dictionaries of the source and target speakers are each constructed as a linear combination of the N basic dictionaries, using speech from the two speakers. Experimental results show that with 14 basic speakers the algorithm achieves performance comparable to the traditional NMF-based method while requiring only a small amount of target-speaker speech, which greatly facilitates the application of voice conversion systems.

To address the low voice quality caused by the "detail loss" of sparse representation algorithms, this dissertation also proposes a voice conversion algorithm based on harmonic-impulse separation. The algorithm improves on the UTD algorithm by adding a preprocessing step that separates the harmonic and impulse components. The harmonic and impulse signals are converted by their own conversion systems, and the two converted outputs are summed to produce the final converted speech. To support the separation step, the algorithm trains a harmonic dictionary and an impulse dictionary during the training stage. Because the conversion system uses the speech spectrum as its conversion parameter, the dissertation further proposes two improvements: spectrum compression and residual compensation. Experimental results show that the algorithm effectively improves the quality of the converted voice, achieves high-quality conversion with a small corpus, and yields higher quality than the Non-negative Matrix Factorization (NMF) algorithm. The results also show that residual compensation contributes more to the objective evaluation metrics of the conversion system, while spectrum compression plays the more important role in subjective evaluation.
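The dictionary construction described above can be illustrated with a small NumPy sketch. It is not the dissertation's implementation: the abstract does not say how the combination weights or activations are estimated, so the Euclidean multiplicative updates used here are an assumption, and the function names (`combine`, `estimate_activations`, `fit_speaker`, `convert`) are hypothetical. The unified tensor dictionary is assumed to be a non-negative array of shape (N, F, K): N basic dictionaries, each with F frequency bins and K atoms.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the multiplicative updates


def combine(tensor_dict, weights):
    """Speaker dictionary as a weighted sum of N basic dictionaries.

    tensor_dict: (N, F, K), weights: (N,) -> returns (F, K).
    """
    return np.tensordot(weights, tensor_dict, axes=1)


def estimate_activations(dictionary, spec, n_iter=200, seed=0):
    """Non-negative activations H (K, T) for a fixed dictionary (F, K)."""
    rng = np.random.default_rng(seed)
    h = rng.random((dictionary.shape[1], spec.shape[1])) + EPS
    for _ in range(n_iter):
        h *= (dictionary.T @ spec) / (dictionary.T @ dictionary @ h + EPS)
    return h


def fit_speaker(tensor_dict, spec, n_iter=200, seed=0):
    """Alternately update activations and combination weights for one speaker.

    Assumed objective: minimize ||spec - combine(tensor_dict, w) @ h||^2
    with w >= 0 and h >= 0.
    """
    rng = np.random.default_rng(seed)
    n, _, k = tensor_dict.shape
    w = np.full(n, 1.0 / n)
    h = rng.random((k, spec.shape[1])) + EPS
    for _ in range(n_iter):
        a = combine(tensor_dict, w)
        h *= (a.T @ spec) / (a.T @ a @ h + EPS)
        recon = a @ h
        parts = tensor_dict @ h  # (N, F, T): contribution of each basic dictionary
        num = np.tensordot(parts, spec, axes=([1, 2], [0, 1]))
        den = np.tensordot(parts, recon, axes=([1, 2], [0, 1])) + EPS
        w *= num / den
    return w, h


def convert(tensor_dict, w_src, w_tgt, src_spec, n_iter=200):
    """Decompose source speech on the source dictionary, rebuild on the target's."""
    h = estimate_activations(combine(tensor_dict, w_src), src_spec, n_iter)
    return combine(tensor_dict, w_tgt) @ h
```

In this sketch only the N combination weights have to be learned for a new speaker, rather than a full dictionary, which is one way to read the abstract's claim that little target-speaker speech is needed. A typical call sequence would be `fit_speaker` on a few source utterances, `fit_speaker` on a few target utterances, then `convert` on the source spectrogram to be converted.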
Keywords/Search Tags: Voice conversion, Limited corpus, Multi-DTW, Tensor dictionary, Harmonic Percussive Separation