An Algorithm For Voice Conversion With Noise Robustness

Posted on: 2020-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: S L Zhang
GTID: 2428330572961515
Subject: Electronics and Communications Engineering
Abstract/Summary:
Voice conversion is an important branch of speech signal processing. Specifically, voice conversion keeps the semantic content of speech unchanged and alters only the speaker identity information, so that the speech sounds as if it were spoken by a specific target speaker. Research on voice conversion touches many aspects of speech signal processing, such as feature extraction, feature alignment, and speech synthesis. It also helps advance other fields and plays an important role in understanding the nature of the speech signal. In addition, voice conversion has many practical application scenarios, such as secure communication and personalized voice customization.

In practical applications, noise interferes severely with voice conversion. To address the difficulty of converting noisy speech effectively, this thesis proposes a noise-robust voice conversion algorithm (BE-NMF) based on non-negative matrix factorization (NMF). The algorithm optimizes the joint dictionary so that noisy speech can be matched against it, combining voice conversion with speech denoising to achieve voice conversion in noisy environments. It is also combined with a backward elimination algorithm, which removes invalid atoms from the joint dictionary. This shrinks the joint dictionary horizontally (fewer atoms) and improves conversion efficiency while keeping conversion performance essentially unchanged. Experimental results under multiple SNRs and multiple noise types show that the proposed BE-NMF algorithm has higher conversion efficiency than both the traditional NMF algorithm and the NMF algorithm with spectral noise-reduction preprocessing, and that backward elimination further improves conversion efficiency to a certain extent.

To address the discontinuity in converted speech caused by converting features one frame at a time, context information is introduced into the BE-NMF algorithm by stacking multiple frames into a single superframe. The dimension of the superframe is then reduced vertically by Mel filtering, which lowers the computational complexity. Drawing on the characteristics of the speech signal, a harmonic-percussive decomposition algorithm is used to separate the signal into a harmonic part and a percussive (impulse) part, and the two parts are processed separately. The harmonic part, which carries the speaker identity information, is converted. The percussive part, which carries no speaker identity information, is processed with a Wiener filter and used to compensate the converted harmonic part.

Experimental results show that converting only the harmonic signal improves the objective quality of the conversion, and that compensating with the percussive signal clearly improves the subjective listening quality of the speech. Introducing context information also improves conversion quality to a certain extent. With Mel filtering, conversion quality is lower for the same number of frames, but conversion is about 5 times faster when the number of frames is 9. Conversion speed can therefore be gained at the expense of some conversion quality, which is of considerable value in practical applications.
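The abstract does not give the update rules or the elimination criterion, so the following is only a minimal sketch of the general idea of dictionary-based NMF conversion with a noise dictionary, not the thesis's BE-NMF implementation. It assumes paired source and target dictionaries (`A_src`, `A_tgt`) built from aligned parallel frames, a separate noise dictionary (`N_noise`), the standard KL-divergence multiplicative update with an L1 sparsity weight, and a single-pass activation-based pruning step that stands in for backward elimination. All function names, parameters, and the pruning criterion are assumptions made for illustration.

```python
import numpy as np


def estimate_activations(X, D, n_iter=200, sparsity=0.1, eps=1e-12, seed=0):
    """Find H >= 0 with X ~= D @ H for a fixed dictionary D.

    Uses the standard multiplicative update for the generalized KL
    divergence with an L1 penalty (weight `sparsity`) on H.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((D.shape[1], X.shape[1])) + eps
    col_sums = D.sum(axis=0)[:, None]  # D^T 1, shape (n_atoms, 1)
    for _ in range(n_iter):
        ratio = X / (D @ H + eps)
        H *= (D.T @ ratio) / (col_sums + sparsity)
    return H


def convert_noisy(X_noisy, A_src, A_tgt, N_noise, **kwargs):
    """Convert a noisy source spectrogram (assumed pipeline).

    The noisy input is explained by the source-speech atoms plus the
    noise atoms; only the speech activations are applied to the paired
    target dictionary, so the noise contribution is discarded.
    """
    D = np.hstack([A_src, N_noise])
    H = estimate_activations(X_noisy, D, **kwargs)
    H_speech = H[: A_src.shape[1]]
    return A_tgt @ H_speech, H_speech


def prune_atoms(A_src, A_tgt, H_speech, keep_ratio=0.5):
    """Assumed stand-in for backward elimination: keep the atom pairs
    with the largest total activation and drop the rest in one pass."""
    energy = H_speech.sum(axis=1)
    k = max(1, int(round(keep_ratio * len(energy))))
    keep = np.sort(np.argsort(energy)[-k:])
    return A_src[:, keep], A_tgt[:, keep], keep
```

In use, `convert_noisy` would be called on a magnitude spectrogram of shape (frequency bins, frames), and `prune_atoms` could be run on development data to obtain a smaller dictionary pair before converting new utterances. A true backward elimination would remove atoms iteratively while monitoring conversion quality; the one-pass ranking here only illustrates how removing columns shrinks the dictionary.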
Keywords/Search Tags: Voice conversion, Non-negative matrix factorization, Backward elimination, Context information, Harmonic-percussive decomposition