Research On Speech Enhancement Algorithm Based On Phase Spectrum Reconstruction Joint Amplitude Spectrum Estimation

Posted on: 2021-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhang
GTID: 2518306110997269
Subject: Software engineering
Abstract/Summary:
The purpose of speech enhancement is to recover clean speech from noisy speech as far as possible and to improve the perceived quality and intelligibility of speech, so that the listener can hear it more comfortably and accurately. Speech enhancement usually applies the short-time Fourier transform (STFT) to move the time-domain signal into the frequency domain for processing. However, most methods attend only to the amplitude information in the frequency domain and ignore the phase information, mainly because phase wrapping leaves the phase spectrum without a clear structure. Recent studies have shown that phase information can effectively improve speech enhancement performance, so the estimation and reconstruction of phase information is an important issue in speech enhancement. Furthermore, if only the phase is enhanced, the noise can be suppressed but speech distortion is introduced. In view of these problems, this thesis studies speech phase spectrum reconstruction methods and speech enhancement algorithms based on joint estimation of the amplitude spectrum and the phase spectrum. The main work of this thesis includes:

(1) A brief introduction to the research significance of speech enhancement, the current state of research in China and abroad, and common speech characteristics. The deep-learning-based speech enhancement framework is discussed, including its enhancement steps and several classic training targets. In addition, the learning framework based on multi-objective neural networks and its application in this study are introduced.

(2) To address the problem that many current deep-learning-based speech enhancement algorithms cannot handle the highly unstructured phase spectrum, an improved phase spectrum compensation speech enhancement method is proposed. The algorithm incorporates the signal-to-noise ratio (SNR) into traditional phase spectrum compensation (PSC), so that the noise phase is suppressed flexibly as the speech energy changes, which improves the denoising performance of the traditional PSC algorithm under non-stationary noise. In addition, a multi-objective neural network is trained with the amplitude spectrum mask and the improved PSC as joint estimation targets. Objective evaluation scores show that the improved algorithm effectively improves speech quality and intelligibility.

(3) To address the problem that the traditional phase ratio algorithm has a single structure and cannot effectively handle speech corrupted by non-stationary noise, a phase reconstruction method based on the voiced harmonic phase ratio is proposed. Prior research has shown that effectively enhancing voiced speech improves performance under non-stationary noise, and voiced speech has a clear harmonic structure that helps distinguish it from noise. The algorithm first classifies unvoiced and voiced sounds with a method based on linear predictive coding (LPC), which extracts voiced speech segments better than traditional methods; it then combines the voiced segments with a harmonic model to extract harmonic phases and calculate the phase ratio; finally, it combines the spectrum with the reconstructed phase to obtain the enhanced speech. Compared with the traditional algorithm, the improved phase ratio algorithm performs better under non-stationary noise and denoises more stably for female speakers. The method is also generally applicable. To address the speech distortion caused by enhancing the phase alone, a phase ratio speech enhancement algorithm with joint amplitude spectrum estimation is proposed, in which a multi-objective neural network jointly estimates the amplitude spectrum mask and the improved phase ratio. The enhanced speech obtained after training outperforms the traditional algorithm in speech quality and intelligibility scores.
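
As an illustration of the traditional phase spectrum compensation (PSC) step that item (2) builds on, the following is a minimal sketch of fixed-strength PSC for a single frame. It is not the thesis's SNR-adaptive variant, whose exact rule the abstract does not give. The function name `psc_frame`, the parameter `lam` and its default value, and the oracle noise estimate in the usage lines are all illustrative assumptions.

```python
import numpy as np


def psc_frame(noisy_frame, noise_mag, lam=3.74):
    """Fixed-strength phase spectrum compensation for one windowed frame.

    noisy_frame: real-valued time-domain frame of length N.
    noise_mag:   estimated noise magnitude per DFT bin (length N).
    lam:         compensation strength (illustrative constant; an
                 SNR-adaptive scheme would vary this over time).
    """
    n = len(noisy_frame)
    spectrum = np.fft.fft(noisy_frame)  # full N-point DFT, conjugate symmetric

    # Antisymmetric weighting: +1 on positive-frequency bins,
    # -1 on negative-frequency bins, 0 at DC (and Nyquist when N is even).
    psi = np.zeros(n)
    psi[1:(n + 1) // 2] = 1.0
    psi[n // 2 + 1:] = -1.0

    # Offsetting the spectrum by a real, antisymmetric term changes the phase
    # most where the noise estimate is large relative to the signal.
    compensated = spectrum + lam * psi * noise_mag

    # Keep the noisy magnitude, swap in the compensated phase.
    enhanced = np.abs(spectrum) * np.exp(1j * np.angle(compensated))

    # The modified spectrum is no longer conjugate symmetric, so the inverse
    # DFT is complex; discarding the imaginary part is what attenuates
    # noise-dominated bins.
    return np.real(np.fft.ifft(enhanced))


# Usage on a synthetic noise-only frame with an oracle noise estimate
# (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
noise = rng.standard_normal(512)
out = psc_frame(noise, np.abs(np.fft.fft(noise)))
print("output/input energy ratio:", np.sum(out**2) / np.sum(noise**2))
```

With a zero noise estimate the frame passes through unchanged, while bins dominated by the noise estimate are strongly attenuated. In a complete system this function would presumably run per frame inside an overlap-add STFT loop, with `noise_mag` supplied by a noise tracker.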
Keywords/Search Tags: Speech Enhancement, Phase Reconstruction, Phase Spectrum Compensation, Phase Ratio, Amplitude Spectrum Masking, Joint Estimation