Psychoacoustic Research And Its Application In Speech Enhancement

Posted on: 2018-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: T T Zhou
Full Text: PDF
GTID: 2358330518492659
Subject: Circuits and Systems
Abstract/Summary:
Psychoacoustics studies the relationship between the physical properties of sounds and the psychological responses they evoke. It examines how the auditory system processes acoustic signals, and it builds psychoacoustic models for practical use in scientific research and sound engineering. This thesis discusses psychoacoustic masking, spectral division, and psychoacoustic parameter models, and proposes a model for calculating psychoacoustic fluctuation strength. In addition, two improved single-channel speech enhancement algorithms are proposed, based on psychoacoustic masking and spectral division.

Like roughness, fluctuation strength is a basic psychoacoustic sensation; an important difference is that fluctuation strength reflects slower amplitude variations of sounds. This thesis proposes a new model for calculating fluctuation strength based on the equivalent rectangular bandwidth (ERB) scale. Using 75 filter channels on the ERB scale, the total fluctuation strength is calculated by weighting, filtering, and summing the generalized modulation depth (GMD) in each channel. The main advance of the proposed model is a change in the way the GMD in each ERB channel is converted into specific fluctuation strength. The use of the ERB scale instead of the Bark scale is a further advantage: compared with the proposed model implemented on the Bark scale, the ERB-based model reduces the RMSE by up to 73% and increases the correlation coefficient by up to 17%. The reason why a weighting is used in the final calculation of specific fluctuation strength is also discussed. Experiments show that, compared with the existing model [1], the proposed model reduces the RMSE by more than 90% and increases the correlation coefficient by up to 23%. The proposed model therefore agrees more closely with the results of subjective tests.

Building on this psychoacoustic study, the thesis also proposes an improved single-channel speech enhancement algorithm that uses psychoacoustic masking and spectral division. A complete implementation of speech enhancement using psychoacoustic masking, based on a suggestion of Virag [2], is presented. Essential features are improved by modifying the simultaneous masking and introducing temporal masking. The noise masking threshold is calculated by considering both simultaneous and temporal masking, and is then used to adapt the subtraction parameters to obtain the best tradeoff, in a perceptual sense, among the amount of noise reduction, the speech distortion, and the level of residual musical noise. In addition, the thesis proposes a multi-band spectral subtraction algorithm based on the ERB scale: the whole spectrum of the noisy speech is first divided into multiple bands on the ERB scale, and the speech enhancement algorithm is then applied in every band. Objective measures and subjective listening tests demonstrate that the proposed algorithm outperforms comparable speech enhancement algorithms.
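Both the fluctuation strength model and the multi-band spectral subtraction rely on dividing the spectrum along the ERB scale. The sketch below shows one way such a division could be set up. It is not the thesis's implementation. It assumes the widely used Glasberg and Moore ERB-number formula, and the frequency limits (50 Hz and 15 kHz), the band count in the second helper, and all function names are hypothetical choices made for illustration; the abstract specifies only that 75 channels are placed on the ERB scale.

```python
import numpy as np

def hz_to_erb_number(f_hz):
    """ERB-number (Cam) of a frequency in Hz, Glasberg & Moore formula (assumed)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 24.7 * (0.00437 * np.asarray(f_hz, dtype=float) + 1.0)

def erb_center_frequencies(n_channels=75, f_lo=50.0, f_hi=15000.0):
    """Centre frequencies equally spaced on the ERB-number scale.

    n_channels=75 follows the abstract; f_lo and f_hi are assumptions.
    """
    e = np.linspace(hz_to_erb_number(f_lo), hz_to_erb_number(f_hi), n_channels)
    return erb_number_to_hz(e)

def assign_bins_to_bands(n_fft, fs, n_bands, f_lo=50.0):
    """Map each rFFT bin to a band index in [0, n_bands - 1].

    Band edges are equally spaced on the ERB-number scale between f_lo and
    fs / 2. Bins below f_lo are folded into the lowest band.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    edges = erb_number_to_hz(
        np.linspace(hz_to_erb_number(f_lo), hz_to_erb_number(fs / 2.0), n_bands + 1)
    )
    idx = np.searchsorted(edges, freqs, side="right") - 1
    return np.clip(idx, 0, n_bands - 1)

if __name__ == "__main__":
    centers = erb_center_frequencies()
    print(len(centers), centers[0], centers[-1])
    bands = assign_bins_to_bands(n_fft=512, fs=16000, n_bands=8)
    print(np.bincount(bands))  # number of FFT bins that fall in each band
```

With a mapping like `assign_bins_to_bands`, a multi-band method can estimate noise and choose its subtraction parameters per band rather than per FFT bin. Because ERB bands are narrow at low frequencies and wide at high frequencies, the low bands contain few bins and the high bands contain many.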
Keywords/Search Tags: fluctuation strength, psychoacoustic masking, multi-band spectral subtraction, noise masking threshold, single-channel speech enhancement