The Study Of Features Estimation For Speech Intelligibility Enhancement

Posted on:2021-09-06

Degree:Master

Type:Thesis

Country:China

Candidate:R Zhang

GTID:2518306194975759

Subject:Communication and Information System

Abstract/Summary:

As mobile network coverage continues to expand, people can talk by cellphone anytime and anywhere. Speech communication scenarios are therefore more complex and diverse, and the impact of environmental noise on cellphone calls has become more prominent. This thesis addresses the problem that listeners in noisy near-end environments have difficulty understanding speech from the far end. Methods that address this problem are called speech intelligibility enhancement methods. They are generally divided into two categories: rule-based methods and statistical (data-driven) methods. Rule-based methods use empirical rules or metrics to modify speech features in the time or frequency domain. Data-driven methods convert normal speech into Lombard speech, which has higher intelligibility. Lombard speech derives from the Lombard effect, in which speakers instinctively change their vocal style under the stress of ambient noise. The data-driven approach has gradually become mainstream because it takes both intelligibility and naturalness into account.

Existing statistical methods extract acoustic feature parameters with vocoders that are designed for clean speech. For non-clean speech, however, feature estimation performance drops sharply. Fundamental frequency and spectral envelope are the key acoustic features that affect overall system performance, so estimating them from non-clean speech has become an essential challenge for current speech intelligibility enhancement methods.

The first problem studied is that existing fundamental frequency estimation algorithms cannot obtain accurate fundamental frequency values or voiced/unvoiced decisions from non-clean signals. To address it, this thesis proposes a fundamental frequency estimation method based on a one-dimensional convolutional neural network, which uses data augmentation and an improved sparse representation of the fundamental frequency to improve estimation accuracy and to obtain better voiced/unvoiced decisions. Experiments show that the voicing decision error is reduced by 13.55% relative to BLSTM, and the gross pitch error is reduced by 12.83% and 21.17% relative to BLSTM and CREPE, respectively.

The second problem studied is that existing spectral envelope estimation algorithms are not accurate enough on non-clean signals. To address it, this thesis proposes a spectral envelope estimation method based on a recurrent neural network, which exploits the temporal correlation of speech signals and uses data augmentation to improve the adaptability of the model. Compared with the DNN-based method and the CheapTrick method, the log-spectral distortion is reduced by an average of 4.37% and 9.64%, respectively.

The overall system uses the improved fundamental frequency estimation method based on a one-dimensional convolutional neural network and the spectral envelope estimation algorithm based on a recurrent neural network to extract features from non-clean speech, and maps the resulting fundamental frequency and spectral envelope into Lombard style with a Gaussian mixture model. It additionally uses the WORLD vocoder to extract aperiodicity information and finally synthesizes Lombard-style speech. Compared with the baseline systems Net-base and W-base, the Gaussian Speech Intelligibility Index in Bits of the overall system is improved by an average of 4.66% and 9.78%, respectively, and the MOS scores are increased by 0.2 and 0.5, respectively. The proposed fundamental frequency and spectral envelope estimation methods make the current statistical speech intelligibility enhancement system applicable to non-clean speech, so that the system can be used in more scenarios and can effectively improve intelligibility and naturalness, enhancing the user experience of speech communication via smartphone.
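
The objective metrics named in the abstract (voicing decision error, gross pitch error, log-spectral distortion) and the "relative reduction" used to report them can be illustrated with a minimal Python sketch. It rests on assumptions, since the abstract does not give the thesis's exact formulas: the commonly used definitions are taken, a frame with an F0 of 0 Hz is treated as unvoiced, the gross-error tolerance is set to 20%, the envelopes are power spectra, and all function names are hypothetical.

```python
import math


def voicing_decision_error(ref_f0, est_f0):
    """Fraction of frames whose voiced/unvoiced label differs.

    Assumption: an F0 value of 0 Hz marks an unvoiced frame.
    """
    if len(ref_f0) != len(est_f0) or not ref_f0:
        raise ValueError("F0 tracks must be non-empty and of equal length")
    mismatches = sum((r > 0) != (e > 0) for r, e in zip(ref_f0, est_f0))
    return mismatches / len(ref_f0)


def gross_pitch_error(ref_f0, est_f0, tolerance=0.2):
    """Fraction of frames voiced in both tracks whose estimate deviates
    from the reference by more than `tolerance` (assumed 20%)."""
    both_voiced = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not both_voiced:
        return 0.0
    gross = sum(abs(e - r) > tolerance * r for r, e in both_voiced)
    return gross / len(both_voiced)


def log_spectral_distortion(ref_env, est_env):
    """Mean over frames of the RMS difference, in dB, between two
    power spectral envelopes given as lists of per-frame bin values."""
    per_frame = []
    for ref_frame, est_frame in zip(ref_env, est_env):
        squared_db = [
            (10.0 * math.log10(r / e)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        per_frame.append(math.sqrt(sum(squared_db) / len(squared_db)))
    return sum(per_frame) / len(per_frame)


def relative_reduction(baseline, proposed):
    """Relative reduction of an error metric with respect to a baseline."""
    return (baseline - proposed) / baseline


# Toy F0 tracks in Hz (illustrative values only, not data from the thesis).
reference = [0.0, 100.0, 200.0, 0.0]
estimate = [0.0, 110.0, 300.0, 150.0]
vde = voicing_decision_error(reference, estimate)
gpe = gross_pitch_error(reference, estimate)
lsd = log_spectral_distortion([[1.0, 1.0]], [[10.0, 0.1]])
```

Under these definitions, a statement such as "reduced by 13.55% relative to BLSTM" corresponds to `relative_reduction(baseline_error, proposed_error)` being 0.1355, which is a different quantity from a drop of 13.55 percentage points in the error itself.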

Keywords/Search Tags:

Intelligibility, Non-clean speech, Lombard effect, Acoustic feature estimation, Neural networks

Related items

1	Research On Speech Intelligibility Enhancement Based On Lombard Effect
2	Research On Near-end Listening Enhancement Algorithm Based On Lombard Speech Conversion
3	Speech Enhancement Method Improving Speech Intelligibility Effectively
4	The effect of compression on speech perception as reflected by attention and intelligibility measures
5	Research On Speech Separation Method Based On Causal Feature Input And Multi-Scale Feature Fusion
6	Research On Speech Recognition Method Based On Deep Learning
7	Analysis and compensation of the Lombard effect under different types and levels of noise with application to in-set/out-of-set speaker recognition
8	Speech Enhancement Based On Deep Neural Network And Recurrent Neural Network
9	Research On Speech Recognition Based On Convolutional Neural Networks
10	Uyghur Speech Recognition Based On Deep Recurrent Neural Network