Research On Near-end Listening Enhancement Algorithm Based On Lombard Speech Conversion

Posted on:2020-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:F Cheng

GTID:2428330590977046

Subject:Communication and Information System

Abstract/Summary:

Thanks to the continuous progress of mobile communication technology, people can communicate anytime and anywhere by voice or even video, with the aid of powerful mobile communication networks and terminal devices. However, this convenience comes with complex and variable communication scenarios, which can introduce external noise interference that degrades the quality and intelligibility of speech and reduces the efficiency of information exchange between the two parties. The main goal of Near-End Listening Enhancement (NELE) is to improve the intelligibility of speech.

Early near-end listening enhancement algorithms build fixed speech modification strategies based on researchers' knowledge. This rule-based approach is efficient, fast, and interpretable, and it requires no training data. However, the acoustic features involved in the Lombard effect are numerous and influence one another, so it is difficult to describe the feature conversion with simple rules. As a result, rule-based methods usually cannot capture the conversion relationships and correlations between these features well. A fixed modification strategy yields a relatively limited gain in intelligibility while seriously detracting from the naturalness of speech.

With the rapid development of statistical machine learning, statistical conversion models such as the Gaussian Mixture Model (GMM) have begun to emerge in the field of near-end listening enhancement. By extracting the relevant feature parameters of normal speech and Lombard speech with the same speech content, a mapping model for feature transformation can be constructed. This mapping model converts ordinary speech into artificial Lombard speech, thereby improving the intelligibility of speech. However, current models have several deficiencies: they have insufficient ability to describe the complex nonlinear transformation of speech features from ordinary speech to Lombard speech; the parameters reconstructed after conversion are over-smoothed, which makes the reconstructed speech sound muffled; and they ignore both the temporal correlation of the features themselves and the interaction between the mapped features, which limits model performance.

To solve these problems, this thesis proposes several models based on deep learning and validates their effectiveness through experiments. To address the limited ability of existing models to describe the complex nonlinear transformation of speech features from ordinary speech to Lombard speech, we propose a mapping model based on a Recurrent Neural Network (RNN), which enhances the learning ability of the framework. Subjective and objective experiments show that the LSTM-based near-end listening enhancement algorithm improves speech intelligibility more than current methods and has clear advantages in preserving the naturalness of speech. Last but not least, because current methods based on statistical learning do not make effective use of the interaction between the mapped features, this thesis studies the variation and correlation of other acoustic features in the process of Lombard speech conversion. By introducing other useful features as auxiliary tasks of the original model, we build a multi-task learning mapping framework that further enhances the performance and robustness of the method.
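The recurrent, multi-task mapping idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give the architecture, feature set, or loss, so everything here is an assumption made for illustration. In particular, a plain tanh recurrent cell stands in for the LSTM, the feature dimensions are arbitrary, the auxiliary target is an unspecified one-dimensional acoustic feature, and `aux_weight` is a hypothetical loss weight. Only the forward pass and the combined loss are shown; training is omitted.

```python
import numpy as np


def init_params(in_dim, hidden_dim, main_dim, aux_dim, seed=0):
    """Randomly initialise a tiny recurrent mapper with two output heads."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_xh": rng.normal(0, scale, (in_dim, hidden_dim)),
        "W_hh": rng.normal(0, scale, (hidden_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        # Main head: predicts the Lombard-speech features for each frame.
        "W_main": rng.normal(0, scale, (hidden_dim, main_dim)),
        "b_main": np.zeros(main_dim),
        # Auxiliary head: predicts another acoustic feature (assumed 1-D here).
        "W_aux": rng.normal(0, scale, (hidden_dim, aux_dim)),
        "b_aux": np.zeros(aux_dim),
    }


def forward(params, normal_feats):
    """Map a (T, in_dim) sequence of normal-speech features to both outputs.

    The hidden state carries information across frames, which is how a
    recurrent model can use temporal correlation between features.
    """
    h = np.zeros(params["W_hh"].shape[0])
    main_out, aux_out = [], []
    for frame in normal_feats:
        h = np.tanh(frame @ params["W_xh"] + h @ params["W_hh"] + params["b_h"])
        main_out.append(h @ params["W_main"] + params["b_main"])
        aux_out.append(h @ params["W_aux"] + params["b_aux"])
    return np.stack(main_out), np.stack(aux_out)


def multitask_loss(main_pred, main_tgt, aux_pred, aux_tgt, aux_weight=0.3):
    """Mean squared error on the main task plus a weighted auxiliary MSE."""
    main_mse = np.mean((main_pred - main_tgt) ** 2)
    aux_mse = np.mean((aux_pred - aux_tgt) ** 2)
    return main_mse + aux_weight * aux_mse


if __name__ == "__main__":
    # Synthetic stand-in data: 12 frames of 5-dimensional features.
    rng = np.random.default_rng(1)
    params = init_params(in_dim=5, hidden_dim=8, main_dim=5, aux_dim=1)
    normal = rng.normal(size=(12, 5))
    main_pred, aux_pred = forward(params, normal)
    print(main_pred.shape, aux_pred.shape)  # (12, 5) (12, 1)
```

Both heads share the recurrent hidden state, so under this sketch's assumptions the auxiliary target shapes the same representation that the main Lombard-feature prediction uses. Setting `aux_weight` to zero reduces it to a single-task mapper.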

Keywords/Search Tags:

speech intelligibility, Near-end speech enhancement, Lombard effect, Recurrent Neural Networks, Multi-task Learning

Related items

1	The Study Of Features Estimation For Speech Intelligibility Enhancement
2	Speech Enhancement Based On Deep Neural Network And Recurrent Neural Network
3	Research On Deep Learning Speech Enhancement Algorithms That Effectively Improve Speech Intelligibility
4	Research On Supervised Speech Enhancement Based On Deep Neural Networks
5	Speech Enhancement Method Improving Speech Intelligibility Effectively
6	Study On Speech Intelligibility Enhancement In Low Signal-to-Noise Ratio Environment
7	Research On Multi-dimensional Speech Recognition Technology Based On Multi-task Neural Network
8	Research On Single Channel Speech Enhancement Based On Multi-head Attention Mechanism
9	The effect of compression on speech perception as reflected by attention and intelligibility measures
10	A High Intelligibility Signal Subspace Speech-enhancement Algorithm