
Research On Near-end Listening Enhancement Algorithm Based On Lombard Speech Conversion

Posted on: 2020-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: F Cheng
Full Text: PDF
GTID: 2428330590977046
Subject: Communication and Information System
Abstract/Summary:
Thanks to the continuous progress of mobile communication technology, people can communicate anytime and anywhere by voice or even video with the aid of powerful mobile communication networks and terminal devices. This convenience comes with complex and variable communication scenarios, in which external noise can degrade the quality and intelligibility of speech and reduce the efficiency of information exchange between the two parties. The main goal of Near-End Listening Enhancement (NELE) is to improve the intelligibility of speech.

Early near-end listening enhancement algorithms build fixed speech modification strategies from the researchers' knowledge. Such rule-based methods are efficient, fast, and interpretable, and they require no training data. However, the acoustic features involved in the Lombard effect are numerous and influence one another, so the feature conversion is difficult to describe with simple rules. Rule-based methods therefore usually cannot capture the conversion relationship and the correlation between these features well. A fixed modification strategy yields a relatively limited gain in intelligibility while seriously degrading the naturalness of speech.

With the rapid development of statistical machine learning, statistical conversion models such as the Gaussian Mixture Model (GMM) have begun to appear in the field of near-end listening enhancement. By extracting the relevant feature parameters of normal speech and Lombard speech with the same speech content, a mapping model for feature transformation can be constructed. This mapping model converts ordinary speech into artificial Lombard speech, thereby improving the intelligibility of speech. However, current models have several deficiencies. Their ability to describe the complex nonlinear transformation of speech features from ordinary speech to Lombard speech is insufficient. The parameters reconstructed after conversion are over-smoothed, which makes the reconstructed speech sound muffled. They also ignore the temporal correlation of the features themselves and the interaction between the mapped features, which limits model performance.

To solve these problems, this thesis proposes several models based on deep learning and validates their effectiveness through experiments. Because existing models are limited in describing the complex nonlinear transformation of speech features from ordinary speech to Lombard speech, we propose a mapping model based on the Recurrent Neural Network (RNN), which enhances the learning ability of the framework. Subjective and objective experiments show that the LSTM-based near-end listening enhancement algorithm improves speech intelligibility more than current methods and has clear advantages in preserving the naturalness of speech. Last but not least, because current methods based on statistical learning do not make effective use of the interaction between the mapped features, this thesis studies the variation and correlation of other acoustic features during Lombard speech conversion. By introducing other useful features as auxiliary tasks of the original model, we build a multi-task learning mapping framework that further enhances the performance and robustness of the method.
Keywords/Search Tags: speech intelligibility, Near-end speech enhancement, Lombard effect, Recurrent Neural Networks, Multi-task Learning
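
The multi-task idea mentioned in the abstract, where other acoustic features are added as auxiliary tasks of the mapping model, is commonly trained with a weighted sum of per-task losses. The sketch below illustrates only that general pattern in plain Python. The function names, the use of mean squared error, and the `aux_weight` value are assumptions made for illustration; the abstract does not state the loss function or the weights actually used.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(main_pred, main_target, aux_pred, aux_target, aux_weight=0.3):
    """Main-task loss plus a weighted auxiliary-task loss.

    aux_weight is a hypothetical hyperparameter; the abstract gives no value.
    """
    return mse(main_pred, main_target) + aux_weight * mse(aux_pred, aux_target)


# Toy numbers only, not data from the thesis.
main_pred, main_target = [1.0, 2.0], [1.0, 4.0]  # stands in for the main mapped feature
aux_pred, aux_target = [0.5, 0.5], [0.5, 1.5]    # stands in for an auxiliary feature

loss = multitask_loss(main_pred, main_target, aux_pred, aux_target, aux_weight=0.5)
print(loss)
```

Setting `aux_weight` to 0 reduces the objective to the single-task loss, which makes it easy to compare a model trained with and without the auxiliary task.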