Research On Single-channel Speech Enhancement Method Based On Deep Neural Network

Posted on: 2024-09-09; Degree: Master; Type: Thesis
Country: China; Candidate: X F Ge; Full Text: PDF
GTID: 2568307049482584; Subject: Engineering
Abstract/Summary:
Speech enhancement (SE) is one of the current research hotspots in intelligent speech processing. It is a key technology in applications such as real-time communication, smart home devices, and wearable medical devices. With the development of deep learning, speech enhancement based on deep neural networks has gradually replaced traditional speech enhancement based on signal processing thanks to its superior performance, and it has become the main research focus in this field and is widely used. Although speech enhancement has made significant progress in recent years, the following problems still largely limit the performance of speech enhancement systems and their application in real scenarios:

(1) Many applications, such as real-time communication, impose strict real-time requirements on the speech enhancement system, which severely limits the system's parameter count and computational cost. How to design a low-latency, high-performance speech enhancement system is one of the current challenges.
(2) Over-attenuation is a common phenomenon in speech enhancement and introduces irreversible distortion into the speech. Severe over-attenuation reduces speech intelligibility, which runs contrary to the original purpose of speech enhancement. How to solve the over-attenuation problem is also a current research hotspot.
(3) Neither traditional speech enhancement methods nor methods based on deep neural networks can accurately remove an interfering speaker's speech that may be present in the signal, which limits the application of speech enhancement systems in real-life scenarios. Personalized speech enhancement (PSE), which retains only the target speaker's speech and removes both the interfering speaker's speech and the noise, has gradually attracted attention in recent years. However, research on this task is still relatively scarce, and its problems and challenges remain to be identified and solved.

This thesis conducts in-depth research on the above difficulties in speech enhancement and makes the following main contributions:

(1) For the single-channel real-time speech enhancement task, the baseline system PercepNet is reproduced, and a phase-aware structure covering both the deep neural network model and the acoustic features is proposed to improve speech enhancement performance without affecting the real-time performance of the system.
(2) Building on the PercepNet system, a multi-task learning strategy and a post-processing technique based on signal-to-noise ratio (SNR) estimation are proposed to alleviate the over-attenuation problem in the single-channel real-time speech enhancement task.
(3) For the personalized speech enhancement task, based on the baseline system sDPCCN, a dynamic acoustic compensation method is proposed to alleviate the acoustic environment mismatch between the test speech and the enrollment speech. In addition, an adaptive focal training mechanism is used to improve performance on hard samples, which improves overall system performance and robustness.

Several open-source datasets are used for the experiments. The McGill TSP speech database, the NTT Multi-Lingual Speech Database for Telephonometry, and the VCTK dataset are used to verify the effectiveness of the proposed phase-aware structure, the multi-task learning strategy, and the post-processing technique based on SNR estimation. The 4th Deep Noise Suppression (DNS) Challenge Track 2 dataset is used to verify the effectiveness of the dynamic acoustic compensation mechanism and adaptive focal training in addressing the acoustic environment mismatch and hard-sample problems in the personalized speech enhancement task. The experimental results show that, compared with the baseline systems, the proposed methods substantially improve speech enhancement performance and system robustness. The work provides a useful reference for the future development and deployment of single-channel real-time speech enhancement and single-channel personalized speech enhancement.
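The abstract refers to SNR estimation and over-attenuation without giving formulas, so the following is only a minimal illustrative sketch and not the method of the thesis. It assumes the standard power-ratio definition of SNR in decibels, measures over-attenuation as the energy lost by the speech component, and shows one hypothetical way an SNR estimate could drive post-processing (mixing part of the noisy input back in when the estimated SNR is high). The function names `snr_db`, `speech_attenuation_db`, and `snr_gated_mix`, and the thresholds `lo_db` and `hi_db`, are assumptions introduced here for illustration.

```python
import numpy as np


def snr_db(speech, noise, eps=1e-12):
    """Signal-to-noise ratio in dB: 10 * log10(speech power / noise power)."""
    p_speech = np.mean(np.square(speech))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10((p_speech + eps) / (p_noise + eps))


def speech_attenuation_db(clean, enhanced_speech, eps=1e-12):
    """Energy lost by the speech component in dB (positive means attenuated)."""
    p_clean = np.mean(np.square(clean))
    p_enh = np.mean(np.square(enhanced_speech))
    return 10.0 * np.log10((p_clean + eps) / (p_enh + eps))


def snr_gated_mix(noisy, enhanced, est_snr_db, lo_db=0.0, hi_db=20.0):
    """Hypothetical post-processing: blend the noisy input back into the
    enhanced output, with a mixing weight that grows linearly from 0 to 1
    as the estimated SNR rises from lo_db to hi_db."""
    alpha = np.clip((est_snr_db - lo_db) / (hi_db - lo_db), 0.0, 1.0)
    return alpha * noisy + (1.0 - alpha) * enhanced
```

In this sketch the mixing weight is zero at or below `lo_db`, so heavily corrupted frames keep the fully enhanced output, while cleaner frames recover more of the original signal and are therefore less exposed to over-attenuation. The SNR estimator itself, which the thesis obtains from its model, is outside the scope of this example.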
Keywords/Search Tags: Speech enhancement, Phase-aware, Multi-task learning, Dynamic acoustic compensation, Adaptive focal training