| Speech enhancement (SE) technology has developed rapidly over the past decades, from early unsupervised methods to deep learning methods. As an important front-end system in speech processing, SE is used in speech communication, hearing assistance, speech recognition, video conferencing, and other scenarios. For complex application scenarios, however, improving the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) scores of signals as much as possible remains an open goal of SE research. This paper therefore studies widely used SE techniques in three respects. (1) Existing SE systems do not screen the received signal; every signal is enhanced by default, which lowers PESQ and STOI when a speech signal with a high signal-to-noise ratio (SNR) is processed by SE. This paper therefore proposes a selective speech enhancement method based on quality assessment. The Non-Intrusive Speech Quality Assessment (NISQA) algorithm, which requires no reference signal, is used to assess the quality of the speech signal before and after enhancement. By comparing the two quality scores, the received signals are screened so that the loss of PESQ and STOI caused by redundant enhancement is avoided. (2) Using a deep neural network (DNN) with the ideal binary mask (IBM) as the training target, the influence of the five most widely used features, namely mel-frequency cepstral coefficients (MFCC), gammatone frequency cepstral coefficients (GFCC), relative spectral transformed perceptual linear prediction coefficients (RASTA-PLP), amplitude modulation spectrogram (AMS), and multi-resolution cochleagram (MRCG), on the PESQ and STOI of the enhanced signals is explored under different noise types and noise levels. The results show that, for a signal under any background noise, the features that yield the best PESQ
and STOI scores depend on the background noise type, the SNR level, and whether the noise is matched. These results provide a useful reference for researchers in related fields. (3) For SE tasks that focus on the PESQ metric, the experiments in point (2) show that the feature yielding the best PESQ score has a complex relationship with the background noise. Because quality assessment removes the need to identify the noise type and SNR, an SE method based on quality assessment and DNN feature selection is proposed. Compared with SE systems that use a single extracted feature, the quality-based approach selects the feature that gives the best PESQ score and consistently maintains the desired PESQ improvement. |
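
The screening idea in points (1) and (3) can be illustrated with a minimal sketch: score the unprocessed signal and each enhanced candidate with a reference-free quality estimator, then keep whichever scores highest. This is only an illustration under stated assumptions, not the paper's implementation. The function name `select_by_quality`, the `enhancers` mapping (for example, one DNN per input feature), and the `score` callable are all hypothetical; `score` stands in for a non-intrusive estimator such as NISQA and does not reproduce its actual interface.

```python
from typing import Callable, Dict, Sequence, Tuple

Signal = Sequence[float]


def select_by_quality(
    noisy: Signal,
    enhancers: Dict[str, Callable[[Signal], Signal]],
    score: Callable[[Signal], float],
) -> Tuple[str, Signal, float]:
    """Return (label, signal, score) of the best-scoring candidate.

    The unprocessed input competes under the label "unprocessed", so the
    input is replaced only when some enhancer strictly improves its score.
    `score` is a hypothetical reference-free quality estimator
    (higher means better); it is an assumption of this sketch.
    """
    best_label, best_signal, best_score = "unprocessed", noisy, score(noisy)
    for label, enhance in enhancers.items():
        candidate = enhance(noisy)
        candidate_score = score(candidate)
        # Keep the candidate only if it beats the best score seen so far.
        if candidate_score > best_score:
            best_label, best_signal, best_score = label, candidate, candidate_score
    return best_label, best_signal, best_score
```

With a single enhancer in `enhancers`, the sketch reduces to the selective enhancement of point (1): a high-SNR input whose score does not improve is passed through unchanged. With one enhancer per feature, it corresponds to the feature selection of point (3), and no explicit noise-type or SNR identification is needed because only the quality scores are compared.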