Speech enhancement (SE) technology plays a critical role in the complex acoustic environments of everyday life. It has both research significance and practical value for high-quality communication, the development of advanced hearing aids, and everyday human-computer interaction. The major contributions of this thesis are as follows:

1. A single-channel speech enhancement approach that integrates multitaper spectrum estimation (MTM) with geometric spectral subtraction is proposed. Traditional speech enhancement algorithms estimate the power spectrum inaccurately, so the enhanced speech typically contains considerable residual noise or nonlinear distortion. The new approach uses MTM to estimate the power spectrum of the noisy speech and the improved minima controlled recursive averaging (IMCRA) method to track the noise power spectrum in real time. Geometric spectral subtraction is then used to compute the enhanced speech. Experimental results show that the proposed approach reduces noise and alleviates distortion in the enhanced speech, and thus effectively improves speech quality. Taking the PESQ score as an example, in non-stationary noise environments the average PESQ over signal-to-noise ratios (SNRs) of -5 dB, 0 dB and 5 dB is 12.6%, 16.8% and 5.1% higher, respectively, than that of spectral subtraction, minimum mean square error estimation and geometric spectral subtraction.

2. A speech enhancement approach based on joint dynamic speech and dynamic noise aware training is proposed. Building on the existing deep neural network (DNN) feature-mapping framework, this thesis proposes a dynamic speech aware training (DSAT) method and combines it with dynamic noise aware training. The dynamic speech feature and dynamic noise feature of the center frame of the noisy speech are extracted and appended to the noisy speech features, which carry the context information, to form the input vector of the DNN. The network is thus given hints about both the speech scene and the noise scene at the same time, so it can better learn the complex nonlinear relationship among the noisy speech, the noise and the clean speech, and achieve more accurate feature mapping. The proposed approach addresses the speech distortion and poor noise robustness of the original model, reduces residual noise, and has lower computational complexity than the reference algorithm. Experimental results show that, under the same three SNR conditions, the average STOI score is 11.4%, 4.2% and 2.8% higher, respectively, than that of the original noisy speech, the baseline system and the reference algorithm. The proposed approach therefore effectively improves both speech quality and intelligibility.
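As an illustration of the multitaper power spectrum estimate used in the first contribution, the following is a minimal sketch rather than the thesis implementation. It assumes sine tapers, which are one common multitaper family; the abstract does not state which tapers are used. The function names `sine_tapers` and `multitaper_psd` and the default of four tapers are hypothetical. The estimate is the average of the periodograms obtained with each orthonormal taper, which lowers the variance of a single-window periodogram. IMCRA noise tracking and the geometric subtraction step are not shown.

```python
import cmath
import math


def sine_tapers(n_samples, n_tapers):
    """Return n_tapers orthonormal sine tapers, each of length n_samples.

    Assumption: sine tapers stand in for whatever taper family the thesis uses.
    """
    scale = math.sqrt(2.0 / (n_samples + 1))
    return [
        [scale * math.sin(math.pi * (k + 1) * (n + 1) / (n_samples + 1))
         for n in range(n_samples)]
        for k in range(n_tapers)
    ]


def multitaper_psd(frame, n_tapers=4):
    """Average the tapered periodograms of one frame (bins 0 .. N/2)."""
    n = len(frame)
    n_bins = n // 2 + 1
    psd = [0.0] * n_bins
    for taper in sine_tapers(n, n_tapers):
        tapered = [w * x for w, x in zip(taper, frame)]
        for b in range(n_bins):
            # Naive DFT keeps the sketch dependency-free; use an FFT in practice.
            coeff = sum(v * cmath.exp(-2j * math.pi * b * i / n)
                        for i, v in enumerate(tapered))
            psd[b] += abs(coeff) ** 2 / n_tapers
    return psd


# Example: one short frame containing a single tone.
frame = [math.cos(2 * math.pi * 3 * i / 16) for i in range(16)]
noisy_power_spectrum = multitaper_psd(frame, n_tapers=3)
```

In a full system this per-frame estimate would feed both the noise tracker and the subtraction rule.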
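The input vector described in the second contribution can be sketched as follows. This is an illustrative assumption, not the thesis code: the per-frame speech and noise estimates are taken as given (the abstract does not say how they are computed), frames are plain lists of feature values, edge frames are repeated to pad the context window, and the names `context_window`, `joint_aware_input` and the default `tau=3` are hypothetical.

```python
def context_window(frames, center, tau):
    """Concatenate frames center-tau .. center+tau, repeating edge frames at the borders."""
    last = len(frames) - 1
    window = []
    for i in range(center - tau, center + tau + 1):
        # Clamp the index so the window stays inside the utterance (assumed padding rule).
        window.extend(frames[min(max(i, 0), last)])
    return window


def joint_aware_input(noisy, speech_est, noise_est, center, tau=3):
    """Build one DNN input: context window, then speech hint, then noise hint.

    noisy, speech_est and noise_est are lists of per-frame feature vectors;
    the two hints are taken from the center frame only.
    """
    return (context_window(noisy, center, tau)
            + list(speech_est[center])
            + list(noise_est[center]))


# Example with 5 frames of 2-dimensional toy features.
noisy = [[float(10 * t + d) for d in range(2)] for t in range(5)]
speech = [[100.0 + t, 200.0 + t] for t in range(5)]
noise = [[-1.0 - t, -2.0 - t] for t in range(5)]
vec = joint_aware_input(noisy, speech, noise, center=2, tau=1)
```

With feature dimension D and context half-width tau, each input has (2 * tau + 1) * D + 2 * D entries, so the two hints add only 2 * D values per frame.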