Research On Speech Enhancement Technology Based On Multi-Objective Learning And Integration

Posted on: 2022-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
GTID: 2518306569965739
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speech enhancement recovers the clean speech component from noisy speech in order to improve speech quality and intelligibility. Traditional speech enhancement algorithms assume that the noise is stationary, but real-world noise is random and non-stationary. Modern neural network speech enhancement algorithms make no assumption about the nature of the noise and directly learn the mapping from noisy speech to clean speech. However, different training objectives produce different enhancement results, and the learning capacity of a single neural network is limited. For these reasons, multi-objective learning and integrated (ensemble) learning have begun to be applied to speech enhancement. A multi-objective learning speech enhancement algorithm can optimize several training objectives at the same time without adding many extra parameters, but the objectives conflict during training in the shared layers. An integrated learning speech enhancement algorithm combines multiple neural networks in parallel to improve the expressive power of the model, but most current algorithms of this kind use homogeneous base models. In addition, the input of the gating unit is usually the same as the input of the base models, and this frame-expanded input is too redundant for learning the fusion decision. This thesis addresses these problems of multi-objective learning and integrated learning speech enhancement algorithms and makes two improvements:

1. To address parameter training conflicts in multi-objective learning speech enhancement algorithms, an improved multi-objective learning speech enhancement method is proposed and applied to a multi-objective neural network, a multi-objective gated recurrent unit network, and a multi-objective convolutional neural network. The network structure is a compromise between the parameter sharing mechanism of multi-objective learning and the problem of parameter training conflicts. Instead of attaching one output layer per training target after the last shared layer, the network splits into target-specific branches at an intermediate layer. So that each training target can also learn fully from the original input information, the original input is connected to the output layer of each training target by feature splicing or a skip connection. Experimental results show that, compared with the corresponding single-objective networks, PESQ is 6.9% higher for the improved multi-objective neural network, 6.12% higher for the improved multi-objective gated recurrent unit network, and 3.12% higher for the improved multi-objective convolutional neural network. Compared with the corresponding conventional multi-objective networks, PESQ is 2.1% higher for the improved multi-objective neural network, 1.35% higher for the improved multi-objective gated recurrent unit network, and 1.25% higher for the improved multi-objective convolutional neural network.

2. To address the homogeneity of base models in integrated learning speech enhancement algorithms and the input redundancy of the gating unit, a speech enhancement algorithm based on multi-objective learning and integration is proposed. The improved multi-objective neural network, multi-objective gated recurrent unit network, and multi-objective convolutional neural network are combined in parallel as the base models of the integrated model, which makes the base models more heterogeneous. To reduce the input redundancy of the gating unit, only the current input frame is used. To compensate for the resulting loss of inter-frame information, energy statistics relating the current frame to the previous frame are computed and appended to the current input frame. Experimental results show that the PESQ of the speech enhancement model based on multi-objective learning and integration is 9.79% higher than that of the M-DNN [38] model.
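The branching scheme described in the abstract (shared lower layers, target-specific branches that split off at an intermediate layer, and the original input spliced into each output layer) might be sketched as follows. This is only a minimal illustration in plain Python, not the thesis's implementation: the class name `BranchedMultiTargetNet`, the layer sizes, the ReLU activation, the random untrained weights, and the target names `spectrum` and `mask` are all assumptions made for the example.

```python
import random


def dense(x, weights, bias, activation=None):
    """Compute W x + b for one feature vector, with an optional ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (stand-ins for trained parameters)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class BranchedMultiTargetNet:
    """Hypothetical feed-forward net that branches per target at a middle layer."""

    def __init__(self, n_in, n_hidden, target_dims, seed=0):
        rng = random.Random(seed)
        # Shared trunk: these layers are trained jointly by all targets.
        self.shared = [
            init_layer(n_in, n_hidden, rng),
            init_layer(n_hidden, n_hidden, rng),
        ]
        # One private branch per training target, starting mid-network.
        self.branches = {}
        for name, n_out in target_dims.items():
            hidden = init_layer(n_hidden, n_hidden, rng)
            # The output layer sees branch features spliced with the raw input.
            output = init_layer(n_hidden + n_in, n_out, rng)
            self.branches[name] = (hidden, output)

    def forward(self, x):
        h = list(x)
        for weights, bias in self.shared:
            h = dense(h, weights, bias, "relu")
        outputs = {}
        for name, ((hw, hb), (ow, ob)) in self.branches.items():
            z = dense(h, hw, hb, "relu")
            # Feature splicing: concatenate the original input frame.
            outputs[name] = dense(z + list(x), ow, ob)
        return outputs


# Example: one 8-dimensional noisy feature frame, two assumed targets.
net = BranchedMultiTargetNet(n_in=8, n_hidden=16, target_dims={"spectrum": 8, "mask": 8})
estimates = net.forward([0.1 * i for i in range(8)])
```

In this sketch only the two trunk layers are shared, so gradients from different targets would meet only there, while each branch and output layer belongs to a single target. Where the split is placed controls the trade-off between parameter sharing and training conflicts that the abstract describes.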
Keywords/Search Tags:Speech enhancement, Deep neural network, Multi-objective learning, Multi-objective integration