
Compression Method Of Deep Neural Network Model For Speech Enhancement

Posted on: 2024-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: W J Lu
GTID: 2568307073962149
Subject: Control Science and Engineering

Abstract:
Speech enhancement aims to remove noise interference in complex acoustic environments and improve the quality of the speech signal. Thanks to their strong nonlinear modeling ability, deep neural networks trained with large-scale data show clear advantages in speech enhancement. However, their model size and computational cost are too high for deployment on resource-constrained embedded devices. At the same time, building large-scale data sets raises data-privacy concerns and suffers from missing labels, which limits model training and application. How to train a speech enhancement model capable of suppressing multiple noise types in an unsupervised manner, while protecting data security, is therefore of great significance. The specific research contents of this thesis are as follows:

(1) To address the fact that privacy-protection agreements prevent pooling speech data recorded in different noise scenarios, and that public data sets lack labels, this thesis proposes a multi-teacher knowledge distillation method built on federated learning and knowledge distillation. The knowledge of multiple teacher models is transferred to a student model, yielding an enhancement model that can suppress multiple noise types. First, public data sets are used to construct source-domain speech data sets under different noise conditions and a target-domain data set containing multiple noise types. Teacher models for the individual noise conditions are then pre-trained on the corresponding source-domain data sets. Experiments show that each teacher model enhances strongly under the noise condition it was trained on: for speech corrupted by different noises at an input signal-to-noise ratio (SNR) of 5 dB, the corresponding teacher model improves the Perceptual Evaluation of Speech Quality (PESQ) score by 1.67 on average. Finally, a multi-teacher knowledge distillation framework is constructed in which each teacher model is treated as a black box, protecting the privacy of its source-domain data. The interaction between the student model and the multiple teacher models alleviates the under- or over-enhancement that a single teacher may introduce when guiding student training. Experiments show that the student model learns the enhancement behavior of the multiple teachers for their respective noise types, and the proposed two-teacher and three-teacher distillation schemes improve PESQ scores substantially over random distillation. A rough sketch of such a distillation step is given after this paragraph.
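As a rough illustration only (the thesis does not publish its code), the following PyTorch sketch shows one common way to realize black-box multi-teacher distillation for enhancement: frozen teacher networks enhance a noisy input, and the student is trained to match a weighted combination of their outputs, so no clean labels are needed. The class name `MultiTeacherDistiller`, the uniform teacher weighting, and the L1 objective are assumptions for illustration, not the author's implementation.

```python
# Hypothetical sketch of black-box multi-teacher knowledge distillation
# for speech enhancement (not the author's code).
import torch
import torch.nn as nn


class MultiTeacherDistiller:
    def __init__(self, teachers, student, weights=None, lr=1e-4):
        # Teachers are frozen "black boxes": only their outputs are used,
        # so their source-domain training data never has to be shared.
        self.teachers = [t.eval() for t in teachers]
        for t in self.teachers:
            for p in t.parameters():
                p.requires_grad_(False)
        self.student = student
        # Per-teacher weights; uniform if none are given.
        n = len(teachers)
        self.weights = weights or [1.0 / n] * n
        self.criterion = nn.L1Loss()
        self.optimizer = torch.optim.Adam(student.parameters(), lr=lr)

    def step(self, noisy_batch):
        """One distillation step on an unlabeled batch of noisy speech."""
        with torch.no_grad():
            # Pseudo-target: weighted combination of the teachers' enhancements.
            target = sum(w * t(noisy_batch)
                         for w, t in zip(self.weights, self.teachers))
        estimate = self.student(noisy_batch)
        loss = self.criterion(estimate, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```

Combining several teachers in the pseudo-target is what lets the student cover multiple noise types without ever seeing the teachers' private training data.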
(2) To address the large scale and heavy computation of the teacher network, this thesis analyzes and simplifies the teacher's network structure to obtain a student model with a suitable balance between performance and size, and then quantizes its parameters to shrink the model further. First, the teacher network structure is optimized and the influence of its internal structure on model scale is analyzed; extensive experiments with different structural settings are used to trade off model size against performance. Compared with the teacher model, the lightweight model's average PESQ score drops by about 0.06, while its parameter count falls by 50.00% and its computation by about 54.76%. The lightweight model is then used as the student in the multi-teacher knowledge distillation experiments. Finally, the lightweight student model is quantized to reduce its storage footprint and inference time. On the test set the quantized model performs comparably to the unquantized student: PESQ decreases by about 0.02, while computation drops by about 7.54% and inference time by about 15.18%. A minimal quantization sketch follows.
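The thesis does not state which quantization scheme it uses, so the sketch below is an assumption chosen only to illustrate the idea of parameter quantization: PyTorch's built-in dynamic quantization stores the weights of linear layers in int8, shrinking the model while the inference API stays unchanged. The stand-in `student` architecture (a mask-estimating feed-forward network over 257 frequency bins) is hypothetical.

```python
# Hypothetical parameter-quantization sketch (scheme assumed, not from the thesis).
import torch
import torch.nn as nn

# Stand-in for the lightweight student enhancement model.
student = nn.Sequential(
    nn.Linear(257, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 257), nn.Sigmoid(),  # e.g. a spectral mask output
)

# Quantize the weights of the Linear layers to int8; activations stay float.
quantized = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is called exactly like the original at inference time.
noisy_features = torch.randn(1, 257)
mask = quantized(noisy_features)
```

Because only the stored weights change precision, such a step reduces model size and can speed up inference with little retraining effort, which matches the small accuracy cost reported above.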
Keywords: Deep neural network, Monaural speech enhancement, Model compression, Knowledge distillation, Parameter quantization