
Research and Application of Speech Separation Algorithm Based on Deep Neural Network

Posted on: 2022-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Qian
Full Text: PDF
GTID: 2518306764977409
Subject: Automation Technology
Abstract/Summary:
With the widespread adoption of the Internet and intelligent terminal devices, intelligent applications based on the mobile Internet have gradually entered daily life, and the volume of speech data on the Internet has grown rapidly, making speech processing and analysis an active research topic. However, practical application scenarios still contain many interfering sound sources. These sources greatly degrade the performance of important speech processing algorithms such as speech recognition, and the quality of speech processing determines the accuracy of the overall intelligent service. Speech separation extracts a speech signal for each speaker from a mixed signal containing two or more human voices and noise, and can therefore effectively broaden the application scenarios of speech processing.

This thesis studies single-channel speech separation algorithms based on deep learning and their application. By reviewing related research and analyzing the modeling ideas and existing shortcomings of prior work, the thesis identifies the key problems to be solved and proposes solutions, which are designed using a variety of model modules.

The thesis proposes a method for selecting and fusing cross-domain features in the encoder of the separation model. Features extracted in different ways are fused in the feature encoding stage, so that the separator and the decoder both operate on a single fused feature map. This removes the decoder step of reconstructing the signal from features of different domains and effectively improves the performance of the speech separation model. A variety of cross-domain feature selection and fusion modules are designed to realize this fusion. Experiments show that the proposed methods achieve encouraging results on the large and challenging Libri2Mix dataset with only a small increase in parameters. Furthermore, the proposed method generalizes well to the unmatched VCTK2Mix dataset.

Inspired by recent work, this thesis also designs a speech separation model, based on cross-domain features, that can handle a variable number of speakers. Experiments show that the model accurately predicts the number of speakers in speech with a high overlap rate, and that its performance is close to that of models that output a fixed number of speakers.

The proposed methods effectively improve the performance of single-channel speech separation based on deep learning and expand its application scenarios. Future research includes applying the proposed cross-domain feature selection and fusion module at different stages of the model, exploring different feature extraction methods, designing models for scenarios with more speakers, designing a speaker counter for low overlap rates, and building a lightweight version for deployment on terminal devices.
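The encoder-side fusion described in the abstract can be illustrated with a small sketch. The abstract does not specify the fusion modules, so everything below is an assumption made for illustration: a fixed random filter bank stands in for a learned time-domain encoder, a DFT magnitude stands in for the frequency-domain branch, and fusion is plain concatenation followed by a constant per-channel gate. All function names and parameters are hypothetical.

```python
import cmath
import math
import random


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (drops the incomplete tail)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def time_domain_features(frame, filters):
    """Stand-in for a learned 1-D conv encoder: ReLU of filter/frame dot products."""
    return [max(0.0, sum(w * x for w, x in zip(f, frame))) for f in filters]


def spectral_features(frame):
    """Magnitude of the one-sided DFT of the frame (frequency-domain branch)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def fuse(time_feats, spec_feats, gate):
    """Concatenate both branches and scale each channel by a gate in [0, 1].

    The gate plays the role of a hypothetical feature-selection weight.
    In a trained model it would be learned; here the caller supplies it.
    """
    concat = time_feats + spec_feats
    return [g * v for g, v in zip(gate, concat)]


def encode(signal, frame_len=16, hop=8, n_filters=8, seed=0):
    """Return one fused feature vector per frame."""
    rng = random.Random(seed)
    # Random filters are an assumption standing in for learned encoder weights.
    filters = [[rng.uniform(-1, 1) for _ in range(frame_len)]
               for _ in range(n_filters)]
    n_channels = n_filters + frame_len // 2 + 1
    gate = [0.5] * n_channels  # illustrative constant gate, not a learned value
    return [fuse(time_domain_features(f, filters), spectral_features(f), gate)
            for f in frame_signal(signal, frame_len, hop)]


# Toy two-tone "mixture" of 64 samples.
mixture = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
fused = encode(mixture)
print(len(fused), len(fused[0]))  # 7 frames, 17 fused channels
```

In this sketch a downstream separator and decoder would consume only `fused`, which mirrors the abstract's point that both stages work on a single fused feature map instead of handling each domain separately.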
Keywords/Search Tags:Neural Networks, Speech Separation, Cross-Domain Features, Multi-Talker