
Research On Single Channel Speech Separation Algorithm Based On Uncertainty Measurement

Posted on: 2022-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhang
Full Text: PDF
GTID: 2518306782452424
Subject: Computer Software and Application of Computer
Abstract/Summary:
Single-channel speech separation (SCSS) is the process of recovering the speech signals of multiple speakers from a one-dimensional mixed-speaker signal. Because a single microphone is inexpensive and easy to deploy, single-channel speech separation has a wide range of applications. In practice, however, existing single-channel speech separation techniques are easily disturbed by unknown noise, and their generalization performance degrades severely. To address these problems, this thesis builds on the convolutional time-domain audio separation network (Conv-TasNet), introduces an uncertainty measurement method, constructs a subnet that estimates the signal-to-noise ratio (SNR) of the separation result, and reduces the uncertainty of the model through an adaptive frequency modulation network. This significantly improves the performance of speech separation models on mixed speech signals that contain unknown noise. The main work and contributions of this thesis cover the following three aspects:

(1) This thesis studies how speech is generated and how single-channel mixed speech is modeled. 13,000 mixed speech signals in different noise environments are obtained by combining the clean speech data of LibriSpeech with the noise data of Noise X and Nonspeech, which provides the data for subsequent model training. The thesis analyzes and summarizes the current feature extraction methods in single-channel speech separation and the widely used, better-performing separation models. It also summarizes the evaluation metrics commonly used to assess separated speech quality in this field.

(2) To address the severe loss of generalization ability that single-channel speech separation models suffer in unknown noise environments, this thesis proposes a single-channel speech separation method based on an adaptive frequency modulation network (SSM-FM). Built on Conv-TasNet, the method measures cognitive (epistemic) uncertainty as the difference between the SI-SNR of the test signal and the SI-SNR of the training signal. When the uncertainty of the test signal exceeds a threshold, the adaptive frequency modulation network adjusts the test signal in the frequency domain to narrow the gap between training and test noise in the feature space, which reduces the cognitive uncertainty of the model. Experiments on the 13,000 mixed signals built from public datasets show that, compared with Conv-TasNet alone, the SI-SNR increases from 2.83 dB to 4.63 dB, a relative increase of 63.60%; compared with Conv-TasNet with the Soft-Masks uncertainty measurement mechanism, the SI-SNR increases from 3.41 dB to 4.63 dB, a relative increase of 35.78%.

(3) Existing uncertainty measurement methods substantially increase the average separation time of the model and degrade the real-time performance of the speech separation model. To address this, the thesis proposes a single-channel speech separation method based on separation-SNR regression estimation and an adaptive frequency modulation network (SSM-REFM). This method replaces the speech separation networks at the input end and the frequency modulation (FM) end of SSM-FM with a subnet that estimates the SNR of the separation result. Such a structure estimates the SNR of the separation result directly from the mixed speech signal, without first separating the mixed signal completely and then computing the SNR of the result, and thereby shortens the speech separation time. Experiments on the 13,000 mixed signals built from public datasets show that, compared with SSM-FM, SSM-REFM reduces the separation time of a single noisy mixed speech signal by 62.72% without lowering the SI-SNR or STOI scores.
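The SI-SNR metric and the threshold test described in aspect (2) can be illustrated with a minimal Python sketch. It uses the standard scale-invariant SNR definition; it is not the thesis implementation. The function names `si_snr`, `needs_adaptation`, and `relative_gain`, the use of an absolute difference as the uncertainty score, and the `threshold_db` parameter are all assumptions made for illustration, since the abstract does not state the threshold value or the exact form of the difference.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB (standard definition) of an estimate
    against a reference signal, both given as equal-length sequences."""
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the mean of each signal.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything left after removing the target component counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


def needs_adaptation(test_si_snr, train_si_snr, threshold_db):
    """Hypothetical gate: treat the SI-SNR gap between training and test
    conditions as the uncertainty score (assumed absolute difference) and
    request frequency-domain adaptation when it exceeds the threshold."""
    uncertainty = abs(train_si_snr - test_si_snr)
    return uncertainty > threshold_db


def relative_gain(before, after):
    """Relative improvement in percent, as used for the reported figures."""
    return (after - before) / before * 100.0
```

With the values reported above, `relative_gain(2.83, 4.63)` rounds to 63.60 and `relative_gain(3.41, 4.63)` rounds to 35.78, matching the stated percentages. In SSM-REFM the test-side SI-SNR would come from the regression subnet rather than from a full separation pass; that subnet is not sketched here.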
Keywords/Search Tags:Speech Separation, Uncertainty Measurement, Noise Robustness, Neural Network