
Research On Sound Source Separation Technology In Speech Recognition System

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Liang
Full Text: PDF
GTID: 2428330614458165
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of artificial intelligence, speech recognition is increasingly integrated into daily life. In practice, however, speech recognition operates in environments where noise or overlapping voices interfere with the target speech, which seriously degrades recognition performance. A speech signal corrupted by noise or mixed with other voices therefore needs to be processed by sound source separation before it is passed to the speech recognition system. Depending on the type of mixture being processed, sound source separation techniques can be divided into speech enhancement, which handles noisy speech, and multi-speaker separation, which handles mixtures of several speakers. This thesis studies a speech enhancement algorithm based on deep neural networks and a multi-speaker separation algorithm based on deep clustering. The main research contents are as follows.

Firstly, this thesis studies an a priori signal-to-noise ratio (SNR) estimator based on deep neural networks and the overall speech enhancement framework built around it. The enhancement quality of this algorithm depends on the accuracy of the a priori SNR estimate. When the SNR is low and the noise is of a type that is hard to distinguish from speech, the estimate is not accurate enough, and enhancement suffers. To solve this problem, this thesis proposes a speech enhancement algorithm that introduces a blind source separation step, independent low-rank matrix analysis, before the a priori SNR estimator. The proposed algorithm first separates the noisy speech signal into noise and speech, thereby raising the SNR of the signal that contains the speech. Experimental results show that the proposed algorithm achieves better enhancement than the baseline algorithm at low SNR and under some kinds of instrumental noise, and that it improves the recognition rate and robustness of the speech recognition system.

Secondly, this thesis studies a multi-speaker separation algorithm based on deep clustering. The K-means clustering used by this algorithm is sensitive to the choice of initial cluster centers, which makes the separation result unstable. To address this, this thesis proposes a multi-speaker separation algorithm that uses deep clustering based on hierarchical clustering, which does not require initial cluster centers to be selected. Experimental results show that the proposed algorithm achieves higher separation performance and stability than the baseline deep clustering algorithm and reduces the word error rate of the speech recognition system.
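The property that motivates the second contribution, that hierarchical clustering needs no initial cluster centers and so returns the same partition on every run, can be illustrated with a minimal sketch. This is not the thesis's implementation: the average-linkage criterion, the function names `agglomerative_cluster` and `make_toy_embeddings`, and the two-dimensional toy points standing in for the network's time-frequency embeddings are all assumptions made for illustration.

```python
import math
import random


def agglomerative_cluster(points, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per point.

    No initial centers are chosen, so the result is deterministic
    for a given input (assumed linkage: average; the thesis may differ).
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    # Repeatedly merge the closest pair until n_clusters remain.
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # Convert the cluster membership lists into a flat label list.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def make_toy_embeddings(rng, n_per_speaker=10):
    """Hypothetical stand-in for embeddings of two speakers' bins."""
    centers = [(1.0, 0.0), (0.0, 1.0)]
    pts = []
    for cx, cy in centers:
        for _ in range(n_per_speaker):
            pts.append((cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05)))
    return pts


embeddings = make_toy_embeddings(random.Random(0))
labels = agglomerative_cluster(embeddings, n_clusters=2)
```

In a deep clustering pipeline, the labels would typically be reshaped into one binary time-frequency mask per speaker and applied to the mixture spectrogram. The pairwise search above is cubic or worse in the number of points, so a practical system would likely rely on an optimized library routine instead.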
Keywords/Search Tags: sound source separation, speech recognition, speech enhancement, multi-speaker separation