
Design And Implementation Of CNN-BLSTM Speech Separation Algorithm Fused With Self-attention Mechanism

Posted on: 2022-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: H X Zhu
GTID: 2518306737978939
Subject: Computer technology
Abstract/Summary:
With the development of computer technology, the demand for artificial intelligence is growing steadily, and using deep learning to recognize features in time-series data is a current research focus. In applications such as intelligent communication and voice assistants, speech separation provides the technical support needed to obtain information accurately. Separating human speech from noise and separating one speaker's speech from another's are the two main research directions in speech separation, and both are fundamental tasks in speech signal processing. This thesis designs and implements a speech separation method that fuses a self-attention mechanism into a CNN-BLSTM model (SACNN-BLSTM). The main work covers the following aspects:

1) Obtaining time-series data that meet the experimental requirements. A suitable time-series speech dataset is first selected. The selected data cannot be used directly, so they are preprocessed, and clean speech is mixed with clean speech, or clean speech with noise, to produce the mixed speech required by the experiments.

2) Separating speech from noise. Noisy speech is separated to recover the clean target speech. To address the shortcomings of the CNN-BLSTM model, this thesis integrates a self-attention mechanism into CNN-BLSTM so that time-frequency features dominated by the target speech receive more attention and the target speech is more clearly distinguished from the noise, thereby denoising the target speech signal. Experimental results show that, compared with the CNN-BLSTM model, the SACNN-BLSTM model improves the SDR (signal-to-distortion ratio) of the separated target speech to some extent.

3) Separating the mixed speech of two speakers. The goal is to separate a mixture of two speakers' speech signals into an independent clean speech signal for each speaker. The SACNN-BLSTM model takes speech signal features as input, and the self-attention mechanism is applied to the high-dimensional abstract features produced by the dilated CNN and BLSTM layers. It assigns a weight to the time-frequency features of each frame, so that the time-frequency characteristics of the two speakers' signals are clearly distinguished and the mixture can be separated. Experimental results show that, compared with the CNN-BLSTM model, SACNN-BLSTM improves overall separation performance without losing short-term intelligibility.

Based on the above research, the design and implementation of a CNN-BLSTM speech separation algorithm fused with a self-attention mechanism is completed. The algorithm can separate human speech from environmental noise and can also separate the mixed speech of two speakers, improving the speech quality of each speaker. Tests show that the algorithm meets its intended design goals.
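Step 1) mixes clean speech with noise, or clean speech with a second speaker, but the abstract does not state the mixing levels. The sketch below is a minimal NumPy example that assumes mixing at a chosen signal-to-noise ratio (SNR); the function name `mix_at_snr` and the joint peak normalization are illustrative choices, not the thesis's documented procedure.

```python
import numpy as np

def mix_at_snr(target, interference, snr_db):
    """Hypothetical helper: scale `interference` so that the target-to-interference
    power ratio equals `snr_db`, then add it to `target`.
    Both inputs are 1-D arrays sampled at the same rate."""
    n = min(len(target), len(interference))  # trim both signals to the shorter length
    target = np.asarray(target[:n], dtype=np.float64)
    interference = np.asarray(interference[:n], dtype=np.float64)

    p_target = np.mean(target ** 2)
    p_interf = np.mean(interference ** 2)
    if p_interf == 0:
        raise ValueError("interference signal is silent")

    # Gain g such that p_target / (g^2 * p_interf) == 10 ** (snr_db / 10)
    gain = np.sqrt(p_target / (p_interf * 10 ** (snr_db / 10)))
    scaled = gain * interference
    mixture = target + scaled

    # If the mixture would clip outside [-1, 1], rescale all three signals
    # by the same factor so the SNR is unchanged.
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        target, scaled, mixture = target / peak, scaled / peak, mixture / peak
    return mixture, target, scaled
```

Returning the scaled components alongside the mixture keeps the training targets aligned with the mixture. For two-speaker mixtures the same function can be called with the second speaker as `interference`, for example at 0 dB so that both speakers have equal power.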
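The dilated CNN layers mentioned in item 3) enlarge the temporal context without adding parameters by spacing the kernel taps apart. A minimal 1-D NumPy sketch follows. The thesis does not give its layer sizes, kernel shapes, or whether its convolutions are 1-D or 2-D, so the kernel values, the dilation schedule (1, 2, 4, 8), and the helper names `dilated_conv1d` and `receptive_field` are hypothetical.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Hypothetical helper: 'same'-length 1-D dilated convolution over time,
    computed as cross-correlation (as deep learning frameworks do) with zero padding.
    x: (T,) sequence, kernel: (K,) weights, dilation: spacing between taps."""
    x = np.asarray(x, dtype=np.float64)
    k = len(kernel)
    span = dilation * (k - 1)          # distance from the first tap to the last
    left, right = span // 2, span - span // 2
    xp = np.pad(x, (left, right))      # zero-pad so the output has length T
    out = np.zeros(len(x))
    for t in range(len(x)):
        taps = xp[t : t + span + 1 : dilation]  # K samples spaced `dilation` apart
        out[t] = np.dot(taps, kernel)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output after stacking
    layers with the given dilation rates (stride 1)."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)
```

With kernel size 3, four stacked layers with dilations 1, 2, 4, 8 see 31 frames, versus 9 frames for four undilated layers, which is why dilation is a common way to give a CNN front end long temporal context before a BLSTM.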
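Items 2) and 3) apply self-attention to the features produced by the dilated CNN and BLSTM layers so that each frame receives a weight. The abstract does not specify the attention formulation. The sketch below assumes standard scaled dot-product self-attention over frames, with hypothetical projection matrices `w_q`, `w_k`, `w_v` that a real model would learn; here they are random only to show the shapes.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def frame_self_attention(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention over frames (an assumed formulation).
    h: (T, D) frame features, e.g. BLSTM outputs; w_q, w_k: (D, d_k); w_v: (D, d_v).
    Returns the attended features (T, d_v) and the attention weights (T, T)."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity between every pair of frames
    weights = softmax(scores, axis=-1)       # each row is a distribution over frames
    return weights @ v, weights

# Usage with hypothetical sizes: 100 frames of 64-dim features projected to 32 dims.
rng = np.random.default_rng(0)
T, D, d = 100, 64, 32
h = rng.standard_normal((T, D))
w_q, w_k, w_v = (rng.standard_normal((D, d)) / np.sqrt(D) for _ in range(3))
context, weights = frame_self_attention(h, w_q, w_k, w_v)
```

Each row of `weights` says how much every other frame contributes to the representation of that frame, which is the mechanism the abstract relies on to emphasize frames dominated by the target speaker.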
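SDR is the metric behind the improvement reported in item 2). The sketch below computes the plain energy-ratio form of SDR in NumPy. This is an assumption: the thesis may instead use the BSS Eval definition (for example `mir_eval.separation.bss_eval_sources`), which also allows a short distortion filter on the reference and therefore reports different values. The helper names are hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Plain signal-to-distortion ratio in dB:
    10 * log10(||s||^2 / ||s - s_hat||^2).
    This simple form omits the distortion filter that BSS Eval allows."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    error = reference - estimate
    return 10 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))

def sdr_improvement(reference, estimate, mixture):
    """SDR gain of the separated estimate over the unprocessed mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)
```

For example, an estimate whose residual error is one tenth of the original noise amplitude has 100 times less error energy, so `sdr_improvement` returns 20 dB.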
Keywords/Search Tags: Self-attention mechanism, Speech separation, Speech signal, Dilated CNN, BLSTM