Target Speech Signal Extraction Algorithm Based On Deep Learning

Posted on: 2021-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Guo
Full Text: PDF
GTID: 2428330602978805
Subject: Electronic and communication engineering
Abstract/Summary:
Speaker speech extraction is a subfield of speaker speech separation. This thesis extracts the target speaker's speech from a single-channel observed speech signal under supervised learning. A general speech separation system treats every source signal equally and cannot concentrate on the particular speech signal to be extracted. In contrast, the attention-based speaker speech extraction algorithm proposed here makes full use of auxiliary information about the known target source to extract the desired target speech signal. With the development of deep learning in recent years, deep learning models have been widely used in image and speech signal processing. This thesis mainly studies target speaker speech extraction based on the deep learning attention mechanism. The main contributions are summarized below:

1. Two deep neural networks are constructed: a time-frequency masking estimation network and an auxiliary network for speaker information extraction. The mixed speech and an extra utterance from the target speaker, different from the one contained in the mixed speech, are used as the respective inputs of the two networks. To extract the target speaker's information parameters from this extra utterance, the auxiliary network uses two methods: a voice sequence aggregation method and a voice sequence aggregation method with an attention function. The information parameters output by the auxiliary network are introduced into a hidden layer of the masking estimation network as weight vectors, and the output of each unit is scaled by these weights to obtain the internal embedding vector corresponding to the target speaker. Finally, this embedding vector is propagated through the masking estimation network, which is trained to estimate the mask of the target speaker.

2. A unified neural network framework for speech separation and extraction is constructed, and a target speaker speech extraction algorithm based on an embedded attention mechanism is proposed. First, the algorithm treats speech spectrum mapping separation based on a deep neural network as the separation of the internal embedding vectors corresponding to the source signals, and uses this embedding-vector separation inside the speech spectrum mapping network as the separation module of the unified framework. Then the separated embedding vectors and the target speaker's extra utterance are taken as the input of the embedding attention mechanism module, which extracts the embedding vector of the target speaker. Finally, the target speaker's embedding vector is used as the input of the mask estimator module, the entire network is trained with the minimum mean square error criterion to estimate the target speaker's speech mask, and the target speaker's speech is extracted using the estimated mask.
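
The data flow described in the two contributions can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the abstract gives no layer sizes, activation functions, or attention scoring function, so every shape, the random weights (`W_in`, `W_out`, `w_att`), the dot-product scoring, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aggregate_mean(frames):
    """Sequence aggregation: average frame-level features over time."""
    return frames.mean(axis=0)                      # (T, D) -> (D,)

def aggregate_attention(frames, w):
    """Sequence aggregation with attention: softmax-weighted average.
    `w` is a hypothetical scoring vector; the thesis may score differently."""
    weights = softmax(frames @ w)                   # one weight per frame, (T,)
    return weights @ frames                         # (D,)

def scale_hidden(hidden, speaker_weights):
    """Scale each hidden unit's output by the speaker-dependent weight."""
    return hidden * speaker_weights                 # (T, D) * (D,) broadcast

def select_target_embedding(separated, speaker_vec):
    """Embedding attention (assumed form): weight S separated embeddings
    by their similarity to the speaker vector and sum them."""
    scores = softmax(separated.mean(axis=1) @ speaker_vec)  # (S,)
    return np.tensordot(scores, separated, axes=1)          # (T, D)

# Toy data: all sizes and values are made up for the example.
rng = np.random.default_rng(0)
T, F, D = 50, 129, 64                   # frames, frequency bins, hidden units
mixture_mag = rng.random((T, F))        # magnitude spectrogram of the mixture
target_mag = 0.5 * mixture_mag          # stand-in for the clean target
enroll = rng.standard_normal((80, D))   # features of the extra target utterance
W_in = 0.1 * rng.standard_normal((F, D))
W_out = 0.1 * rng.standard_normal((D, F))
w_att = rng.standard_normal(D)

# Contribution 1: auxiliary vector scales a hidden layer of the mask network.
speaker_vec = aggregate_attention(enroll, w_att)
hidden = np.tanh(mixture_mag @ W_in)                # (T, D)
target_emb = scale_hidden(hidden, speaker_vec)      # target-speaker embedding
mask = sigmoid(target_emb @ W_out)                  # (T, F), values in [0, 1]
est = mask * mixture_mag                            # masked mixture spectrum
mse = np.mean((est - target_mag) ** 2)              # MSE training criterion

# Contribution 2: pick the target embedding from two separated embeddings.
separated = rng.standard_normal((2, T, D))
target_emb2 = select_target_embedding(separated, speaker_vec)
mask2 = sigmoid(target_emb2 @ W_out)
```

In a real system the weights would be learned by minimizing `mse` with backpropagation, and the estimated magnitude `est` would be combined with the mixture phase before an inverse STFT; both steps are omitted here.
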
Keywords/Search Tags: Supervised learning, Single-channel speech extraction, Deep neural network, Attention mechanism, Embedding vector