Research On SincNet And Siamese LSTM Based Method For Speaker Verification

Posted on: 2021-05-30

Degree: Master

Type: Thesis

Country: China

Candidate: Yihenew Alemu Haile

GTID: 2428330611999373

Subject: Computer Science and Technology

Abstract/Summary:

Speaker verification refers to verifying a speaker's identity from their voice. Current methods based on deep neural networks (DNNs) can achieve strong performance, but some problems remain. Specifically, current methods are not interpretable enough in front-end feature extraction. Meanwhile, temporal information is not fully considered when back-end embeddings are extracted. In addition, the vanishing gradient problem also needs to be considered. This thesis explores effective methods for solving these problems, and its main research contents and contributions are summarized as follows:

(1) We propose a framework based on SincNet and long short-term memory (LSTM) to extract embeddings that are more interpretable and carry more temporal information. Specifically, the front-end SincNet introduces the sinc function to obtain the filter response characteristics. The back-end LSTM, trained with a softmax loss, learns the sound production of the vocal tract in order to identify the speaker. The LSTM also addresses the vanishing gradient problem by maintaining the residual error during backpropagation. In addition, the proposed framework is an end-to-end system that maps raw waveforms directly to embeddings. The experimental results show that the proposed framework achieves better performance than the baseline methods.

(2) We also propose an improved framework based on SincNet and a Siamese LSTM architecture. In this framework, to avoid feature confusion between same-speaker and different-speaker pairs, the Siamese network consists of two identical sub-networks that have the same configuration and share their parameters. Under the contrastive loss used for the Siamese network, pairs of utterances from the same speaker are mapped closer together, and pairs from different speakers are mapped farther apart. The contrastive loss takes the network outputs for a pair of utterances, computes the distance between them, and contrasts the distance for same-speaker pairs with the distance for different-speaker pairs. Experimental results show that the improved framework achieves better performance than the first proposed framework, as well as the other baseline methods.
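
The sinc-based front end in contribution (1) can be illustrated with a minimal sketch. It follows the commonly published SincNet formulation, in which each band-pass filter is the difference of two low-pass sinc filters and only the two cutoff frequencies would be learned. The function names, the filter length, and the Hamming window are assumptions made for illustration; the abstract does not state the thesis's exact configuration.

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass(f_low, f_high, length=101):
    """Return FIR taps of a windowed sinc band-pass filter.

    f_low and f_high are cutoffs in cycles per sample (0 < f_low < f_high < 0.5).
    The filter length and the Hamming window are illustrative assumptions.
    """
    if not 0.0 < f_low < f_high < 0.5:
        raise ValueError("need 0 < f_low < f_high < 0.5")
    if length % 2 == 0:
        raise ValueError("length must be odd so the filter is symmetric")
    half = length // 2
    taps = []
    for i in range(length):
        n = i - half  # sample index centered on zero
        # Band-pass = low-pass(f_high) - low-pass(f_low).
        ideal = (2.0 * f_high * sinc(2.0 * math.pi * f_high * n)
                 - 2.0 * f_low * sinc(2.0 * math.pi * f_low * n))
        # Hamming window to smooth the truncation of the ideal response.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, freq):
    """Magnitude of the frequency response at freq (cycles per sample)."""
    re = sum(t * math.cos(2.0 * math.pi * freq * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(2.0 * math.pi * freq * k) for k, t in enumerate(taps))
    return math.hypot(re, im)


if __name__ == "__main__":
    taps = sinc_bandpass(0.1, 0.2)
    for f in (0.0, 0.15, 0.35):
        print(f"gain at {f:.2f} cycles/sample: {gain(taps, f):.4f}")
```

Because each filter is described by only two cutoff frequencies, its pass band can be read off directly, which is one way to understand the interpretability claim made for the front end.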
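
The contrastive loss in contribution (2) can be sketched as follows. This is the widely used margin-based form, which penalizes the squared distance for same-speaker pairs and penalizes different-speaker pairs only when they fall inside a margin. The Euclidean distance, the margin value, and the function names are assumptions made for illustration; the abstract does not give the thesis's exact formula.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    emb_a and emb_b stand in for the outputs of the two weight-sharing
    sub-networks. The margin of 1.0 is an illustrative assumption.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_speaker:
        # Same speaker: any distance is penalized, pulling the pair together.
        return d ** 2
    # Different speakers: penalized only if closer than the margin.
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    print(contrastive_loss(anchor, [0.0, 0.5], same_speaker=True))
    print(contrastive_loss(anchor, [0.0, 0.5], same_speaker=False))
    print(contrastive_loss(anchor, [3.0, 4.0], same_speaker=False))
```

In a Siamese setup, both embeddings come from the same network applied to two utterances, so minimizing this loss over many pairs moves same-speaker embeddings closer and pushes different-speaker embeddings at least a margin apart.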

Keywords/Search Tags:

speaker verification, DNN, SincNet, LSTM, Siamese network

Related items

1	Text-Dependent Speaker Verification System
2	Research On Text-independent Multi-speaker Verification
3	Content-independent Speaker Verification Model and Its Application
4	Research On Speaker Verification System Based On Perceptual Log Area Ratio
5	A Study On The Generative Modelling For Speaker Verification Based On Deep Neural Network
6	Research On Text-Independent Speaker Verification System
7	An Account Identity Verification Model Based On The Pseudo-siamese Network
8	Investigation On Broad Phone Based Text Independent Speaker Verification
9	Research On Polarimetric SAR Image Terrain Classification With Few Labels
10	Research On Speaker Recognition Over Short Utterance And Varying Channels