Research And Application Of Speaker Recognition Technology Based On Deep Learning

Posted on: 2022-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z Chen
Full Text: PDF
GTID: 2558306914462284
Subject: Electronic and communication engineering
Abstract/Summary:
Speaker recognition is a biometric technology that identifies a speaker from the speaker-specific characteristics carried in the voice signal. Because voice is cheap to collect and widely accepted by users, speaker recognition has been widely adopted in fields such as mobile payment and community security. Speaker recognition based on deep learning has achieved impressive performance. However, factors such as environmental noise, channel mismatch and the emotional state of the speaker still limit the performance of speaker recognition systems to some extent. The goal of this thesis is to apply deep learning to design a robust speaker recognition system and to improve recognition performance in complex environments. The proposed systems are tested in noisy, cross-channel and cross-language scenarios, and the results show that they are robust across a variety of working environments. The main contents of this thesis are as follows:
1. A speaker recognition method based on a two-dimensional convolutional neural network is proposed, in which a channel or spatial attention module is integrated into the frame-level feature extraction module. In addition, the GhostVLAD method is applied in the utterance-level feature aggregation module. Because GhostVLAD also operates on the channel dimension of the features, it can be combined with the attention mechanism in the frame-level feature extraction module to further improve the performance of the speaker recognition system.
2. Building on the existing one-dimensional convolutional time-delay neural network model ECAPA-TDNN, a variety of attention modules are integrated into the model, and the prototypical network loss function is applied in the optimization stage of the network to improve the performance of the speaker recognition system.
3. Considering the decisive effect of the model training process on the performance of a speaker recognition system, this thesis systematically studies the classification-based and metric-learning-based loss functions commonly used in speaker recognition, summarizes through experimental comparison how these two kinds of loss function affect system performance, and analyzes the reasons.
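The prototypical network loss mentioned in item 2 is a metric-learning loss of the kind compared in item 3. The abstract does not give the thesis's exact formulation, so the following is only a minimal sketch under stated assumptions: embeddings are plain Python lists, similarity is the negative squared Euclidean distance, and the function names and toy speaker labels (`spk_a`, `spk_b`) are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (the speaker prototype)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance; an assumed choice of metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def prototypical_loss(support, queries):
    """Average cross-entropy of queries against speaker prototypes.

    support: dict mapping speaker id -> list of embeddings
    queries: list of (true speaker id, embedding) pairs
    """
    speakers = sorted(support)
    prototypes = [mean_vector(support[s]) for s in speakers]
    total = 0.0
    for true_speaker, embedding in queries:
        # A query scores higher for prototypes it lies closer to.
        scores = [-squared_distance(embedding, p) for p in prototypes]
        total -= log_softmax(scores)[speakers.index(true_speaker)]
    return total / len(queries)

# Toy two-dimensional embeddings, for illustration only.
support = {
    "spk_a": [[1.0, 0.0], [0.8, 0.2]],
    "spk_b": [[0.0, 1.0], [0.2, 0.8]],
}
good = prototypical_loss(support, [("spk_a", [0.9, 0.1]), ("spk_b", [0.1, 0.9])])
bad = prototypical_loss(support, [("spk_a", [0.1, 0.9]), ("spk_b", [0.9, 0.1])])
```

Queries that lie near their own speaker's prototype (`good`) yield a smaller loss than queries that lie near the wrong prototype (`bad`). A classification-based loss would instead score each embedding against a fixed weight vector per training speaker.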
Keywords/Search Tags: speaker recognition, deep learning, attention mechanism, loss function