
Research On Text-independent Speaker Recognition Based On Attention Mechanism

Posted on: 2021-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: M H Wang
Full Text: PDF
GTID: 2518306128976609
Subject: Master of Engineering
Abstract/Summary:
Speaker recognition is a technique for determining a speaker's identity from his or her voice, and is a form of biometric authentication. With the application of deep learning, speaker recognition has achieved new breakthroughs both in the length of enrollment speech required and in recognition accuracy. The current challenges mainly concern robustness to short utterances, noise, channel differences, spoofing attacks, and changes in the voice over time. The main research question of this thesis is how to extract more effective information representing the speaker's identity from shorter speech. The main work is as follows.

First, the thesis gives a brief introduction to the background of speaker recognition, including the relevant principles of speech, the development history, the basic processing pipeline, the evaluation metrics, and the current research difficulties and hotspots.

Second, it studies text-independent speaker recognition based on the Deep Residual Network (ResNet), covering both speaker identification and speaker verification. The experimental results show that a speaker recognition system using ResNet does not discriminate clearly between different speakers. The analysis suggests that the test speech is easily affected by factors such as the speaker's emotions, physiological state, and speaking rate, so ResNet alone cannot fully extract the effective features of the speaker's voice.

Third, building on this analysis, an attention mechanism is integrated into ResNet. The main idea is to apply a weighted calculation to the feature maps extracted by ResNet at each layer, giving greater weight to features that are effective for speaker identity and less weight to irrelevant information, so that more effective features can be learned from shorter speech. The final experimental results show that the ResNet model with the attention mechanism learns speaker features more comprehensively, and its results improve significantly on those of the plain ResNet structure.

Finally, a speaker recognition demonstration system was developed based on the proposed residual network with attention. The system consists of four modules: voiceprint registration, voiceprint verification, voiceprint identification, and voiceprint tracking. Testing in a real environment shows that the system can perform its demonstration functions.

The model was trained on the AISHELL-2 data set (2006 speakers, 1000 hours in total). The test set (904 speakers) consisted of the LibriSpeech data set (504 speakers) and part of a Uyghur-language data set (400 speakers). The experimental results are compared in terms of the number of enrollment utterances and the dimension of the model's embedding feature vector. With the same number of enrollment utterances and the same embedding dimension, the residual network with the attention mechanism improves significantly on the plain residual network for both speaker identification and speaker verification. When only one enrollment utterance is used, the ResNet structure with the attention mechanism still performs better than ResNet alone. The network with the attention mechanism can extract more features representing the speaker's identity from an utterance while suppressing information that contributes little to that identity.
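The abstract does not specify the exact form of the attention mechanism. As a rough illustration of the idea of re-weighting ResNet feature maps, the sketch below uses a squeeze-and-excitation style channel attention in NumPy. The function name `channel_attention`, the weight matrices `w1` and `w2`, the reduction ratio `r`, and all shapes are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feature_map, w1, w2):
    """Re-weight the channels of one feature map (hypothetical sketch).

    feature_map: array of shape (C, T, F) = channels x time x frequency
    w1:          array of shape (C // r, C), bottleneck projection
    w2:          array of shape (C, C // r), expansion back to C channels
    Returns the re-weighted feature map and the per-channel weights.
    """
    # Squeeze: summarize each channel by its global average.
    squeezed = feature_map.mean(axis=(1, 2))           # shape (C,)
    # Excite: small two-layer network producing one weight per channel.
    hidden = np.maximum(w1 @ squeezed, 0.0)            # ReLU, shape (C // r,)
    weights = sigmoid(w2 @ hidden)                     # shape (C,), each in (0, 1)
    # Scale: informative channels keep more of their activation.
    return feature_map * weights[:, None, None], weights


# Toy example with random values standing in for learned parameters.
rng = np.random.default_rng(0)
C, T, F, r = 8, 20, 10, 4
fmap = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
reweighted, weights = channel_attention(fmap, w1, w2)
```

In a trained network, `w1` and `w2` would be learned jointly with the ResNet layers, so that channels carrying speaker-discriminative information receive weights closer to 1 and less useful channels are attenuated.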
Keywords/Search Tags: Speaker Recognition, Residual Neural Network, Attention Mechanism, Mel-scale Frequency Cepstral Coefficients
Related items