Research On Speaker Recognition Method Of CNN-GRU Model Based On AM-Softmax Loss Function

Posted on: 2022-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhao
GTID: 2518306314468064
Subject: Electronics and Communications Engineering
Abstract/Summary:
Language is the most convenient medium through which humans exchange information. People's vocal organs differ physiologically in size and shape, and people also differ in how they vocalize; these differences show up as differences in the waveforms of their speech signals. The technique of identifying a person from the characteristics of their voice is called speaker recognition. Like face recognition and fingerprint recognition, speaker recognition is a form of biometric recognition. Compared with other recognition methods it is relatively convenient and low-cost, and it has therefore become a research focus for many companies and scholars.

This thesis studies the mechanism of speaker recognition and explores ways to improve recognition in real-world scenarios, where limited speech data quality and environmental factors cause low recognition rates, by strengthening the discriminative ability of the speaker recognition model. To improve existing speaker recognition techniques and raise the recognition rate of the system, the thesis approaches the problem from the perspective of the classification model's discriminative capability: it aims to reduce the distance between features of samples from the same class and to enlarge the distance between features of samples from different classes. The thesis reviews the Softmax, Center-Loss, A-Softmax, and AM-Softmax loss functions; introduces the characteristics of CNNs and GRUs and describes how a CNN-GRU model is constructed; and, based on experiments that compare the recognition performance of the CNN-GRU speaker recognition model under different loss functions, proposes a model scheme that uses the additive margin Softmax (AM-Softmax) loss function. The margin parameters were determined by analyzing the relationship between the AM-Softmax parameter settings and the recognition rate of the speaker recognition system. The proposed method was compared with GMM-UBM, DNN, and LSTM models on two tasks, speaker verification and speaker identification. It outperformed these baseline models in both equal error rate and recognition rate, achieving an equal error rate of 4.48% and a recognition rate of 99.18%.

This study also examined the robustness of the model when interference factors are present in the speaker recognition system, explored the advantages of the SpecAugment data augmentation method over traditional waveform-level data augmentation, and proposed training the speaker model on SpecAugment-augmented data to strengthen its resistance to external noise. Two augmentation schemes were set up in the experiments: (1) augmenting the speech data of a subset of the speakers; (2) augmenting part of the speech data of every speaker. The model was trained under both schemes, and the better training criterion was selected to meet the model's robustness requirements. The results show that the model performs well under both training schemes: compared with the normally trained model, the model trained with data augmentation had an error rate 13% to 16% lower, which shows that the proposed method can meet the robustness requirements of speaker recognition.
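To make the AM-Softmax idea above concrete, here is a minimal sketch of the loss for a single embedding, written in plain Python. It assumes the standard AM-Softmax formulation: both the embedding and the class weight vectors are L2-normalized, so each logit is a cosine similarity, and a margin m is subtracted from the target-class cosine before scaling by s. The values s = 30 and m = 0.35 are illustrative assumptions, not the settings chosen in the thesis, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def am_softmax_loss(
    embedding: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    label: int,
    s: float = 30.0,  # scale factor: an assumed illustrative value
    m: float = 0.35,  # additive margin: an assumed illustrative value
) -> float:
    """AM-Softmax loss for one speaker embedding (hypothetical helper).

    logit_j = s * cos(theta_j) for non-target classes and
    logit_y = s * (cos(theta_y) - m) for the target class y.
    """
    x = l2_normalize(embedding)
    cosines = [sum(a * b for a, b in zip(x, l2_normalize(w))) for w in class_weights]
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cosines)]
    # Cross-entropy computed with a numerically stable log-sum-exp.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[label]
```

Setting m = 0 recovers a scaled, normalized Softmax. With m > 0 the target cosine must exceed every other cosine by at least m before the loss becomes small, which is how the loss pulls same-speaker features together and pushes different-speaker features apart.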
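SpecAugment operates on the spectrogram rather than on the raw waveform. The sketch below applies its frequency masking and time masking steps to a spectrogram stored as a list of frames. Time warping, which is part of the original SpecAugment recipe, is omitted. The mask widths, the number of masks, and the choice to fill masked cells with the spectrogram mean are assumptions for illustration, not the configuration used in the thesis.

```python
import random
from typing import List, Optional

Spectrogram = List[List[float]]  # indexed as [time_frame][frequency_bin]


def spec_augment(
    spec: Spectrogram,
    max_freq_width: int = 8,   # F: assumed maximum frequency-mask width
    max_time_width: int = 20,  # T: assumed maximum time-mask width
    num_freq_masks: int = 2,   # assumed
    num_time_masks: int = 2,   # assumed
    rng: Optional[random.Random] = None,
) -> Spectrogram:
    """Return a masked copy of `spec`; the input is left unchanged.

    Each mask width is drawn uniformly from [0, max width], and its start
    position is drawn so that the mask fits inside the spectrogram.
    """
    rng = rng or random.Random()
    num_frames, num_bins = len(spec), len(spec[0])
    fill = sum(map(sum, spec)) / (num_frames * num_bins)  # mean value
    out = [row[:] for row in spec]

    # Frequency masking: blank a band of consecutive bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for row in out:
            for k in range(start, start + width):
                row[k] = fill

    # Time masking: blank a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [fill] * num_bins
    return out
```

Under scheme (1) a routine like this would be applied to utterances of only some speakers; under scheme (2) it would be applied to a portion of every speaker's utterances. The selection logic itself is not described in the abstract and is not shown here.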
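The equal error rate reported for the speaker verification task is the operating point where the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). A simple way to approximate it is to sweep a decision threshold over the observed scores, as in the sketch below. The function name and the toy scores in the usage example are hypothetical; real evaluations usually interpolate along the ROC curve rather than picking the closest observed threshold.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    target_scores: Sequence[float], nontarget_scores: Sequence[float]
) -> Tuple[float, float]:
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when score >= threshold. Returns (eer, threshold),
    where eer is the mean of FAR and FRR at the threshold where they are closest.
    """
    best = None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Toy scores: genuine (same-speaker) trials vs. impostor trials.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
# eer == 0.25 at threshold == 0.6: one impostor accepted, one genuine trial rejected.
```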
Keywords/Search Tags: speaker recognition, SpecAugment, CNN-GRU, AM-Softmax, layer normalization