Research On Voiceprint Recognition Algorithm Based On Deep Learning

Posted on: 2022-03-03 | Degree: Master | Type: Thesis
Country: China | Candidate: R P Li | Full Text: PDF
GTID: 2518306512963419 | Subject: Control theory and control engineering
Abstract/Summary:
The development of modern technology and the rise of artificial intelligence have brought unprecedented convenience to daily life, and biometric recognition in particular has become closely tied to personal life. However, biometric technologies such as fingerprint recognition and iris recognition have the problems that the biometric is easily stolen or that usage scenarios are limited. A voiceprint, the feature of speech that expresses the speaker's identity, is variable and not easy to forge, so voiceprint recognition can effectively avoid these problems in practice. Voiceprint recognition is the process of identifying the speaker from the voiceprint features in the speech to be recognized. By task, it can be divided into voiceprint identification and voiceprint verification; by whether the spoken text is restricted, it can be divided into text-dependent and text-independent voiceprint recognition. This dissertation focuses on the problems in text-independent voiceprint recognition, which is difficult to study. The main work is as follows:

(1) Because using MFCC or Fbank features in a deep model causes the model's performance to decline, the spectrogram or logarithmic energy spectrogram of the speech signal is used as the model input. The spectrogram retains the speaker's identity information more completely and lets the neural network make full use of its learning capacity. The logarithmic energy spectrogram has the same advantages and also improves the noise resistance of the model, laying a good foundation for extracting more discriminative deep features.

(2) Changing the speech preprocessing method yields more accurate features but does not increase the separation between different classes in the feature space. To solve this problem, an additive angular margin loss function is used to shape the decision boundary in the feature space and thereby constrain the features of each class to cluster. At the same time, an average pooling layer, a dropout layer and a BN layer are added to ResNet34 to reduce the model's computation and improve training efficiency. The experimental results show that the Top-1 and Top-5 accuracy in the voiceprint identification task reached 90.1% and 97.8%, respectively, and the Equal Error Rate (EER) in the voiceprint verification task was reduced to 3.8%. Compared with existing results on the VoxCeleb1 data set, all three indicators are significantly improved.

(3) To use fewer computing resources while retaining the good performance of the large model, model compression methods are studied. Traditional knowledge distillation is found to have limitations, so teacher-free knowledge distillation is introduced. In constructing the teacher-free voiceprint verification model, a spatial-shared and channel-wise dynamic activation function and the additive angular margin loss function are added to strengthen the model's deep feature extraction and feature discrimination. The experimental results show that the training efficiency and generalization ability of the model on the text-independent voiceprint verification task are improved, and that its performance matches or even slightly exceeds that of the large model while the number of parameters and the amount of computation are halved.

In summary, the study of recent deep learning theory and the experimental analysis on a large data set show that the methods used improve the performance indicators of text-independent voiceprint recognition and reduce the required computation without loss of performance, which demonstrates the effectiveness of the methods and achieves the goal of improving the voiceprint recognition algorithm.
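
The additive angular margin loss named in the abstract and keywords can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the margin of 0.2 and the scale of 30 are illustrative assumptions, and clamping the shifted angle at pi is a simplification. The idea shown is that the embedding and each class weight are length-normalized, so every logit is a cosine, and the target class has a margin added to its angle before scaling, which forces same-class features to cluster more tightly.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def aam_logits(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Scaled cosine logits with an additive angular margin on the target class.

    margin and scale are illustrative values, not taken from the thesis.
    """
    e = l2_normalize(embedding)
    logits = []
    for k, w in enumerate(class_weights):
        # Cosine of the angle between the embedding and class k's weight vector.
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if k == target:
            # Penalize the target class: cos(theta + m) instead of cos(theta).
            # Clamping at pi is a simplification for this sketch.
            theta = math.acos(cos_theta)
            cos_theta = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def aam_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Additive angular margin softmax loss for a single example."""
    logits = aam_logits(embedding, class_weights, target, margin, scale)
    return cross_entropy(logits, target)


if __name__ == "__main__":
    emb = [1.0, 0.2]                      # toy 2-D speaker embedding
    weights = [[1.0, 0.0], [0.0, 1.0]]    # toy weights for two speakers
    print(aam_softmax_loss(emb, weights, target=0, margin=0.0))
    print(aam_softmax_loss(emb, weights, target=0, margin=0.2))
```

With the margin set to zero the function reduces to a normalized softmax loss; a positive margin lowers only the target-class logit, so the same embedding incurs a larger loss and must move closer to its class center to compensate.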
Keywords/Search Tags:Voiceprint recognition, Deep learning, Teacher-free knowledge distillation, Model compression, Additive angular margin loss function, Dynamic activation function