Research And Application Of Speaker Recognition Based On Deep Learning

Posted on: 2022-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhang
Full Text: PDF
GTID: 2518306551456544
Subject: Master of Engineering
Abstract/Summary:
In biometric identification systems, speaker recognition has many advantages over other biometric technologies. However, because such systems demand high accuracy and robustness, current speaker recognition technology still struggles to meet these requirements. It remains at the research stage and has not yet been deployed at scale in real-world scenarios. To improve the recognition performance and robustness of the model, this thesis studies three key technologies in speaker recognition: speech feature extraction, the loss function, and the recognition model structure. The main work and innovations are as follows:

1. To address the limitations of using a single feature, namely a single feature type and insufficient information, an effective shallow fusion method for speech features is proposed. Compared with MFCC and filterbank features, the spectrogram requires fewer computation steps during feature extraction and retains more of the original speech information, which makes it better suited to deep learning. This thesis studies several shallow feature fusion methods that use the spectrogram as the main feature and MFCC or filterbank as auxiliary features, and identifies the best fused feature. The effectiveness of the method is verified by comparative experiments on a convolutional neural network and a recurrent neural network.

2. To address the shortcomings of how the speaker center vector is calculated in GE2E, an end-to-end loss function based on the global center of the speaker is proposed. Comparative experiments with softmax, triplet, and GE2E losses show that GE2E is an excellent loss function. However, the speaker center vector in GE2E is a local value, and it deviates substantially from the true speaker center vector. The GC-GE2E loss function, based on the global center of the speaker, is therefore proposed. Speaker verification and speaker identification tests show that GC-GE2E is effective for speaker recognition and performs better than GE2E.

3. Based on the speech feature fusion method and the loss function proposed in this thesis, a speaker recognition model based on a multi-scale convolutional residual neural network is constructed. The model uses a convolutional neural network and a multi-scale residual network to meet the requirements for recognition performance and robustness. The effectiveness of the model is verified by comparative experiments, and its robustness is verified by cross-dataset and cross-language experiments.

4. A system based on speaker recognition technology is designed and implemented to provide identity verification and identification. The system combines the above research results with speech recognition technology. It supports recognition of the speaker's speech and dynamic password verification, which can effectively prevent spoofing attacks. The system provides registration, verification, and recognition functions and has good application value.
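
The difference between a local and a global speaker center can be illustrated with a minimal sketch of the commonly published GE2E softmax loss. This is not the thesis's implementation: the abstract does not say how the global center in GC-GE2E is computed, so the `global_centers` argument below is a hypothetical stand-in that simply replaces the per-batch centroids with externally supplied ones. The function names and the initial values `w=10.0` and `b=-5.0` are likewise assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ge2e_softmax_loss(batch, w=10.0, b=-5.0, global_centers=None):
    """Sum of GE2E softmax losses over every utterance in the batch.

    batch[j] is the list of embeddings for speaker j (at least two each).
    With global_centers=None, centers are local: each is the mean of that
    speaker's embeddings in this batch, and the utterance being scored is
    left out of its own speaker's center.
    With global_centers given (hypothetical variant), those vectors are
    used as the speaker centers instead.
    """
    batch_centers = [mean_vector(utts) for utts in batch]
    total = 0.0
    for j, utts in enumerate(batch):
        for i, emb in enumerate(utts):
            sims = []
            for k in range(len(batch)):
                if global_centers is not None:
                    center = global_centers[k]
                elif k == j:
                    # Local center of the true speaker, excluding emb itself.
                    center = mean_vector(utts[:i] + utts[i + 1:])
                else:
                    center = batch_centers[k]
                # Scaled cosine similarity to speaker k's center.
                sims.append(w * cosine(emb, center) + b)
            # Numerically stable log-sum-exp over all speakers.
            top = max(sims)
            log_sum_exp = top + math.log(sum(math.exp(s - top) for s in sims))
            total += log_sum_exp - sims[j]
    return total
```

With only a few utterances per speaker in a batch, the local center is a noisy estimate, which is the shortcoming the abstract attributes to GE2E. Passing centers estimated from more data through `global_centers` shows where a global estimate would enter the computation.
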
Keywords/Search Tags: Speaker Recognition, Biometric Recognition, Deep Learning, Speech Feature, End-to-end Loss Function