
Research On Deep Learning Methods For Use With Speaker Recognition

Posted on: 2020-12-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Fan | Full Text: PDF
GTID: 2428330590977187 | Subject: Electronic and communication engineering
Abstract/Summary:
As one of the most popular biometric recognition technologies today, speaker recognition is widely used in human-computer interaction, identity verification, information retrieval, and related areas, and it has both research significance and practical value. In recent years, following the successful application of deep learning to speech recognition, speaker recognition based on deep learning has also received extensive attention from researchers. This thesis studies deep learning methods for speaker recognition and addresses two questions: how to combine deep learning theory with speaker recognition technology to build a recognition system when training data is limited, and how to combine different kinds of deep neural networks to build a high-performance speaker recognition model. The main research contents are as follows:

(1) A speaker recognition method based on MFCC-CNN is studied. A deep neural network can integrate feature extraction and classification, so that raw speech can be recognized directly, end to end, but this requires a large amount of training data to achieve good recognition performance. To improve speaker recognition when only a small amount of data is available, this thesis constructs a speaker recognition model based on MFCC-CNN. The model first extracts Mel-Frequency Cepstral Coefficients (MFCC) from the raw speech as features and then uses a Convolutional Neural Network (CNN) for recognition. To prevent over-fitting, the model is further optimized by introducing Dropout and L2 regularization. The experimental results show that the MFCC-CNN-based method achieves a higher recognition rate than the end-to-end deep speaker recognition method while greatly shortening the network training time.

(2) A speaker recognition method based on an MFCC-CNN-LSTM hybrid deep neural network is studied. A CNN can overcome the instability caused by time-frequency offsets in traditional speaker recognition, but it does not take into account the contextual information linking successive speech frames. This thesis therefore proposes a speaker recognition method based on MFCC-CNN-LSTM. First, inter-frame features are extracted by the CNN, and then a Long Short-Term Memory (LSTM) network performs recognition over the speech frames in context. This approach combines the advantages of the CNN and LSTM models. The experimental results show that the MFCC-CNN-LSTM hybrid model outperforms both the single MFCC-CNN model and the MFCC-LSTM model, and that it has good robustness and stability.
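
The MFCC front end that both models rely on can be illustrated with a minimal, dependency-free sketch. The abstract does not give the thesis's actual settings, so every parameter here (8 kHz sampling rate, 25 ms frames, 10 ms hop, 20 mel filters, 12 coefficients) and every function name is an assumption chosen for illustration; a naive DFT stands in for the FFT a real system would use.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centre frequencies are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    points = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * p / sample_rate)) for p in points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(lo, mid):       # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):       # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def dct2(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, n_filters=20, n_ceps=12):
    """Return one list of n_ceps coefficients per frame (assumed settings, not the thesis's)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Floor the filter energies so the logarithm is always defined.
        energies = [max(sum(w * p for w, p in zip(row, spec)), 1e-10) for row in bank]
        features.append(dct2([math.log(e) for e in energies], n_ceps))
    return features

# Usage: 0.1 s of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
feats = mfcc(tone)
print(len(feats), len(feats[0]))
```

The result is a frames-by-coefficients matrix, which is the kind of two-dimensional input a CNN can convolve over and an LSTM can read frame by frame.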
Keywords/Search Tags: Speaker recognition, GMM-UBM, MFCC, Convolutional neural network, Long short-term memory