Research On Deep Learning Methods For Use With Speaker Recognition

Posted on:2020-12-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Fan

GTID:2428330590977187

Subject:Electronic and communication engineering

Abstract/Summary:

As one of the most popular biometric recognition technologies today, speaker recognition is widely used in human-computer interaction, identity verification, information retrieval, and other applications, and it has both important research significance and practical value. In recent years, with the successful application of deep learning to speech recognition, speaker recognition based on deep learning has also received extensive attention from researchers. This thesis studies deep learning methods for speaker recognition and focuses on two questions: how to effectively combine deep learning theory with speaker recognition technology to build a recognition system when training data is limited, and how to combine different kinds of deep neural networks to build a high-performance speaker recognition model. The main research contents are as follows:

(1) A speaker recognition method based on MFCC-CNN is studied. A deep neural network can integrate feature extraction with recognition and classification, so raw speech can be recognized directly in an end-to-end manner. However, this requires a large amount of training data to achieve good recognition performance. To improve speaker recognition when only a small amount of data is available, this thesis constructs a speaker recognition model based on MFCC-CNN. The model first extracts MFCC (Mel-Frequency Cepstral Coefficients) from the raw speech as voice features and then uses a CNN (Convolutional Neural Network) for recognition. To prevent over-fitting, the model is further optimized with Dropout and L2 regularization. The experimental results show that the MFCC-CNN-based method achieves better recognition performance than the end-to-end deep speaker recognition method while greatly shortening network training time.

(2) A speaker recognition method based on an MFCC-CNN-LSTM hybrid deep neural network is studied. A CNN can overcome the instability caused by time-frequency offsets in traditional speaker recognition, but it does not take into account the contextual relationships between speech frames. This thesis therefore proposes a speaker recognition method based on MFCC-CNN-LSTM: a CNN first extracts inter-frame features, and an LSTM (Long Short-Term Memory) network then performs recognition over the contextual speech frames. This approach combines the advantages of the CNN and LSTM models. The experimental results show that the MFCC-CNN-LSTM hybrid model outperforms both the single MFCC-CNN model and the MFCC-LSTM model, and that it has good robustness and stability.
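The MFCC front end described in (1) can be illustrated with a minimal NumPy sketch of the standard pipeline: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, log, and DCT-II. The abstract does not give the thesis's extraction settings, so the 16 kHz sample rate, 25 ms frames with a 10 ms step, 512-point FFT, 26 mel filters, 13 coefficients and 0.97 pre-emphasis factor below are common defaults assumed for illustration; the function name `mfcc` is likewise hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, pre_emph=0.97):
    """Return an (n_frames, n_ceps) MFCC matrix for a 1-D signal.
    All default parameter values are assumptions, not the thesis's settings."""
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Split into overlapping frames, zero-padding the tail.
    flen = int(round(frame_len * sr))
    fstep = int(round(frame_step * sr))
    n_frames = 1 + int(np.ceil(max(len(emphasized) - flen, 0) / fstep))
    pad_len = (n_frames - 1) * fstep + flen
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(flen)
    # 3. Power spectrum of each windowed frame.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # 4. Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    # 5. Log filterbank energies; the floor avoids log(0) on silence.
    energies = np.log(np.maximum(power @ fbank.T, np.finfo(float).eps))
    # 6. Unnormalized DCT-II decorrelates the energies; keep n_ceps terms.
    n = np.arange(n_filters)
    dct = np.cos(np.pi / n_filters * (n[None, :] + 0.5)
                 * np.arange(n_ceps)[:, None])
    return energies @ dct.T
```

With these settings, one second of 16 kHz audio yields 99 frames, so `mfcc(signal)` returns a 99-by-13 matrix that a CNN can consume in place of raw samples. In practice a maintained library implementation would normally be used instead of hand-rolled code.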
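The Dropout and L2 regularization mentioned in (1) can be sketched as two small NumPy functions. This is an illustration under assumptions: it uses inverted dropout (surviving activations are rescaled during training so inference needs no change) and adds an L2 penalty on the weights to a softmax cross-entropy loss. The names `dropout` and `l2_regularized_loss` are hypothetical, and the abstract does not state the dropout rate or regularization strength used in the thesis.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` and rescale
    survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return x  # at inference time the layer is the identity
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def l2_regularized_loss(logits, labels, weights, lam):
    """Mean softmax cross-entropy plus lam * (sum of squared weights)."""
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    # The L2 penalty discourages large weights, which limits over-fitting.
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return ce + penalty
```

For example, with all-zero logits over three speakers the cross-entropy is ln 3, and a single 2-by-2 weight matrix of ones with `lam=0.1` adds a penalty of 0.4.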
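The CNN-then-LSTM pipeline in (2) can be sketched as an untrained NumPy forward pass. This is an illustration, not the thesis's architecture: it assumes a 1-D convolution along the time axis of the MFCC matrix (the thesis may use 2-D convolution), a single-layer LSTM whose final hidden state summarizes the utterance, and a softmax over speakers. All layer sizes and the names `init_params`, `conv1d_relu`, `lstm_last_hidden` and `cnn_lstm_forward` are hypothetical; a real system would learn these weights with a deep learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time plus ReLU.
    x: (T, F) MFCC frames, w: (C, k, F) filters, b: (C,) -> (T - k + 1, C)."""
    k = w.shape[1]
    t_out = x.shape[0] - k + 1
    # Stack every window of k consecutive frames: (t_out, k, F).
    windows = np.stack([x[t:t + k] for t in range(t_out)])
    out = np.einsum("tkf,ckf->tc", windows, w) + b
    return np.maximum(out, 0.0)

def lstm_last_hidden(x, wx, wh, b):
    """Single-layer LSTM over x: (T, D); returns the final hidden state.
    wx: (D, 4H), wh: (H, 4H), b: (4H,); gate order [input, forget, cell, output]."""
    hidden = wh.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = x_t @ wx + h @ wh + b
        i = sigmoid(z[:hidden])
        f = sigmoid(z[hidden:2 * hidden])
        g = np.tanh(z[2 * hidden:3 * hidden])
        o = sigmoid(z[3 * hidden:])
        c = f * c + i * g        # cell state carries context across frames
        h = o * np.tanh(c)
    return h

def cnn_lstm_forward(mfcc, params):
    """MFCC frames -> CNN features -> LSTM context -> speaker posteriors."""
    feats = conv1d_relu(mfcc, params["conv_w"], params["conv_b"])
    h = lstm_last_hidden(feats, params["wx"], params["wh"], params["lstm_b"])
    logits = h @ params["out_w"] + params["out_b"]
    e = np.exp(logits - logits.max())
    return e / e.sum()

def init_params(n_mfcc=13, n_filters=32, kernel=5, hidden=64,
                n_speakers=10, seed=0):
    """Random small weights; every size here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "conv_w": rng.normal(0, s, (n_filters, kernel, n_mfcc)),
        "conv_b": np.zeros(n_filters),
        "wx": rng.normal(0, s, (n_filters, 4 * hidden)),
        "wh": rng.normal(0, s, (hidden, 4 * hidden)),
        "lstm_b": np.zeros(4 * hidden),
        "out_w": rng.normal(0, s, (hidden, n_speakers)),
        "out_b": np.zeros(n_speakers),
    }
```

Calling `cnn_lstm_forward(features, init_params())` on a 99-by-13 MFCC matrix returns a probability vector over the 10 hypothetical speakers. The division of labor mirrors the abstract: the convolution captures local spectral patterns within short spans of frames, and the LSTM aggregates them across the whole utterance.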

Keywords/Search Tags:

Speaker recognition, GMM-UBM, MFCC, Convolutional neural network, Long short-term memory

Related items

1	Acceleration Gesture Recognition Based On Long-short Term Memory Network
2	Speaker Emotional State Recognition Based On Speech And Text Fusion
3	Chinese Sign Language Recognition Based On Convolutional Network And Long Short Term Memory Network
4	Design Of Speaker Recognition Algorithm Based On Long Short-term Memory Networks
5	Long Short Term Memory Recurrent Neural Network Application To Handwritten Recognition
6	Speech Enhancement Based On Optimized Full Convolution And Long-short Term Memory Network
7	Research On Network Intrusion Detection Method Based On Bi-LSTM
8	Research On Abnormal Behavior Identification Based On Long Short-term Memory Neural Network
9	Online Handwritten Math Expression Label Recognition Based On Long Short Term Memory Recurrent Neural Network
10	Studies On Digital Modulation Signal Recognition Based On Convolutional Neural Network