
The Application Of Speaker Recognition Technology Based On Deep Learning

Posted on: 2021-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: W H Song
Full Text: PDF
GTID: 2428330620464043
Subject: Engineering
Abstract/Summary:
As the information age advances, demand is growing for reliable identity authentication and for personalized services from specific systems. Speaker recognition and speaker classification based on a user's voice have therefore become research hotspots in signal processing. Existing deep learning-based approaches to speaker recognition and speaker attribute classification are still limited by environmental noise and channel mismatch. This thesis studies highly robust, deep learning-based systems for speaker recognition and speaker attribute classification, with the aim of improving recognition and classification accuracy in complex scenarios. To that end, it carries out the following work.

For speaker recognition, this thesis proposes a ResNet-BLSTM network structure that combines a residual network with a bidirectional long short-term memory network. Taking the spectrogram as input, it extracts deep features that are robust to speaking rate and carry richer representational information. For training, the thesis proposes T-Triplet loss, an improved triplet loss that strictly controls the intra-class aggregation and inter-class separation of feature vectors, so that the model can accurately cluster speech samples from the same speaker even when trained on noisy data. Experiments were carried out on three corpora, VoxCeleb, LibriSpeech and AISHELL-1, and similar equal error rates (EERs) were obtained on all three, which demonstrates the robustness of the system across multiple speech environments. Moreover, the proposed system outperforms the i-vector/PLDA baseline by 63% on the noisy VoxCeleb dataset.

For speaker attribute classification, this thesis proposes a DBN network structure that includes a bottleneck layer. All layers below the bottleneck layer are taken as a deep feature extractor, which derives high-level features, called D-MFCC, from MFCC. These D-MFCC features serve as the input to a GMM-UBM model that is trained for gender-age classification. The classification experiment was carried out on the aGender dataset and compared classification accuracy with MFCC and with D-MFCC as the training input. D-MFCC improved overall classification accuracy by 32.33%, and the classes for adult women and senior men improved significantly.

In summary, feature extraction and model construction are studied and improved separately for speaker recognition and for speaker classification. Well-represented acoustic features are extracted, and robust recognition and classification models are built on them, so that both tasks show superior performance on the experimental corpora.
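
The abstract relies on two quantities it does not define: the triplet loss that T-Triplet loss improves on, and the equal error rate (EER) used to report results. The sketch below illustrates only the standard versions of both, using plain Python lists. It is not the thesis's T-Triplet loss, whose formulation is not given here. The function names, the margin value of 0.2, and the threshold sweep used to approximate the EER are all assumptions made for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchors, positives, negatives, margin=0.2):
    """Standard triplet loss averaged over a batch (not the thesis's T-Triplet).

    Each triplet contributes max(d(a, p) - d(a, n) + margin, 0): the loss is
    zero once the negative is farther from the anchor than the positive by
    at least `margin` (an assumed hyperparameter).
    """
    losses = [
        max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over the observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false acceptance rate (FAR) and the false
    rejection rate (FRR) are closest, as the mean of the two rates.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

In this sketch a higher score means the trial is more likely to come from the claimed speaker, so well-separated target and impostor scores give an EER near zero, while heavily overlapping scores push it toward 0.5.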
Keywords/Search Tags:Speaker Recognition, Speaker Classification, Deep Learning, Triplet Loss, Bottleneck Network