Speaker Recognition Based On UBM And Deep Learning

Posted on: 2020-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: C T Chi
GTID: 2428330596482931
Subject: Electronic and communication engineering
Abstract/Summary:
Speaker recognition (also known as voiceprint recognition) is a technology that identifies a speaker from features extracted from the speaker's voice signal. Traditional algorithms perform well when little data is available, but deep learning methods can exploit the large datasets available today and push speaker recognition performance further. This thesis studies the implementation of speaker recognition algorithms, and the main work is as follows: (1) The preprocessing module for speaker recognition, voice activity detection, methods for extracting speech features, and traditional classification models are systematically introduced. (2) The recognition system based on the Gaussian mixture model-universal background model (GMM-UBM) is introduced in detail, and the procedures for parameter estimation, model training, and score matching are derived. Experiments are designed to examine various factors that influence system performance. Speaker recognition experiments in Chinese and English are carried out on an open database and on a self-recorded dataset. (3) A speaker recognition system based on deep learning is built, using a deep CNN based on ResNet. Specifically, frame-level acoustic features are extracted by a ResCNN network, average pooling then produces a speaker-level representation vector, and the network is trained with a triplet loss function based on cosine similarity. Experiments are designed to explore how different network structures affect system performance, and the system is further improved by adding pre-training with Softmax and cross-entropy loss. Comparative experiments, including cross-language Chinese and English experiments, are carried out on the open database and on the self-recorded dataset, and good recognition results are obtained. In summary, the thesis builds speaker recognition systems based on both the traditional mainstream GMM-UBM model and a deep learning method, designs various experiments to explore the factors affecting system performance, and compares the two architectures both against each other and across their own configurations.
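
As an illustration of item (3), the following is a minimal sketch of average pooling over frame-level features followed by a triplet loss computed on cosine similarity. It is not the thesis's implementation: the function names, the toy two-dimensional vectors, and the margin value of 0.1 are all assumptions made for the example, and a real system would pool the outputs of the trained ResCNN rather than hand-written vectors.

```python
import math


def average_pool(frames):
    """Average frame-level feature vectors into one utterance-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet loss on cosine similarity.

    The loss is zero once the anchor is more similar to the positive
    (same speaker) than to the negative (different speaker) by at least
    `margin`. The margin value here is an assumed placeholder.
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, sim_neg - sim_pos + margin)


# Toy frame-level features (hypothetical): two utterances from one
# speaker and one utterance from a different speaker.
anchor = average_pool([[1.0, 0.0], [0.8, 0.2]])
positive = average_pool([[0.9, 0.1], [1.0, 0.1]])
negative = average_pool([[0.0, 1.0], [0.2, 0.9]])

# Well-separated triplet: the constraint is already satisfied.
easy_loss = triplet_loss(anchor, positive, negative)
# Swapping positive and negative violates the constraint.
hard_loss = triplet_loss(anchor, negative, positive)
print(easy_loss, hard_loss)
```

In this toy case the first triplet incurs no loss because the same-speaker pair is already much closer than the different-speaker pair, while the swapped triplet is penalized. During training, only triplets with a positive loss contribute a gradient.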
Keywords/Search Tags:Speaker Recognition, MFCC, GMM-UBM, ResCNN, Triplet Loss