Speaker Recognition Based On UBM And Deep Learning

Posted on:2020-04-23

Degree:Master

Type:Thesis

Country:China

Candidate:C T Chi

GTID:2428330596482931

Subject:Electronic and communication engineering

Abstract/Summary:

Speaker recognition (also known as voiceprint recognition) is a technology that identifies a speaker by extracting identity-bearing features from the speaker's voice signal. Traditional algorithms perform well when the amount of data is small, but deep learning methods can take full advantage of the large datasets now available and offer a path to further breakthroughs in speaker recognition. This thesis studies the implementation of speaker recognition algorithms, and the main work is as follows: (1) The preprocessing module for speaker recognition, voice activity detection, speech feature extraction methods and traditional classification models are systematically introduced. (2) The recognition system based on the Gaussian mixture model-universal background model (GMM-UBM) is introduced in detail, and the processes of parameter estimation, model training and score matching are derived. Experiments are designed to examine various factors that influence system performance. Speaker recognition experiments in Chinese and English are carried out on an open database and on a self-recorded dataset. (3) A speaker recognition system based on deep learning is built, using a deep CNN based on ResNet. Specifically, acoustic features are extracted through a ResCNN network, average pooling is then applied to generate a speaker-level representation vector, and the network is trained with a triplet loss function based on cosine similarity. Experiments are designed to explore how different network structures affect system performance, and performance is improved by adding pre-training with Softmax and cross-entropy loss. Comparative cross-lingual experiments in Chinese and English are carried out on the open database and on the self-recorded dataset, and good recognition results are obtained. In this thesis, speaker recognition systems are built on both the traditional mainstream GMM-UBM model and a deep learning method, and various experiments are designed to explore the factors affecting system performance. The two architectures are also compared with each other, and different configurations within each architecture are compared as well.
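The pooling and loss steps described in item (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy two-dimensional vectors, and the margin value of 0.1 are all assumptions chosen for illustration, and the ResCNN that would produce the frame-level features is omitted.

```python
import math


def average_pool(frames):
    """Average frame-level feature vectors into one speaker-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet loss on cosine similarities.

    The loss is zero once the anchor is more similar to the positive
    (same speaker) than to the negative (different speaker) by at least
    `margin`. The margin value here is an assumed placeholder.
    """
    s_ap = cosine_similarity(anchor, positive)
    s_an = cosine_similarity(anchor, negative)
    return max(0.0, s_an - s_ap + margin)


# Toy example: hypothetical frame-level features for three utterances.
anchor = average_pool([[1.0, 0.0], [1.0, 0.0]])
same_speaker = average_pool([[2.0, 0.0], [4.0, 0.0]])
other_speaker = average_pool([[0.0, 1.0], [0.0, 3.0]])

easy = triplet_loss(anchor, same_speaker, other_speaker)
hard = triplet_loss(anchor, other_speaker, same_speaker)
```

In the toy example, `easy` is zero because the triplet is already well separated, while `hard` (with the positive and negative swapped) is positive, which is the case that would drive a gradient update during training.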

Keywords/Search Tags:

Speaker Recognition, MFCC, GMM-UBM, ResCNN, Triplet Loss

Related items

1	Triplet Loss And Manifold Dimensionality Reduction Based Method For Text-independent Speaker Recognition
2	Speaker Recognition Algorithm Based On Deep Learning
3	The Application Of Speaker Recognition Technology Based On Deep Learning
4	The Study Of Speaker Recognition System Based On MFCC
5	Study Of Speaker Recognition System Based On MFCC And GMM
6	Research On Deep Learning Based Speaker Recognition Algorithm
7	The Research On Cross-lingual Speaker Recognition Based On Language-adversarial Training
8	Research On Speaker Recognition Algorithm Based On Deep Convolutional Neural Network
9	Speaker Recognition Based On MFCC And IMFCC
10	Research On Voiceprint Recognition Model Based On End-to-end Neural Network