Speaker Recognition Algorithm Based On Residual Network

Posted on:2021-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:C Sun

GTID:2428330602989011

Subject:Computational Mathematics

Abstract/Summary:

With the rapid development of speech recognition technology, speaker recognition has become an important part of speech recognition. Speaker recognition technology is not only widely used in fields such as commerce, security, finance, criminal investigation, and personalized applications, but also occupies an increasingly important position in biometric identification technology. The speaker recognition algorithm based on a residual network takes the spectrogram of the speaker's utterance as input and uses the residual network to extract feature descriptors from the spectrogram. The aggregation layer of the network then aggregates these feature descriptors into a feature matrix. Finally, a fully connected layer generates discriminative speaker embeddings that are used to determine the speaker's identity. However, when the speaker recognition model is trained on imbalanced and noisy datasets, its performance degrades seriously. To improve the performance of the speaker recognition model and to obtain good results on short-duration utterances, we make the following three improvements:

(1) One problem encountered in practical application scenarios is that the performance of a speaker recognition model degrades in noisy environments. A residual network model based on the GhostVLAD algorithm can extract high-quality discriminative speaker embeddings. Building on previous research, we combine MultiReader technology with the GhostVLAD-based residual network model to filter utterance samples at multiple levels, in both the data and the features. This algorithm effectively improves the performance of the speaker recognition model on imbalanced, noisy datasets.

(2) An ordinary speaker recognition model has difficulty extracting effective speaker embeddings from short-duration utterances in the wild. Building on the first improved algorithm, we restructure the backbone network to follow UtterIDNet's structure. We use more skip connections to retain speaker information, so that the speaker embeddings are aggregated more effectively.

(3) MultiReader technology handles imbalanced datasets effectively, but because the weights are set manually, there is still a risk of getting stuck in a local optimum. By introducing a Bayesian optimization algorithm, with validation accuracy as the benchmark, we search the dataset weights and approximate the global optimum of the model within a few iterations.
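The aggregation step described in the abstract can be illustrated with a minimal sketch of GhostVLAD-style pooling. This is not the thesis's implementation: the function name `ghostvlad`, its parameters, and the toy values are assumptions chosen for illustration, and plain Python lists stand in for the feature descriptors that a residual network would produce and for parameters that would normally be learned. The idea shown is that every descriptor is softly assigned to all clusters, including the "ghost" clusters, but only the residuals of the real clusters are kept, so descriptors that land mostly in ghost clusters (for example, noisy frames) contribute little to the final embedding.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ghostvlad(descriptors, centers, assign_w, assign_b, num_ghost):
    """Aggregate T descriptors of dimension D into one K*D vector.

    centers holds K real cluster centres followed by num_ghost ghost
    centres. assign_w and assign_b define a linear layer that scores
    each descriptor against all K + num_ghost clusters.
    (All names here are hypothetical, for illustration only.)
    """
    num_real = len(centers) - num_ghost
    dim = len(centers[0])
    agg = [[0.0] * dim for _ in range(num_real)]
    for x in descriptors:
        scores = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(assign_w, assign_b)
        ]
        # Soft assignment over real AND ghost clusters.
        a = softmax(scores)
        # Accumulate weighted residuals for real clusters only;
        # the mass assigned to ghost clusters is discarded.
        for k in range(num_real):
            for d in range(dim):
                agg[k][d] += a[k] * (x[d] - centers[k][d])
    # Intra-normalize each cluster row, flatten, then normalize again.
    flat = [v for row in agg for v in l2_normalize(row)]
    return l2_normalize(flat)


# Toy example (assumed values): 2 real clusters, 1 ghost cluster, D = 2.
centers = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
assign_w = [[4.0, 0.0], [0.0, 4.0], [3.0, 3.0]]
assign_b = [0.0, 0.0, -4.0]
descriptors = [[1.0, 0.2], [0.1, 1.0], [5.0, 5.0]]  # last one mimics noise

embedding = ghostvlad(descriptors, centers, assign_w, assign_b, num_ghost=1)
print(len(embedding))
```

In a full model, a fully connected layer would map this aggregated vector to the final speaker embedding, as the abstract describes; that layer is omitted here to keep the sketch short.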

Keywords/Search Tags:

speaker recognition, residual network, Bayesian optimization

Related items

1	Speaker Recognition Technology Research
2	Research On Text-independent Speaker Recognition Based On Attention Mechanism
3	Research Of Speaker Identification Based On Linear Prediction Residual
4	Studies On Speaker Recognition Based On SVM And GMM
5	End-to-End Speaker Embedding For Speaker Recognition In The Wild
6	Research On Speaker Recognition Based On MFCC And PSO-BP Neural Network
7	Research On The Discrimination Issue In Speaker Recognition
8	Research On Text-Independent Speaker Verification System
9	Probabilistic Modeling Of Emotion Reconstruction For Speaker Recognition
10	Text-independent Speaker Recognition Research Based On Local Acoustic Features