Speaker Recognition Algorithm Based On Residual Network

Posted on: 2021-03-16  Degree: Master  Type: Thesis
Country: China  Candidate: C Sun  Full Text: PDF
GTID: 2428330602989011  Subject: Computational Mathematics
Abstract/Summary:
With the rapid development of speech recognition technology, speaker recognition has become an important part of the field. Speaker recognition is widely used in commerce, security, finance, criminal investigation and personalized applications, and it occupies an increasingly important position in biometric identification. The residual-network-based speaker recognition algorithm takes the spectrogram of a speaker's utterance as input and uses a residual network to extract feature descriptors from it. The aggregation layer of the network then aggregates these descriptors into a feature matrix. Finally, a fully-connected layer generates discriminative speaker embeddings that are used to determine the speaker's identity. However, when the model is trained on imbalanced and noisy datasets, its recognition performance degrades seriously. To improve the performance of the speaker recognition model and to obtain good results on short-duration utterances, this thesis makes three improvements:

(1) One problem encountered in practical application scenarios is that the performance of a speaker recognition model degrades in noisy environments. With the GhostVLAD algorithm, the residual network model can extract high-quality discriminative speaker embeddings. Building on previous research, we combine the MultiReader technique with the GhostVLAD-based residual network model to filter utterance samples at multiple levels, in both the data and the feature aspects. The resulting algorithm effectively improves the performance of the speaker recognition model on imbalanced, noisy datasets.

(2) An ordinary speaker recognition model has difficulty extracting effective speaker embeddings from the short-duration utterances found in the wild. On the basis of the first improved algorithm, we rebuild the backbone network following the structure of UtterIDNet. We use more skip connections to retain speaker information, so that the speaker embeddings are aggregated more effectively.

(3) The MultiReader technique handles imbalanced datasets effectively, but because the dataset weights are set manually, there is still a risk of falling into a local optimum. We introduce a Bayesian optimization algorithm that uses validation accuracy as the benchmark to search over the dataset weights, and it approximates the globally optimal solution of the model within a few iterations.
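As an illustration of the GhostVLAD aggregation step mentioned in the abstract, the following is a minimal pure-Python sketch of GhostVLAD-style pooling as it is commonly described: frame-level descriptors are softly assigned to real and "ghost" clusters, residuals to each cluster center are accumulated, and the ghost clusters are discarded before normalization, so noisy frames that attach to them contribute little. This is not the thesis's implementation. The function name `ghostvlad_pool` and all parameter names are hypothetical, and in a real network the centers and assignment weights would be learned rather than passed in.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def ghostvlad_pool(descriptors, centers, assign_w, assign_b, num_ghost):
    """Aggregate N descriptors of size D into one vector of size K * D.

    descriptors: N lists of length D (frame-level features)
    centers:     K + G lists of length D (real clusters first, ghosts last)
    assign_w:    D lists of length K + G (assumed learned assignment weights)
    assign_b:    list of length K + G (assumed learned assignment biases)
    num_ghost:   G, the number of ghost clusters to discard
    """
    num_clusters = len(centers)
    dim = len(centers[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]

    for x in descriptors:
        # Soft-assign this descriptor to every cluster, ghosts included.
        logits = [
            sum(x[d] * assign_w[d][k] for d in range(dim)) + assign_b[k]
            for k in range(num_clusters)
        ]
        weights = softmax(logits)
        # Accumulate the weighted residual to each cluster center.
        for k in range(num_clusters):
            for d in range(dim):
                vlad[k][d] += weights[k] * (x[d] - centers[k][d])

    # Drop the ghost clusters, then normalize per cluster and globally.
    real = vlad[: num_clusters - num_ghost]
    flat = [v for row in real for v in l2_normalize(row)]
    return l2_normalize(flat)
```

Calling `ghostvlad_pool` with, say, 3 real clusters, 1 ghost cluster and 4-dimensional descriptors returns a unit-norm vector of length 12, which a fully-connected layer could then map to a speaker embedding.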
Keywords/Search Tags:speaker recognition, residual network, bayesian optimization