Research Of Speaker Recognition Technology Based On Deep Learning

Posted on: 2020-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: W P Guo
GTID: 2428330596478131
Subject: Computer technology

Abstract/Summary:
Voiceprint recognition is a biometric technology. It identifies a speaker from the vocal characteristics that distinguish one speaker from another, so it is also called speaker recognition. Speaker recognition is affected by factors such as environment, channel, and utterance duration. Traditional speaker recognition techniques do not overcome these factors well, which leads to poor recognition performance and reduces the security and efficiency of identity authentication. In recent years, with the development of artificial intelligence, deep learning has become a research hotspot in the field of speaker recognition. This thesis studies two deep learning methods commonly used in speaker recognition. First, it studies the application of convolutional neural networks (CNN) to speaker recognition and examines combinations of the cost functions and activation functions commonly used in CNNs. Second, building on the traditional GMM-UBM model, it proposes a speaker recognition scheme based on a DNN-UBM model. Finally, the deep-learning-based speaker recognition model is applied to a remote identity authentication system on Android smartphones. The main research work of this thesis is as follows.

(1) Because of its complex structure, a convolutional neural network optimizes its parameters slowly during training. To make the speaker network model converge quickly, this thesis uses probability theory to derive the quadratic cost function and the cross-entropy cost function, both commonly used to train CNN-based speaker recognition models, and reports the effect of combining each with different activation functions. It also studies the parameter optimization process for the preferred combination of cost function and activation function. Finally, it presents a combination scheme that optimizes the performance of the speaker model.

(2) The traditional GMM-UBM model ignores the influence of speech content on the speaker's speech signal, which degrades the performance of the speaker recognition system, so a speaker recognition method based on a DNN-UBM model is proposed. The method replaces the unsupervised UBM with a supervised UBM and integrates the information contained in the speech content into the speaker statistics. The posterior probabilities are combined with the standard speaker features to form the speaker features used in the model, which provides enough statistical information for extracting i-vector features. The recognition performance and robustness of the model are studied with both sufficient and insufficient training data, and the recognition accuracy is examined for different numbers of DNN hidden layers. The experimental results show that the DNN-UBM model outperforms the traditional GMM-UBM model on the speaker recognition task, and that the system achieves its best recognition performance when the number of DNN layers reaches six.

(3) Speaker recognition relies on voice-transmitting devices to deliver the biometric information that represents the speaker's identity, and thereby to enable remote identity authentication. With the rapid development of smartphones, the smartphone serves as the medium for voice transmission, and applying speaker recognition on the smartphone effectively solves the problem of identity authentication over a long distance. The speaker recognition model based on a deep neural network is trained with the Kaldi framework, and the framework and principles of integrating it with Android are studied. The trained speaker recognition model is ported to an Android smartphone, and the resulting remote identity authentication system achieves a better user experience and recognition performance.
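Item (2) describes folding posterior probabilities into the speaker statistics used for i-vector extraction. The abstract does not give the formulas, so what follows is only a minimal sketch, assuming the usual i-vector recipe: per-frame class posteriors γ_t(c) weight the feature vectors x_t to give zeroth-order statistics N_c = Σ_t γ_t(c) and first-order statistics F_c = Σ_t γ_t(c)·x_t. The function names, the toy sizes, and the random softmax scores standing in for DNN outputs are all hypothetical. A real system would take the posteriors from a trained DNN and would likely use an array library rather than nested lists.

```python
import math
import random


def softmax(logits):
    """Turn one frame's raw scores into posteriors that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accumulate_stats(posteriors, features):
    """Return (zeroth, first) order statistics for one utterance.

    posteriors[t][c]: posterior of class c at frame t
    features[t][d]:   d-th coefficient of the feature vector at frame t
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("need the same non-zero number of frames in both inputs")
    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]
    for gamma, x in zip(posteriors, features):
        for c, g in enumerate(gamma):
            zeroth[c] += g                # N_c += gamma_t(c)
            for d, value in enumerate(x):
                first[c][d] += g * value  # F_c += gamma_t(c) * x_t
    return zeroth, first


# Toy utterance: 50 frames, 4 classes, 3-dimensional features (assumed sizes).
random.seed(0)
T, C, D = 50, 4, 3
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
# Random scores pushed through softmax stand in for real DNN outputs.
posteriors = [softmax([random.gauss(0, 1) for _ in range(C)]) for _ in range(T)]
zeroth, first = accumulate_stats(posteriors, features)
```

Under this assumption, moving from a GMM-UBM to a DNN changes only where `posteriors` comes from. The accumulation step itself stays the same.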
Keywords/Search Tags: Speaker recognition, Deep neural network, GMM-UBM, Kaldi framework, Android technology