
Research and Implementation of Speaker Recognition Based on Deep Learning

Posted on: 2020-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: N Yang
Full Text: PDF
GTID: 2428330575453111
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of deep learning, artificial intelligence is increasingly finding everyday, civilian applications. Sound, one of the means by which humans perceive the world, has been studied extensively in this era of intelligent technology. In recent years, the popularity of smart mobile devices has led to the collection of ever more voice data, creating demand for extracting value from it. Even with big data available, speaker recognition still largely relies on traditional statistical methods, which have limitations: for example, better results require more precise feature extraction from complex data. A new and more effective approach is therefore urgently needed. Deep learning is naturally suited to large amounts of data and is already mature in computer vision and natural language processing. This thesis therefore implements deep-learning-based speaker recognition algorithms that identify a speaker's identity, age, and gender. The main contributions are as follows:

1) A closed-set, text-independent speaker identification algorithm based on speech spectrograms. Because the number of speakers to be recognized is fixed, the task is cast as a classification problem: spectrograms serve as input features, and a Convolutional Neural Network (CNN) is trained as a multi-class classifier that identifies the speaker. Compared with traditional methods based on Mel-frequency cepstral coefficients (MFCC) and the Gaussian Mixture Model-Universal Background Model (GMM-UBM), the proposed algorithm achieves higher recognition accuracy and lower computational latency on large public datasets.

2) An open-set, text-independent speaker identification algorithm based on identity coding. The differences between open-set and closed-set speaker identification are studied. To handle the fact that the number of speakers in the open-set setting is not fixed, the method builds on the spectrogram-based closed-set algorithm: a well-trained multi-class neural network is reused as a feature extractor whose outputs distinguish speakers for identity recognition. Compared with the traditional method, it is more stable and more accurate when only a few enrollment utterances are available per person.

3) To meet the needs of speaker age and gender recognition, the same combination of image-like (time-frequency) features and neural networks is used. The candidate features are the spectrogram, Log-Mel energies, MFCC, the Constant-Q Transform (CQT), and Harmonic/Percussive Source Separation (HPSS), and a Recurrent Neural Network (RNN) is added to the model. Based on comparative experiments on a non-public dataset, and taking the time complexity of each algorithm into account, Log-Mel energies were selected as the input feature because they performed better, and an HTTP service was built to provide age and gender recognition. This feature has been embedded in the Kings of Glory intelligent robot products sold by Tencent.
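Contribution 1) feeds spectrograms to a CNN classifier. As an illustration of that input feature only, here is a minimal, dependency-free sketch of a log-magnitude spectrogram (short-time Fourier transform). The abstract does not give the thesis's parameters, so the frame length, hop size, Hann window, and dB scaling below are assumptions, and the naive DFT is chosen for readability rather than speed. The CNN itself is not shown.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT in dB.

    Returns one list per frame, each with frame_len // 2 + 1 frequency bins.
    frame_len and hop are illustrative defaults, not values from the thesis.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame to reduce spectral leakage.
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of bin k; O(N^2) per frame, acceptable for a sketch.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(20 * math.log10(abs(acc) + 1e-10))
        frames.append(bins)
    return frames


if __name__ == "__main__":
    sr = 8000  # assumed sample rate for the demo
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(2048)]
    spec = spectrogram(tone)
    print(len(spec), "frames x", len(spec[0]), "bins")
```

The resulting time-by-frequency array is what would be treated as a single-channel image by a CNN; in practice an FFT routine would replace the naive DFT.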
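Contribution 2) reuses the trained closed-set network as a feature extractor so that new speakers can be handled without retraining. The abstract does not detail its "identity coding", so the sketch below shows one common way such a scheme can work, as an assumption: each enrolled speaker's identity code is the normalized mean of their embedding vectors (for example, activations from a late layer of the CNN), a test utterance is matched by cosine similarity, and a hypothetical `threshold` rejects unknown speakers. The function names and the threshold value are illustrative, not taken from the thesis.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def enroll(embeddings_by_speaker):
    """Average each speaker's normalized enrollment embeddings into one identity code."""
    codes = {}
    for speaker, embs in embeddings_by_speaker.items():
        normed = [l2_normalize(e) for e in embs]
        dim = len(normed[0])
        mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
        codes[speaker] = l2_normalize(mean)
    return codes


def identify(test_embedding, codes, threshold=0.7):
    """Return (speaker, score) for the best match, or (None, score) if it is below threshold."""
    best_speaker, best_score = None, -1.0
    for speaker, code in codes.items():
        score = cosine(test_embedding, code)
        if score > best_score:
            best_speaker, best_score = speaker, score
    if best_score < threshold:
        return None, best_score  # treated as an unknown (unenrolled) speaker
    return best_speaker, best_score


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real network outputs.
    codes = enroll({"alice": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
                    "bob": [[0.0, 1.0, 0.1]]})
    print(identify([0.95, 0.05, 0.05], codes))
    print(identify([0.0, 0.0, 1.0], codes))
```

Because enrollment only stores an averaged vector per speaker, adding a new speaker requires no retraining, which is the property that makes this style of approach suitable for the open-set case.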
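Contribution 3) selects Log-Mel energies as the input feature for age and gender recognition. The following minimal sketch shows how such features are commonly computed from one power-spectrum frame: triangular filters spaced evenly on the mel scale are applied to the FFT bins, and the log of each band's energy is taken. The HTK-style mel formula, 40 bands, `n_fft = 512`, and 16 kHz sample rate are assumptions for illustration, since the abstract does not give the thesis's settings. The RNN model is not shown.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale (assumed; other mel formulas exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters over the n_fft // 2 + 1 one-sided FFT bins."""
    f_max = f_max if f_max is not None else sample_rate / 2
    n_bins = n_fft // 2 + 1
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 edge points evenly spaced in mel, converted back to Hz.
    hz_points = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    filters = []
    for m in range(1, n_mels + 1):
        left, centre, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def log_mel_energies(power_frame, filters, eps=1e-10):
    """Natural log of each mel band's energy for one power-spectrum frame."""
    return [math.log(sum(w * p for w, p in zip(row, power_frame)) + eps)
            for row in filters]


if __name__ == "__main__":
    fb = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
    power = [0.0] * 257
    power[32] = 1.0  # all energy in the 1000 Hz bin (32 * 16000 / 512)
    feats = log_mel_energies(power, fb)
    print(len(feats), "log-mel features")
```

Applying `log_mel_energies` to every frame of a power spectrogram yields the time-by-band matrix that a recurrent or convolutional model would consume. Compared with the full spectrogram, the mel bands compress the frequency axis, which reduces model input size and computation.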
Keywords/Search Tags: Artificial Intelligence, Deep Learning, Speaker Recognition, Neural Network, Harmonic/Percussive Source Separation