
Research and Implementation of Multi-Speaker Recognition Technology Based on Deep Learning

Posted on: 2022-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: X T Si
Full Text: PDF
GTID: 2518306575964319
Subject: IC Engineering
Abstract/Summary:
With the rapid development of artificial intelligence, service robots play an increasingly important role in people's work and daily life. Speaker recognition is a key component of human-machine communication. However, a single-speaker recognition model cannot cope with the complex multi-person dialogue scenarios that arise in practical applications. To make interaction with service robots more intelligent, this thesis applies deep learning algorithms to investigate and remedy the deficiencies of such systems, implements a speech-based multi-speaker recognition system, and verifies its effectiveness and feasibility in a real environment.

First, common algorithms use the Mel-frequency cepstral coefficient (MFCC) as the speaker identity feature in the feature-extraction stage, which discards some high-frequency information and leads to a low recognition rate and poor robustness. This thesis proposes a three-dimensional speaker feature based on the Mel-frequency enhanced spectrum (MFEC). The feature omits the discrete cosine transform (DCT) step of the MFCC pipeline and fuses in gammatone frequency cepstral coefficients (GFCC) to obtain a more robust speaker representation. Experimental results show that the proposed 3D speaker feature reduces the system's recognition error rate.

Second, this thesis takes the convolutional neural network (CNN) as the research object for the speaker acoustic model. As the number of network layers grows, and with fixed convolution parameter settings, the model tends to ignore information learned by the shallow layers, and training time becomes excessive. This thesis therefore proposes an improved 3D convolutional neural network combined with a long short-term memory (LSTM) model. Taking the 3D-structured speaker features as input, the improved acoustic model extracts deep speaker feature information and strengthens learning of the context of the speaker's speech. Experimental results show that the 3DCNN-LSTM effectively reduces the system's recognition error rate and is more robust across different speech durations.

Finally, the multi-speaker recognition system is integrated on an intelligent wheelchair: the configured environment is ported to the wheelchair's Jetson Nano development board for experimental testing. The test results show that the 3D speaker features and the 3DCNN-LSTM acoustic model proposed in this thesis are effective on the intelligent wheelchair.
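As a concrete illustration of the feature-extraction idea in the abstract, the sketch below computes MFEC-style features in plain NumPy: log Mel filterbank energies, i.e. the MFCC pipeline with the final DCT step omitted, so the per-band spectral detail is preserved. All parameter values (sample rate, frame length, hop, filter count) are illustrative assumptions, not the thesis's actual settings, and the GFCC fusion step is not shown.

```python
import numpy as np

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    # Triangular Mel-scale filterbank (assumed parameters)
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):            # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):            # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfec(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """MFEC sketch: log Mel filterbank energies per frame.
    Identical to MFCC extraction except the final DCT is omitted."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    feats = np.empty((n_frames, n_filters))
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
        feats[t] = np.log(fb @ power + 1e-10)  # log energies, no DCT
    return feats

# 1 s of noise at 16 kHz -> a (frames x Mel bands) feature map
sig = np.random.randn(16000)
print(mfec(sig).shape)  # (98, 40)
```

With 25 ms frames and a 10 ms hop at 16 kHz, one second of audio yields 98 frames of 40 log-energies; dropping the DCT keeps the features in the Mel-spectral domain rather than the cepstral domain.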
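The "3D data structure" fed to the 3DCNN-LSTM acoustic model can be pictured as several utterances' feature maps stacked along a depth axis. The sketch below shows one plausible way to build such a tensor from per-utterance (frames x Mel bands) maps; the dimensions and the zero-padding/truncation policy are illustrative assumptions, not the thesis's exact construction.

```python
import numpy as np

def stack_3d_feature(utterance_feats, n_utts=20, n_frames=80, n_bands=40):
    """Stack per-utterance 2D feature maps (frames x Mel bands) into a
    single 3D tensor (utterances x frames x bands) for a 3D-CNN input.
    Short utterances are zero-padded; long ones are truncated."""
    vol = np.zeros((n_utts, n_frames, n_bands), dtype=np.float32)
    for i, f in enumerate(utterance_feats[:n_utts]):
        t = min(f.shape[0], n_frames)
        vol[i, :t] = f[:t, :n_bands]
    return vol

# 20 utterances of (98 frames x 40 bands) -> one (20 x 80 x 40) volume
utts = [np.random.randn(98, 40) for _ in range(20)]
print(stack_3d_feature(utts).shape)  # (20, 80, 40)
```

A 3D convolution over this volume can learn jointly across utterances, time, and frequency, after which the resulting deep features can be unrolled along the time axis and passed to an LSTM to model the context of the speaker's speech.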
Keywords/Search Tags: deep learning, multi-speaker recognition, 3D speaker feature, 3DCNN-LSTM