
Research and Implementation of Multi-Speaker Recognition Technology Based on Deep Learning

Posted on: 2022-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: X T Si
Full Text: PDF
GTID: 2518306575964319
Subject: IC Engineering
Abstract/Summary:
With the rapid development of artificial intelligence, service robots play an increasingly important role in people's work and daily life. Speaker recognition is a key component of human-machine communication. However, a single-speaker recognition model cannot cope with the complex multi-person dialogue scenarios that arise in practical applications. To make interaction with service robots more intelligent, this thesis applies deep learning algorithms to investigate and remedy the deficiencies of such systems, implements a speech-based multi-speaker recognition system, and verifies its effectiveness and feasibility in a real environment.

First, common algorithms use the Mel-frequency cepstral coefficient (MFCC) as the speaker identity feature in the feature-extraction stage, which discards some high-frequency information and leads to a low recognition rate and poor robustness. This thesis proposes a three-dimensional speaker feature based on the Mel-frequency enhanced spectrum (MFEC). The feature omits the discrete cosine transform (DCT) step of the MFCC pipeline and fuses in gammatone frequency cepstral coefficients (GFCC) to obtain a more robust speaker representation. Experimental results show that the proposed 3D speaker feature reduces the system's recognition error rate.

Second, this thesis takes the convolutional neural network (CNN) as the research object for the speaker acoustic model. As the number of network layers grows, and with fixed convolution parameter settings, the model tends to ignore information learned by the shallow layers, and training time becomes excessive. This thesis therefore proposes an improved 3D convolutional neural network combined with a long short-term memory (LSTM) model. Taking the 3D-structured speaker features as input, the improved acoustic model extracts deep speaker feature information and strengthens learning of the context of the speaker's speech. Experimental results show that the 3DCNN-LSTM effectively reduces the system's recognition error rate and is more robust across different speech durations.

Finally, the multi-speaker recognition system is integrated on an intelligent wheelchair: the configured environment is ported to the wheelchair's Jetson Nano development board for experimental testing. The test results show that the 3D speaker features and the 3DCNN-LSTM acoustic model proposed in this thesis are effective on the intelligent wheelchair.
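As a concrete illustration of the feature-extraction idea in the abstract, the sketch below computes MFEC-style features in plain NumPy: log Mel filterbank energies, i.e. the MFCC pipeline with the final DCT step omitted, so the per-band spectral detail is preserved. All parameter values (sample rate, frame length, hop, filter count) are illustrative assumptions, not the thesis's actual settings, and the GFCC fusion step is not shown.

```python
import numpy as np

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    # Triangular Mel-scale filterbank (assumed parameters)
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):            # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):            # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfec(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """MFEC sketch: log Mel filterbank energies per frame.
    Identical to MFCC extraction except the final DCT is omitted."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    feats = np.empty((n_frames, n_filters))
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
        feats[t] = np.log(fb @ power + 1e-10)  # log energies, no DCT
    return feats

# 1 s of noise at 16 kHz -> a (frames x Mel bands) feature map
sig = np.random.randn(16000)
print(mfec(sig).shape)  # (98, 40)
```

With 25 ms frames and a 10 ms hop at 16 kHz, one second of audio yields 98 frames of 40 log-energies; dropping the DCT keeps the features in the Mel-spectral domain rather than the cepstral domain.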
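The "3D data structure" fed to the 3DCNN-LSTM acoustic model can be pictured as several utterances' feature maps stacked along a depth axis. The sketch below shows one plausible way to build such a tensor from per-utterance (frames x Mel bands) maps; the dimensions and the zero-padding/truncation policy are illustrative assumptions, not the thesis's exact construction.

```python
import numpy as np

def stack_3d_feature(utterance_feats, n_utts=20, n_frames=80, n_bands=40):
    """Stack per-utterance 2D feature maps (frames x Mel bands) into a
    single 3D tensor (utterances x frames x bands) for a 3D-CNN input.
    Short utterances are zero-padded; long ones are truncated."""
    vol = np.zeros((n_utts, n_frames, n_bands), dtype=np.float32)
    for i, f in enumerate(utterance_feats[:n_utts]):
        t = min(f.shape[0], n_frames)
        vol[i, :t] = f[:t, :n_bands]
    return vol

# 20 utterances of (98 frames x 40 bands) -> one (20 x 80 x 40) volume
utts = [np.random.randn(98, 40) for _ in range(20)]
print(stack_3d_feature(utts).shape)  # (20, 80, 40)
```

A 3D convolution over this volume can learn jointly across utterances, time, and frequency, after which the resulting deep features can be unrolled along the time axis and passed to an LSTM to model the context of the speaker's speech.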
Keywords/Search Tags: deep learning, multi-speaker recognition, 3D speaker feature, 3DCNN-LSTM