Speaker Recognition Method Based On Deep Learning

Posted on: 2022-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Ma
GTID: 2518306752465554
Subject: Automation Technology
Abstract/Summary:
As society becomes increasingly digitized, digital speech plays an increasingly prominent role in public security. Processing massive volumes of audio by traditional manual methods no longer meets the requirements of Smart Policing, and processing speech information with speaker recognition has become a trend. The key speech samples collected in practical police work often contain background noise and vary in duration, and the performance of traditional methods is insufficient on short-duration speech recognition tasks. The speaker recognition methods proposed in this thesis not only improve recognition precision but also provide clues for investigation. The thesis approaches the problem from two directions, feature parameters and the deep learning model: it proposes a speaker feature based on contribution-weighted MFCC and feature fusion, and the SE-Res2Net speaker recognition model, both of which achieve good results. The contributions of this thesis are as follows.

First, a speaker feature based on the fusion of contribution-weighted MFCC and the deep bottleneck feature (DBNF) is proposed to address the limitations of MFCC. In the feature weighting stage, the contribution of each feature to recognition is evaluated by the increment-subtraction component method, and the results are then fitted with a Fourier series to obtain a weight for each MFCC dimension. The weighted MFCC and DBNF features are then combined to highlight the key information in the speech signal and make the features more robust.

Second, because traditional speaker recognition algorithms have difficulty extracting voiceprint features in text-independent and short-duration speech scenarios, a speaker recognition model based on SE-Res2Net is proposed. The main body of the model consists of a fully connected Res2Net module and an SE (Squeeze-and-Excitation) module. A hierarchical connection structure within the model integrates information across channels and enlarges the model's receptive field. The model also incorporates self-attention pooling to further emphasize the key feature information in speech, which addresses the lack of information in short-duration speech recognition tasks. Experimental results show that the SE-Res2Net model outperforms existing methods in text-independent speaker recognition tasks and also performs well in short-duration speech recognition scenarios.

Finally, a speaker recognition system based on SE-Res2Net is implemented. Its functions are designed around practical police applications, which gives the system practical application value.
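The feature weighting and fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the contribution scores are assumed to be already computed (the increment-subtraction evaluation itself is not reproduced), and the Fourier order, the choice of period, the clipping and normalization of the weights, and fusion by frame-wise concatenation are all assumptions made for illustration. The function names are hypothetical.

```python
import numpy as np


def fourier_design(n_dims, order):
    """Design matrix of a truncated Fourier series over dimension indices.

    Assumption: one period spans all n_dims dimensions.
    """
    x = 2.0 * np.pi * np.arange(n_dims, dtype=float) / n_dims
    cols = [np.ones_like(x)]
    for k in range(1, order + 1):
        cols.append(np.cos(k * x))
        cols.append(np.sin(k * x))
    return np.stack(cols, axis=1)  # shape (n_dims, 2 * order + 1)


def fit_fourier_weights(contrib, order=2):
    """Least-squares Fourier fit of per-dimension contribution scores.

    Returns smoothed weights scaled so that the largest weight is 1.
    """
    contrib = np.asarray(contrib, dtype=float)
    design = fourier_design(contrib.shape[0], order)
    coef, *_ = np.linalg.lstsq(design, contrib, rcond=None)
    fitted = design @ coef
    # Assumption: weights must stay positive, so clip before normalizing.
    fitted = np.clip(fitted, 1e-6, None)
    return fitted / fitted.max()


def weight_and_fuse(mfcc, weights, dbnf):
    """Scale each MFCC dimension by its weight, then append DBNF per frame.

    mfcc: (frames, n_mfcc), weights: (n_mfcc,), dbnf: (frames, n_dbnf).
    Assumption: fusion is plain concatenation along the feature axis.
    """
    weighted = mfcc * weights[np.newaxis, :]
    return np.concatenate([weighted, dbnf], axis=1)
```

Given a 13-dimensional MFCC matrix and a hypothetical 40-dimensional DBNF matrix with the same number of frames, `weight_and_fuse` returns one 53-dimensional vector per frame.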
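The hierarchical connection structure mentioned in the abstract can be sketched as a Res2Net-style channel split, where each channel group after the first is filtered and each later group also receives the previous group's output. The sketch below is an assumption-laden illustration and not the thesis model: it uses a fixed depthwise 3-tap filter along time in place of learned convolutions, and the names `conv3_time` and `res2net_split` are hypothetical.

```python
import numpy as np


def conv3_time(x, kernel):
    """Depthwise 3-tap filter along time with zero padding.

    x: (channels, frames), kernel: (3,).
    out[:, t] = k[0] * x[:, t-1] + k[1] * x[:, t] + k[2] * x[:, t+1]
    """
    padded = np.pad(x, ((0, 0), (1, 1)))
    return (kernel[0] * padded[:, :-2]
            + kernel[1] * padded[:, 1:-1]
            + kernel[2] * padded[:, 2:])


def res2net_split(x, kernels):
    """Hierarchical residual-like connections across channel groups.

    x: (channels, frames); channels must be divisible by len(kernels) + 1.
    Group 0 passes through unchanged. Group 1 is filtered. Each later
    group is added to the previous group's output before filtering.
    """
    scale = len(kernels) + 1
    groups = np.split(x, scale, axis=0)
    outputs = [groups[0]]
    prev = None
    for group, kernel in zip(groups[1:], kernels):
        inp = group if prev is None else group + prev
        prev = conv3_time(inp, kernel)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)
```

Feeding a single impulse through four one-channel groups with all-ones kernels spreads it over 1, 3, 5, and 7 frames respectively, which shows how later groups see a wider temporal context than earlier ones.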
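The Squeeze-and-Excitation module and the self-attention pooling named in the abstract can likewise be sketched in NumPy. The abstract does not give their exact formulation, so the sketch assumes the common forms: SE as global average pooling over time followed by two linear layers (ReLU, then sigmoid) that gate each channel, and pooling as a softmax over per-frame scores `v · tanh(W h_t)`. All weights are plain arrays supplied by the caller, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(feats, w1, w2):
    """Squeeze-and-Excitation over a (channels, frames) feature map.

    w1: (channels // r, channels), w2: (channels, channels // r),
    where r is the reduction ratio (an assumption of this sketch).
    Returns the rescaled features and the per-channel gates.
    """
    squeezed = feats.mean(axis=1)             # squeeze: average over time
    hidden = np.maximum(w1 @ squeezed, 0.0)   # excitation: ReLU bottleneck
    gates = sigmoid(w2 @ hidden)              # per-channel gate in (0, 1)
    return feats * gates[:, np.newaxis], gates


def self_attentive_pooling(feats, w, v):
    """Attention-weighted mean over frames.

    feats: (channels, frames), w: (attn_dim, channels), v: (attn_dim,).
    Returns the pooled (channels,) vector and the frame weights.
    """
    scores = v @ np.tanh(w @ feats)           # one score per frame
    scores = scores - scores.max()            # stabilize the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return feats @ alpha, alpha
```

Because the frame weights sum to one, the pooled vector is a weighted average of the frames, so an utterance of any length maps to a fixed-size embedding in which higher-scoring frames count more.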
Keywords/Search Tags: Speaker recognition, MFCC, Feature fusion, Attention mechanism