
Experimental Analysis And Improvement Of Speaker Recognition System

Posted on: 2021-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2518306548481694
Subject: Electronics and Communications Engineering
Abstract/Summary:
This thesis aims to improve the performance of speaker recognition systems by implementing and testing a variety of system designs. The feasibility of improving recognition performance is examined from three aspects: input features, network architecture, and model extension.

(1) Validity of input features. Speaker recognition systems usually take Mel-frequency cepstral coefficients (MFCC) as the input feature, but MFCC over-compresses the voice information. This thesis replaces MFCC with the spectrogram, the most primitive acoustic representation, which retains more information about the speech signal itself. The experimental results show that an x-vector system built on spectrogram features achieves better recognition results than one built on MFCC.

(2) Discriminability of speaker embeddings. SE-ResNet and RACNN are constructed by optimizing the ResNet architecture to obtain more discriminative speaker embeddings. SE-ResNet, which comes from the field of image recognition, better learns the dependencies between the channels of convolutional features; applied to speaker recognition in this thesis, it achieves better recognition than ResNet. RACNN, an audio model that takes the original waveform as its input feature, is improved as follows: residual blocks replace the convolution layers, and LReLU replaces ReLU. Two more discriminative loss functions are added on top of the softmax loss, and the delay neural network layer in the original model is removed, which achieves better recognition performance with fewer parameters.

(3) Efficiency of deep neural networks. If only the depth, the width, or the input speech length of a model is expanded, recognition accuracy quickly saturates. This thesis applies EfficientNet, from the field of image recognition, to speaker recognition. A baseline model is designed first, and system performance is then improved by compound extension of the model while keeping the baseline model parameters unchanged, so that recognition efficiency is preserved as well. The experimental results show that the compound-extended EfficientNet achieves better recognition performance than models extended along a single dimension.
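The compound extension described in point (3) can be illustrated with a minimal sketch. It assumes the scaling rule from the original EfficientNet work, where depth, width, and input resolution grow as alpha^phi, beta^phi, and gamma^phi with alpha = 1.2, beta = 1.1, gamma = 1.15. Mapping "resolution" to the number of input speech frames, the function names, and the baseline sizes are all assumptions made for illustration; the abstract does not state the coefficients or baseline actually used in the thesis.

```python
import math


def compound_scale(base_depth, base_width, base_frames, phi,
                   alpha=1.2, beta=1.1, gamma=1.15):
    """Scale depth, width, and input length together with one coefficient phi.

    alpha, beta, gamma are the EfficientNet image-model defaults, used here
    only as placeholders; the thesis's own values are not given in the abstract.
    """
    depth = math.ceil(base_depth * alpha ** phi)        # number of layers/blocks
    width = int(round(base_width * beta ** phi))        # channels per layer
    frames = int(round(base_frames * gamma ** phi))     # input speech frames
    return depth, width, frames


def cost_multiplier(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Approximate growth in compute: cost scales with depth * width^2 * length^2."""
    return (alpha * beta ** 2 * gamma ** 2) ** phi


if __name__ == "__main__":
    # Hypothetical baseline: 4 blocks, 32 channels, 200 input frames.
    for phi in range(4):
        print(phi, compound_scale(4, 32, 200, phi), round(cost_multiplier(phi), 2))
```

Because alpha * beta^2 * gamma^2 is close to 2 for these coefficients, each increment of phi roughly doubles the compute budget while spreading the growth across all three dimensions, rather than exhausting one dimension alone.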
Keywords/Search Tags:Speaker Recognition, Deep Neural Network, Spectrogram Features, Original Waveform, Model Extension