Experimental Analysis And Improvement Of Speaker Recognition System

Posted on: 2021-11-12

Degree: Master

Type: Thesis

Country: China

Candidate: L Li

GTID: 2518306548481694

Subject: Electronics and Communications Engineering

Abstract/Summary:

This thesis aims to improve the performance of speaker recognition systems, and implements and tests a variety of system designs. The feasibility of improving speaker recognition performance is examined from three aspects: input features, network architecture, and model extension.

(1) Effectiveness of input features. Speaker recognition systems usually take Mel-frequency cepstral coefficients (MFCC) as the input feature, but MFCC over-compresses the information in the speech signal. This thesis replaces MFCC with the spectrogram, a more primitive acoustic representation that retains more information about the speech signal itself. The experimental results show that an x-vector system built on spectrogram features achieves better recognition results than one built on MFCC.

(2) Discriminability of speaker embeddings. SE-ResNet and RACNN are constructed by optimizing the ResNet architecture to obtain more discriminative speaker embeddings. SE-ResNet, which comes from image recognition, can better learn the dependencies between the channels of convolutional features; applied to speaker recognition in this thesis, it achieves better recognition results than ResNet. RACNN, an audio-domain model that takes the raw waveform as its input feature, is improved as follows: residual blocks replace the convolution layers, and LReLU replaces ReLU. Two more discriminative loss functions are added on top of the softmax loss, and the time-delay neural network layer of the original model is removed, giving better recognition performance with fewer parameters.

(3) Efficiency of deep neural networks. If only the depth, the width, or the input speech length of a model is expanded, recognition accuracy quickly saturates. This thesis applies EfficientNet, from image recognition, to speaker recognition. A baseline model is designed first, and system performance is then improved by compound extension of the model, keeping the baseline model's parameters unchanged while preserving recognition efficiency. The experimental results show that the compound-extended EfficientNet performs better than models extended along a single dimension.
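The compound extension described in point (3) can be illustrated with a minimal Python sketch. It assumes the scaling rule from the original image-domain EfficientNet, in which depth, width, and input size are multiplied by alpha^phi, beta^phi, and gamma^phi for a single compound coefficient phi. The abstract does not give the thesis's own coefficients or baseline, so the values 1.2, 1.1, and 1.15 (those reported for image EfficientNet), the use of speech length in place of image resolution, the baseline numbers, and the function names `compound_multipliers` and `scale_model` are all assumptions made for illustration.

```python
import math


def compound_multipliers(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, length) multipliers for compound coefficient phi.

    alpha, beta, gamma are the image-domain EfficientNet values,
    used here only as placeholder assumptions.
    """
    return alpha ** phi, beta ** phi, gamma ** phi


def scale_model(base_layers, base_channels, base_frames, phi):
    """Scale a hypothetical baseline along all three dimensions at once."""
    depth_mult, width_mult, length_mult = compound_multipliers(phi)
    # Depth: round the layer count up so scaling never removes a layer.
    layers = math.ceil(base_layers * depth_mult)
    # Width: round channels to the nearest multiple of 8 (a common convention).
    channels = max(8, int(base_channels * width_mult + 4) // 8 * 8)
    # Input length: number of speech frames fed to the network (assumed
    # stand-in for image resolution).
    frames = round(base_frames * length_mult)
    return layers, channels, frames


if __name__ == "__main__":
    # Hypothetical baseline: 4 blocks, 32 channels, 200 input frames.
    for phi in range(4):
        print(phi, scale_model(4, 32, 200, phi))
```

Single-dimension extension corresponds to raising only one of the three multipliers; the compound form raises all three together from one coefficient, which is the contrast the experiments in point (3) evaluate.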

Keywords/Search Tags:

Speaker Recognition, Deep Neural Network, Spectrogram Features, Original Waveform, Model Extension

Related items

1	Pulse Coupled Neural Network (pcnn) In The Spectrogram-based Speaker Recognition
2	Application Research Of Spectrogram On Pronunciation Recognition Of Chinese Characters And Speaker Recognition
3	Research On Identity Recognition Algorithm Based On Speech Features
4	Research On End-to-end Speaker Recognition Based On Raw Waveform
5	Analysis Of Effective Fused Features And Model Evaluation For Speech Emotion Recognition
6	Research Of Speaker Recognition Technology Based On Fusion Features
7	Application Of Deep Recurrent Neural Networks In Speaker Recognition On Mobile Phones
8	Text-independent Speaker Recognition Research Based On Local Acoustic Features
9	Research On Speaker Adaptation Of Neural Network Acoustic Models For Speech Recognition
10	Research On Speaker Identification Based On Speech Processing