
Research On Improvement Of Speaker Recognition Method Based On RSCNN

Posted on: 2020-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Dai
Full Text: PDF
GTID: 2428330572482437
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Speaker recognition is a form of biometrics that distinguishes speakers by extracting identity-bearing features from raw speech. Compared with other biometric modalities, it offers convenient data collection and high user acceptability, and it is therefore widely used. With the advent of the Internet and the big-data era, deep learning has come to dominate the speaker recognition field thanks to its representation ability, which surpasses that of traditional shallow models. This thesis focuses on a convolutional neural network model that takes raw speech as input (RSCNN) and on its application to speaker recognition. The RSCNN model learns suitable speaker features directly from speech data, and it depends less on specific prior knowledge than deep learning architectures that take spectral features as input.

Building on prior work on RSCNN, this thesis proposes several improvements. First, to address the relatively high computational cost of model fusion, we propose a feature fusion method that fuses two kinds of features extracted by two convolution kernels of different widths in the first convolutional layer of the RSCNN. We then compare the feature fusion method with the model fusion method in terms of accuracy and training time through several contrast experiments. Results on three public datasets indicate that the proposed feature fusion method differs little from model fusion in accuracy but markedly shortens training time, demonstrating its effectiveness. Second, we extend the aforementioned two-scale feature fusion method with two additional scales, using four different kernel widths to extract four kinds of features at different scales in parallel and fusing them. The experimental results indicate that, within certain limits, recognition performance improves as the number of feature scales increases.

Finally, we design model transfer experiments in which the models trained on the three public datasets are transferred to a self-built dataset and fine-tuned. The results show that the transferred RSCNN model extracts speaker features with a certain degree of invariance on new datasets; that the trained feature-extraction module generalizes better to new datasets as the example diversity of the original dataset increases; and that the proposed feature fusion method further improves the generalization performance of the transferred model.
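The parallel multi-scale feature fusion described above can be sketched as follows. This is a minimal NumPy illustration with random, untrained kernels rather than the thesis's actual implementation; the kernel widths, stride, and function names are assumptions chosen only to show the extract-in-parallel-then-concatenate idea.

```python
import numpy as np

def conv1d(signal, kernel, stride):
    """Valid-mode strided 1-D convolution: a minimal stand-in for one
    learned convolutional filter applied to a raw waveform."""
    n = (len(signal) - len(kernel)) // stride + 1
    return np.array([signal[i * stride : i * stride + len(kernel)] @ kernel
                     for i in range(n)])

def multi_scale_fusion(waveform, kernel_widths=(11, 51, 101, 201),
                       stride=10, seed=0):
    """Apply one kernel per width to the same raw waveform in parallel,
    pass each output through a ReLU nonlinearity, trim the feature
    sequences to a common length, and fuse them by stacking."""
    rng = np.random.default_rng(seed)
    feats = [np.maximum(conv1d(waveform, rng.standard_normal(k), stride), 0.0)
             for k in kernel_widths]
    min_len = min(len(f) for f in feats)          # wider kernels yield fewer frames
    return np.stack([f[:min_len] for f in feats])  # shape: (num_scales, frames)
```

Narrow kernels resolve fine temporal detail while wide kernels capture coarser spectral-envelope-like structure, so fusing the four feature streams gives the downstream layers several time scales at once without training four separate models.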
Keywords/Search Tags:Speaker Recognition, Deep Learning, Feature Fusion