
Research On Machine Learning Based Speaker Recognition

Posted on: 2022-02-04    Degree: Master    Type: Thesis
Country: China    Candidate: J Y Mo    Full Text: PDF
GTID: 2518306494950859    Subject: Electrical engineering
Abstract/Summary:
Biometric recognition technology has been widely adopted in modern society because of its convenience and security. As an important biometric trait, the human voice carries abundant information, and with the widespread use of smart devices, speaker voice data can be collected at very low cost. The analysis of speech is therefore of high practical value. This work studies both speaker recognition and speech emotion recognition with deep learning methods. Speaker recognition is divided into speaker identification and speaker verification, while speech emotion recognition is treated directly as a multi-class classification task.

To take advantage of different attention mechanisms, a dual-path attention mechanism is proposed in this paper, combining self-attention with the convolutional block attention module (CBAM). With the proposed method, performance is significantly improved at a negligible extra time cost.

Based on the Cluster-Range Loss (CRL), an improved version of the Triplet Loss, a Weighted Cluster-Range Loss (WCRL) is presented in this work to improve the performance of CRL on the speaker identification task. The WCRL places more emphasis on increasing inter-class difference, leading to higher classification accuracy on critical samples. To address the low efficiency of CRL in the initial training stage, a novel Criticality-Enhancement Loss (CEL) is also proposed. The CEL attends to the samples that are most easily and most necessarily optimized. Combined with CRL, both the hardest and the easiest samples are considered concurrently at each step, so the training process is greatly accelerated and relatively more time is left for CRL, yielding better performance.

For the speaker identification task, a Top-1 accuracy of 92.0% on the VoxCeleb1 dataset and 84.3% on the CN-Celeb dataset was reached. For the speaker verification task, an equal error rate (EER) of 5.1% was achieved when training on the VoxCeleb1 dataset, which was further reduced to 3.52% when training on the VoxCeleb2 dataset. Compared with the baseline methods, the approaches proposed in this work show clear superiority.

For speech emotion recognition, a light-weight architecture combining ResNet and GRU is proposed in this paper. Compared with methods from other researchers, competitive performance on the IEMOCAP dataset was reached with fewer parameters and features: an unweighted accuracy (UA) of 67.9% and an F1-score of 0.675 were achieved, while the parameter count was reduced by a relative 16.2%.
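The abstract does not give the exact formulation of the dual-path attention mechanism. As a rough illustration of the general idea, the sketch below (assuming PyTorch, parallel paths over the same feature map, and element-wise summation as the fusion rule, none of which the abstract confirms) runs a self-attention path and a CBAM-style channel-and-spatial path in parallel and sums their outputs:

```python
# Minimal sketch of a dual-path attention block: a self-attention path and a
# CBAM-style path run in parallel on the same feature map; their outputs are
# summed. The fusion rule and all layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CBAMPath(nn.Module):
    """Channel attention followed by spatial attention (CBAM-style)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                           # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                    # (B, C) pooled statistics
        mx = x.amax(dim=(2, 3))
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.view(b, c, 1, 1)                 # channel re-weighting
        sa_in = torch.cat([x.mean(1, keepdim=True),
                           x.amax(1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(sa_in))
        return x * sa                               # spatial re-weighting

class DualPathAttention(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.cbam = CBAMPath(channels)
        self.self_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                           # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)          # (B, H*W, C) token sequence
        sa_out, _ = self.self_attn(seq, seq, seq)
        sa_out = sa_out.transpose(1, 2).view(b, c, h, w)
        return sa_out + self.cbam(x)                # parallel paths, summed

# e.g. y = DualPathAttention(32)(torch.randn(2, 32, 40, 100))
```

CBAM itself adds only a small MLP and one 7x7 convolution, which is consistent with the abstract's claim of a negligible extra time cost for the second path.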
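The formulas for CRL, WCRL, and CEL are likewise not given in the abstract. The following sketch is only a stand-in for the stated idea: a hard-sample (CRL-like) term and an easy-sample (CEL-like) term computed concurrently per batch, with the inter-class distance up-weighted as WCRL is said to do. The mining scheme, the `w_inter` and `w_easy` weights, and the hinge form are all assumptions, not the thesis losses.

```python
# Hedged sketch: combine a batch-hard triplet term (hardest samples, CRL-like)
# with a batch-easy term (easiest samples, CEL-like). Assumes PK-sampled
# batches, i.e. every anchor has at least one positive and one negative.
import torch
import torch.nn.functional as F

def dual_criticality_loss(embeddings, labels, margin=0.3, w_inter=2.0, w_easy=0.5):
    dist = torch.cdist(embeddings, embeddings)         # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=dist.device)

    pos = dist.masked_fill(~same | eye, float('-inf'))  # keep positives only
    neg = dist.masked_fill(same, float('inf'))          # keep negatives only

    # Hardest triplet per anchor; w_inter > 1 stresses inter-class separation.
    hard_term = F.relu(pos.amax(dim=1) - w_inter * neg.amin(dim=1) + margin)

    # Easiest triplet per anchor: closest positive vs. farthest negative.
    easy_pos = pos.masked_fill(pos == float('-inf'), float('inf')).amin(dim=1)
    easy_neg = neg.masked_fill(neg == float('inf'), float('-inf')).amax(dim=1)
    easy_term = F.relu(easy_pos - easy_neg + margin)

    return (hard_term + w_easy * easy_term).mean()
```

The intended intuition matches the abstract: early in training the easy term provides many non-zero, cheaply optimized gradients, while the hard term keeps pushing the critical samples apart.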
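For the emotion branch, the abstract names only the building blocks (ResNet and GRU). A minimal sketch of that kind of pipeline is shown below; the layer counts, channel widths, four-class output, and log-mel input are illustrative assumptions, not the thesis configuration.

```python
# Illustrative light-weight ResNet + GRU emotion classifier: a small residual
# CNN extracts local time-frequency patterns from a log-mel spectrogram, a GRU
# models temporal dynamics, and a linear head outputs class logits.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))    # identity shortcut

class ResNetGRU(nn.Module):
    def __init__(self, n_mels=64, n_classes=4, ch=32, hidden=128):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, stride=2, padding=1)  # halves H and W
        self.blocks = nn.Sequential(ResBlock(ch), ResBlock(ch))
        self.gru = nn.GRU(ch * (n_mels // 2), hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                       # x: (B, 1, n_mels, T), n_mels even
        f = self.blocks(self.stem(x))           # (B, ch, n_mels/2, ~T/2)
        b, c, m, t = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, t, c * m)  # one vector per frame
        _, h = self.gru(seq)                    # final hidden state (1, B, hidden)
        return self.head(h[-1])                 # emotion logits
```

Keeping the CNN shallow and delegating long-range temporal modeling to a single GRU layer is one plausible way to reach the small parameter budget the abstract reports.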
Keywords/Search Tags: Speaker Recognition, Emotion Recognition, Attention, Loss