Research On Feature Learning In Speaker Recognition

Posted on:2019-06-02

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L T Li

GTID:1368330590451478

Subject:Computer Science and Technology

Abstract/Summary:

Speaker recognition (SRE), an important biometric recognition technology, is the process of automatically identifying or verifying a person's identity from his or her voice. After decades of research, SRE performance has improved greatly, and the technology has been deployed in a wide range of applications. However, present SRE approaches are far from reliable, especially in unconstrained conditions that are full of unpredictable uncertainties, e.g., free text, multiple channels, environmental noise, and varying speaking styles. An intuitive way to address these uncertainties is to discover features that are sensitive to speaker traits but robust against other sources of variation. This dissertation therefore focuses on deep feature learning in speaker recognition. Its major contributions are as follows:

1. A convolutional time-delay deep neural network for speaker feature learning. Based on the properties of the speech signal, and considering both the representation of speaker traits and the trainability of the model design, a convolutional time-delay deep neural network (CTDNN), consisting of a convolutional component and a time-delay component, was built to learn deep speaker features. Qualitative and quantitative analysis demonstrated that the learned features are strongly discriminative for speakers.

2. Research on the generalizability of deep speaker features. The training objective of speaker feature learning is to discriminate among different speakers, rather than to optimize the speaker recognition task directly. Therefore, several schemes were designed from different perspectives to verify the effectiveness of deep speaker features and to demonstrate the generalizability of the feature learning approach.

3. Full-info training for speaker feature learning. Because the training objective of speaker feature learning focuses only on maximizing inter-speaker variation and neglects constraints on within-speaker variation, the deep speaker features exhibit within-speaker divergence. Therefore, a full-info training approach based on a centroid-converge criterion was proposed. While still maximizing inter-speaker variation, a within-speaker constraint was injected into the training process to improve the cohesiveness of the deep speaker features.

4. Phone-aware training for speaker feature learning. Because the training process of speaker feature learning depends entirely on a complex model structure and a large amount of training data, this 'blind' data-driven learning is highly susceptible to non-speaker factors, especially phonetic content. Therefore, inspired by the success of conditional learning, a phone-aware training approach based on a phonetic compensation criterion was proposed. The phonetic information of each frame was supplied to the model during training. With this phonetic compensation, the within-speaker variation caused by phonetic content can be largely explained away, and the quality of the learned features improved.
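
The abstract does not give the exact form of the centroid-converge criterion, so the following is only a minimal sketch of the general idea behind full-info training: a speaker-discriminative term combined with a within-speaker cohesion term. It assumes a softmax cross-entropy loss for the inter-speaker part and a squared distance to a per-speaker centroid for the within-speaker part. The function names, the batch-level centroid estimate, and the `weight` parameter are all hypothetical and are not taken from the dissertation.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one frame's speaker logits against the true speaker index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def speaker_centroids(features, labels):
    """Mean feature vector per speaker (assumption: estimated from the batch)."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def full_info_loss(logits, features, labels, weight=0.1):
    """Average cross-entropy plus a weighted within-speaker penalty.

    The penalty is the mean squared distance between each feature and the
    centroid of its speaker; `weight` (hypothetical) balances the two terms.
    """
    centroids = speaker_centroids(features, labels)
    n = len(labels)
    inter = sum(softmax_cross_entropy(z, y) for z, y in zip(logits, labels)) / n
    within = sum(
        sum((a - b) ** 2 for a, b in zip(f, centroids[y]))
        for f, y in zip(features, labels)
    ) / n
    return inter + weight * within

# Toy batch: three frames, two speakers, two-dimensional features.
toy_logits = [[2.0, 0.0], [1.5, 0.5], [0.0, 2.0]]
toy_features = [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0]]
toy_labels = [0, 0, 1]
toy_loss = full_info_loss(toy_logits, toy_features, toy_labels)
```

With `weight=0` the sketch reduces to a purely discriminative objective, which corresponds to the baseline situation the abstract describes, where within-speaker variation is left unconstrained.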

Keywords/Search Tags:

speaker recognition, feature learning, deep learning

Related items

1	Research On Speaker Recognition Based On Discriminative Feature Learning
2	Research Of Robust Speaker Recognition In Deep Learning Framework
3	Research And Implementation Of Multi Speaker Recognition Technology Based On Deep Learning
4	Research On Speaker Recognition Based On SVM And Deep Learning
5	Research On Key Technologies Of Speaker Recognition Based On Deep Learning
6	Research On The Speaker Recognition System Under The Short Utterance Based On Deep Learning Theory
7	Research On Deep Learning Based Speaker Recognition Modeling
8	The Application Of Speaker Recognition Technology Based On Deep Learning
9	Research On Speaker Gender Feature Recognition Based On Deep Learning
10	Research On Speaker Recognition Algorithm Based On Deep Neural Network