Research On Deep Learning Models And Algorithms For Speaker Recognition

Posted on:2017-04-14

Degree:Doctor

Type:Dissertation

Institution:University

Candidate:HAZRAT ALI


GTID:1108330482972319

Subject:Communication and Information System

Abstract/Summary:

Human speech carries three kinds of information: linguistic information, information about the speaker's emotional state, and speaker-specific information. Speaker-specific information is the key to speaker recognition, provided it is extracted and used properly.

The underlying challenge in speaker recognition is directly related to learning features efficiently from speech data. Until recently, traditional hand-crafted features such as Mel Frequency Cepstral Coefficients (MFCCs) have been the popular choice for processing speech and audio data. With the development of deep learning, research has shifted toward unsupervised feature learning from audio data. Deep learning has brought substantial performance improvements on machine learning tasks such as object recognition, face recognition, handwritten character recognition, and machine translation.

In this thesis, we present our work on using deep learning techniques to learn features from audio data for speaker recognition. In particular, we explore Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs) for unsupervised feature learning. We also propose and discuss a deep hybrid feature model that combines the unsupervised learned features with traditional MFCCs, and we evaluate these hybrid features on a speaker recognition task. Our experimental results show that the deep hybrid features give better recognition accuracy on this task.

We also discuss a new approach for transforming audio data and training a standard Restricted Boltzmann Machine on the transformed data; we refer to this as convolutional data.

Furthermore, we present a simple late fusion approach for the i-vector paradigm. I-vectors are relatively recent features with great potential for speaker recognition. We report results on the i-vector data from the NIST i-vector challenge; the results achieved with the late fusion approach outperform the baseline score.
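The abstract does not give the RBM configuration, so the following is only a minimal sketch under stated assumptions: a single RBM with unit-variance Gaussian visible units (suited to standardized, real-valued MFCC frames) and binary hidden units, trained with one-step contrastive divergence (CD-1). The hidden-unit probabilities act as the learned features, and the "hybrid" feature is assumed to be a plain per-frame concatenation of the MFCC vector with those probabilities. A DBN would stack further RBMs on the hidden activations, which is not shown. The class and function names, layer sizes, learning rate, and the synthetic input frames are all hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class GaussianBernoulliRBM:
    """Hypothetical minimal RBM: unit-variance Gaussian visible units,
    binary hidden units, trained with CD-1. Illustration only, not the thesis code."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible, self.n_hidden = n_visible, n_hidden
        # W[i][j] connects visible unit i to hidden unit j; small random init.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_mean(self, h):
        # E[v_i | h] = b_i + sum_j W_ij * h_j for unit-variance Gaussian visibles
        return [self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.n_hidden))
                for i in range(self.n_visible)]

    def cd1_step(self, v0, lr):
        ph0 = self.hidden_probs(v0)                                # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        v1 = self.visible_mean(h0)                                 # mean-field reconstruction
        ph1 = self.hidden_probs(v1)                                # negative phase
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(self.n_hidden):
            self.c[j] += lr * (ph0[j] - ph1[j])
        # Squared reconstruction error, only a rough training monitor.
        return sum((a - r) ** 2 for a, r in zip(v0, v1))

    def train(self, frames, epochs=10, lr=0.01):
        errors = []
        for _ in range(epochs):
            order = list(range(len(frames)))
            self.rng.shuffle(order)  # visit frames in a random order each epoch
            errors.append(sum(self.cd1_step(frames[k], lr) for k in order) / len(frames))
        return errors


def hybrid_features(mfcc_frames, rbm):
    # Assumed hybrid: per-frame concatenation [MFCC | RBM hidden probabilities].
    return [list(f) + rbm.hidden_probs(f) for f in mfcc_frames]


# Usage on synthetic stand-in data (random vectors, not real MFCCs).
data_rng = random.Random(1)
frames = [[data_rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(200)]
rbm = GaussianBernoulliRBM(n_visible=13, n_hidden=8)
errs = rbm.train(frames, epochs=5, lr=0.005)
feats = hybrid_features(frames, rbm)
print(len(feats), len(feats[0]))  # 200 21
```

Each hybrid vector here is 13 MFCC values followed by 8 learned values; the resulting frames could then be passed to any back-end classifier, such as an SVM.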
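The abstract does not say which subsystems are fused or how their scores are combined. As a minimal sketch, assume late fusion at the score level: every subsystem scores the same list of (enrolled speaker, test segment) trials, each subsystem's scores are z-normalized, and the fused score is a weighted sum. Cosine scoring, a common way of comparing i-vectors, stands in for one subsystem; the second subsystem's scores, the equal weights, and all function names are hypothetical.

```python
import math
import statistics


def cosine_score(enroll, test):
    # Cosine similarity between two i-vectors; assumed here as one subsystem.
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    return dot / norm


def z_normalize(scores):
    # Shift and scale one subsystem's scores to zero mean and unit variance
    # so that subsystems with different score ranges can be combined.
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd if sd > 0 else 0.0 for s in scores]


def late_fusion(score_lists, weights=None):
    # Weighted sum of z-normalized scores; every subsystem must score the
    # same trials in the same order. Equal weights unless given.
    n_trials = len(score_lists[0])
    if any(len(s) != n_trials for s in score_lists):
        raise ValueError("all subsystems must score the same trial list")
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    normed = [z_normalize(s) for s in score_lists]
    return [sum(w * col[t] for w, col in zip(weights, normed)) for t in range(n_trials)]


# Usage with toy 2-D vectors (real i-vectors typically have hundreds of dimensions).
trials = [([1.0, 0.0], [2.0, 0.1]), ([1.0, 0.0], [0.0, 3.0]), ([0.6, 0.8], [0.8, 0.6])]
cos_scores = [cosine_score(e, t) for e, t in trials]
other_scores = [1.8, -0.7, 0.4]  # stand-in scores from a second, hypothetical subsystem
fused = late_fusion([cos_scores, other_scores])
print(fused)
```

Fixing the weights to equal values keeps the sketch simple; in practice they would usually be tuned on held-out development trials.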

Keywords/Search Tags:

Audio Data Classification, Deep Learning, i-vector, Restricted Boltzmann Machine, Speaker Recognition, Support Vector Machines

Related items

1	Research On Image Recognition Based On Deep Learning Algorithm
2	Research On Deep Learning Based Speaker Recognition Modeling
3	Image Classification Method Based On Abandoned Stacked Restricted Boltzmann Machine
4	Application Of Deep Learning And Supervector In Speaker Recognition
5	The Research And Development Of Gesture Recognition Based On Machine Learning
6	Research Of Deep Learning Method Based On Restricted Boltzmann Machines
7	Research On Training Algorithm Of Restricted Boltzmann Machine Under Classification Rate Criterion
8	Application Of Improved GRBM In Speech Recognition
9	Adaptive Cardinality Restricted Boltzmann Machines
10	Research On Terrain Classification For Robots Based On Restricted Boltzmann Machine