Research Of Speaker Identification Technology Based On Deep Features

Posted on: 2020-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: T Gu
GTID: 2428330590995614
Subject: Signal and Information Processing
Abstract/Summary:
Speech is the most convenient means of communication in human society. Because of differences in innate vocal organs and acquired speaking habits, every speaker has unique vocal characteristics. Feature parameters and the model framework are the two main factors that determine the recognition rate of a speaker recognition system. Once the model framework has been fixed, system performance depends mainly on how the feature parameters are selected and extracted. With the rapid development of computer technology, finding feature parameters that are highly representative and discriminative is a valuable research topic. The goal of this thesis is to extract deep features that differ from traditional feature parameters, and to study the recognition rate and time complexity of speaker identification systems that use these deep features. First, the Gaussian mixture model (GMM) and the deep belief network (DBN) are studied and combined to propose the DGCS feature. Second, the convolutional neural network (CNN), which has achieved great success in image recognition, is studied, and a CNN model is designed to extract a deep fusion feature. The main content and contributions of this thesis are summarized as follows:

(1) The background knowledge is presented, including the principles of speaker recognition, the feature extraction process, and the recognition models. The extraction of the MFCC (mel-frequency cepstral coefficient) and LPCC (linear prediction cepstral coefficient) feature parameters is introduced first, followed by the GMM, GMM-UBM, SVM, and deep neural network models. Previous studies have shown that these models perform relatively well in speaker recognition systems, so the research in this thesis builds on them.

(2) To fully exploit speaker identity information, the deep Gaussian correlation supervector (DGCS) is proposed based on a study of the DBN and GMM-SVM. The traditional Gaussian supervector is obtained by feeding MFCCs directly into a GMM. In this thesis, MFCCs are instead first fed into a DBN to extract a bottleneck feature, which is then used as the input of a GMM to extract the DGS. Because the mean vectors of the DGS are correlated with one another, they are reconstructed to build the DGCS, which carries deeper speaker identity information. In addition, the DGCS is well suited to the SVM, which performs well on high-dimensional data with few training samples. Experimental simulations show that, compared with the traditional Gaussian supervector, the Gaussian correlation supervector, and the DGS, the DGCS not only improves the recognition rate effectively but also reduces the SVM modeling time.

(3) Motivated by the strong performance of fusion features, a CNN fusion feature is constructed with a convolutional neural network. Speaker utterances are first converted into spectrograms, which are then used as the input of a CNN to build a speaker recognition system. Experiments show that the number of CNN layers has an important impact on system performance. To exploit features from different layers, the CNN features extracted from the two layers that give the better recognition rates are fused. Experimental simulations show that the speaker identification system using the CNN fusion feature achieves good recognition rates.
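The MFCC extraction process mentioned in item (1) follows a well-known pipeline: pre-emphasis, framing and windowing, power spectrum, mel filterbank, log compression, and a DCT. The NumPy sketch below is one minimal version of that pipeline. The 16 kHz sample rate, 25 ms frames with a 10 ms hop, 26 filters, and 13 coefficients are common defaults assumed here, not settings taken from the thesis.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced uniformly on the mel scale from 0 Hz to sr/2.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_matrix(n_out, n_in):
    # Orthonormal DCT-II basis (same convention as scipy.fft.dct(norm="ortho")).
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13, pre_emph=0.97):
    # 1) Pre-emphasis boosts the high-frequency part of the spectrum.
    x = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2) Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    # 3) Power spectrum of each frame (zero-padded to n_fft points).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4) Mel filterbank energies, log-compressed (floored to avoid log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_e = np.log(np.maximum(energies, 1e-10))
    # 5) DCT decorrelates the log energies; keep the first n_ceps coefficients.
    return log_e @ dct_matrix(n_ceps, n_filters).T
```

For a one-second 16 kHz signal, `mfcc(signal)` returns a `(98, 13)` array, one 13-dimensional vector per frame. Delta and delta-delta coefficients, which many systems append, are not included in this sketch.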
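The LPCC feature mentioned in item (1) is usually computed in two stages: linear prediction coefficients are estimated from the frame's autocorrelation with the Levinson-Durbin recursion, and then converted to cepstral coefficients with the standard LPC-to-cepstrum recursion. The sketch below assumes the predictor convention x[n] ≈ Σ a_k x[n−k]; the prediction order of 12 and the 12 output coefficients are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def lpc(frame, order):
    # Autocorrelation r[0..order] of the (windowed) frame.
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] unused; predictor x[n] ~ sum_k a[k] x[n-k]
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i (Levinson-Durbin).
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k  # prediction error shrinks at each stage
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    # c_n = a_n + sum_{k} (k/n) c_k a_{n-k}, with a_n = 0 for n > p.
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] (log-gain term) is not returned
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def lpcc(frame, order=12, n_ceps=12):
    return lpc_to_cepstrum(lpc(frame * np.hamming(len(frame)), order), n_ceps)
```

As a sanity check, a first-order predictor with a_1 = 0.5 gives c_n = 0.5^n / n, which is the power series of −ln(1 − 0.5 z^{−1}).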
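Item (2) extracts a bottleneck feature by passing MFCCs through a DBN. After a DBN has been pre-trained (typically as stacked RBMs) and fine-tuned, extracting bottleneck features is just a forward pass that stops at a narrow hidden layer. The sketch below shows only that forward pass. The layer sizes, the ±5-frame context splicing, the sigmoid hidden units, and the linear bottleneck layer are assumptions, and the weights are random placeholders standing in for a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def splice(feats, context=5):
    # Stack each frame with its +/- context neighbours (edges padded by repetition).
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    T = feats.shape[0]
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def bottleneck_features(feats, layers, bn_index, context=5):
    # layers: list of (W, b) pairs from a trained network (hypothetical here).
    h = splice(feats, context)
    for i, (W, b) in enumerate(layers[: bn_index + 1]):
        z = h @ W + b
        # Hidden layers use sigmoid units; the bottleneck output is kept linear (assumption).
        h = z if i == bn_index else sigmoid(z)
    return h

# Usage with placeholder weights: 13-dim MFCCs, 11-frame context, 39-unit bottleneck.
rng = np.random.default_rng(0)
dims = [13 * 11, 256, 256, 39, 256, 10]
layers = [(0.1 * rng.standard_normal((i, o)), np.zeros(o)) for i, o in zip(dims[:-1], dims[1:])]
bn = bottleneck_features(rng.standard_normal((100, 13)), layers, bn_index=2)
```

Here `bn` has shape `(100, 39)`: one bottleneck vector per input frame, which can then replace the MFCCs as the GMM input. The layers after the bottleneck exist only for training and are not evaluated.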
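The traditional Gaussian supervector in item (2) is commonly built by MAP-adapting the means of a universal background model (UBM) to one utterance and concatenating the adapted means. The sketch below uses relevance-factor MAP adaptation of the means only, with diagonal covariances. The relevance factor of 16 is a common choice assumed here. The thesis's DGS and DGCS construction (including how the correlated mean vectors are reconstructed) is not reproduced.

```python
import numpy as np

def log_gaussian_diag(x, means, variances):
    # x: (T, D); means, variances: (K, D) -> (T, K) log densities.
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def map_mean_supervector(x, weights, means, variances, relevance=16.0):
    # Component posteriors of each frame under the UBM (numerically stabilized).
    log_p = np.log(weights) + log_gaussian_diag(x, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                     # zeroth-order statistics, (K,)
    f = post.T @ x                           # first-order statistics, (K, D)
    alpha = (n / (n + relevance))[:, None]   # data-dependent adaptation weight
    ex = f / np.maximum(n, 1e-10)[:, None]   # posterior-weighted data mean
    adapted = alpha * ex + (1.0 - alpha) * means
    return adapted.reshape(-1)               # (K * D,) mean supervector
```

Components that see little data (small `n`) stay close to the UBM means, while well-observed components move toward the utterance's data. GMM-SVM systems often also scale each adapted mean by sqrt(w_k)/sigma_k before training the SVM, a normalization omitted here.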
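Item (3) converts each utterance into a spectrogram before it is fed to the CNN. One minimal way to do this is a short-time Fourier transform with log-magnitude scaling, normalized so that it can be treated like a grayscale image. The frame length, hop, FFT size, dB scaling, and [0, 1] normalization below are assumptions. The thesis does not state its spectrogram settings.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512, eps=1e-10):
    # Overlapping Hamming-windowed frames (25 ms / 10 ms at 16 kHz).
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Magnitude spectrum in dB; rows = frequency bins (row 0 is 0 Hz), columns = frames.
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    spec = (20.0 * np.log10(mag + eps)).T
    # Scale to [0, 1] so the spectrogram can be used like a grayscale image.
    return (spec - spec.min()) / (spec.max() - spec.min() + eps)
```

For a one-second 1 kHz tone sampled at 16 kHz, the result has shape `(257, 98)` and its energy peaks in frequency bin 32 (1000 Hz × 512 / 16000). In practice the spectrograms are usually cropped or resized to a fixed width so that every CNN input has the same size.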
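The abstract says that CNN features from two layers are fused but does not specify the fusion rule. The sketch below shows one simple, commonly used option, labeled here as an assumption: each layer's activations are reduced to a vector (global average pooling for convolutional feature maps), L2-normalized so that neither layer dominates by scale, and concatenated.

```python
import numpy as np

def pool_layer_output(act):
    # Conv activations (C, H, W) -> global average pool to (C,); vectors pass through.
    return act.mean(axis=(1, 2)) if act.ndim == 3 else act

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

def fuse_layer_features(act_a, act_b):
    # Serial (concatenation) fusion of two normalized layer descriptors (assumed rule).
    return np.concatenate([l2_normalize(pool_layer_output(act_a)),
                           l2_normalize(pool_layer_output(act_b))])
```

For example, fusing a `(64, 8, 8)` convolutional map with a 128-unit fully connected output yields a 192-dimensional vector that can be passed to a back-end classifier. Weighted concatenation or element-wise fusion after projection are other possible choices.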
Keywords/Search Tags: Speaker Identification, Deep Belief Network, Gaussian Mixture Model, Deep Gaussian Correlation Supervector, Convolutional Neural Network, Fusion Feature