Speaker Recognition Based On Multi-information Fusion

Posted on:2019-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:X Fang

GTID:2428330542992462

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

Speaker recognition, also known as voiceprint recognition, is a biometric technology that automatically identifies a speaker from his or her voice. In essence, it is a classification process based on speaker features. This thesis therefore aims to extract more comprehensive features that characterize speaker information, so as to further improve the performance of speaker recognition systems. The main contents of this thesis are as follows:

1. Three traditional speaker recognition systems are built. They differ in how their models are trained:
(a) Speaker recognition based on the total variability matrix (TVM). A universal background model (UBM) is first trained on large-scale data. The frame posterior probabilities from the UBM are then used to compute the (Baum-Welch) statistics of the data, from which the TVM is trained. This system is named TVM-I-Vector.
(b) Speaker recognition based on a deep neural network (DNN). The DNN replaces the UBM of the TVM-I-Vector system and computes the posterior probability of each frame with respect to each class of the model. This system is abbreviated as NN-I-Vector.
(c) Speaker recognition with deep bottleneck features (DBF). The DBF replaces acoustic features such as MFCC as the input of the speaker recognition system. This system is denoted DBF-I-Vector.
Because i-vectors are extracted without separating speaker information from the channel information of the input utterances, linear discriminant analysis (LDA) or probabilistic LDA (PLDA) is applied to reduce the influence of the channel on recognition performance.

2. A speaker recognition system based on feature fusion is constructed. The input features of speaker recognition can be deep features, such as DBF, or shallow features, such as MFCC and PLP. Shallow features are low-level features extracted from short-time spectral information and have difficulty representing the high-level information of the input speech. Deep features take phoneme-discriminative information into account but do not capture the intuitive physical-layer acoustic properties. Because deep and shallow features have complementary strengths and weaknesses, feature fusion is applied so that each compensates for the other and the performance of the speaker recognition system improves.

3. A speaker recognition system based on i-vector model fusion is implemented. Different types of speaker recognition systems, such as TVM-I-Vector and NN-I-Vector, differ in performance, and each has its own advantages. These differences ultimately accumulate in the extracted feature vectors, the i-vectors. Model fusion is therefore proposed to exploit the advantages of the different systems and improve overall recognition performance.

4. An end-to-end speaker recognition system is built. The basic idea of end-to-end speaker recognition is to replace the i-vector with a speaker embedding extracted by a deep neural network. Specifically, acoustic features are used as the network input, and a fixed-length feature vector, the speaker embedding, is derived from the output of the statistics pooling layer. At the back end of the system, PLDA and cosine similarity are used to score pairs of speaker embeddings. This thesis designs and optimizes an end-to-end speaker recognition system following this idea, which not only reduces the training complexity of the system but also adds discriminative information to it.
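
The TVM-I-Vector and NN-I-Vector systems in item 1 share one step: per-frame posteriors are turned into per-component statistics that feed TVM training and i-vector extraction. The sketch below assumes a diagonal-covariance UBM and computes the zeroth-order and centered first-order statistics; all function names are hypothetical, and it is not the thesis's implementation. Passing DNN posteriors instead of UBM posteriors to `baum_welch_stats` corresponds to the NN-I-Vector variant.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def gmm_frame_posteriors(frames: Matrix, weights: Sequence[float],
                         means: Matrix, variances: Matrix) -> Matrix:
    """Posterior gamma_t(c) of each diagonal-covariance UBM component c for each frame t."""
    posteriors = []
    for x in frames:
        log_probs = []
        for w, mu, var in zip(weights, means, variances):
            # log w_c + log N(x; mu_c, diag(var_c))
            lp = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                lp -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            log_probs.append(lp)
        # Normalize in the log domain (log-sum-exp) to avoid underflow.
        top = max(log_probs)
        log_total = top + math.log(sum(math.exp(lp - top) for lp in log_probs))
        posteriors.append([math.exp(lp - log_total) for lp in log_probs])
    return posteriors


def baum_welch_stats(frames: Matrix, posteriors: Matrix,
                     means: Matrix) -> Tuple[List[float], Matrix]:
    """Zeroth-order N_c and centered first-order F_c statistics for one utterance.

    `posteriors` may come from a UBM (TVM-I-Vector) or from a DNN (NN-I-Vector).
    """
    n_comp, dim = len(means), len(means[0])
    zeroth = [0.0] * n_comp
    first = [[0.0] * dim for _ in range(n_comp)]
    for x, gamma in zip(frames, posteriors):
        for c in range(n_comp):
            zeroth[c] += gamma[c]
            for d in range(dim):
                # Accumulate gamma_t(c) * (x_t - mu_c), i.e. already centered on the UBM mean.
                first[c][d] += gamma[c] * (x[d] - means[c][d])
    return zeroth, first


if __name__ == "__main__":
    # Toy 1-dimensional UBM with two well-separated components.
    frames = [[0.0], [10.0]]
    means, variances, weights = [[0.0], [10.0]], [[1.0], [1.0]], [0.5, 0.5]
    post = gmm_frame_posteriors(frames, weights, means, variances)
    print(baum_welch_stats(frames, post, means))
```

Keeping the posterior computation separate from the statistics accumulation is what lets the two i-vector systems share the same back end.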
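
For reference, the standard total variability formulation behind the i-vector systems in item 1 is sketched below, assuming a UBM with C components of dimension D (as in Dehak et al.'s front-end factor analysis). The abstract does not give the thesis's own notation, so the symbols here are assumptions. N(u) is the CD x CD block-diagonal matrix with blocks N_c(u) I_D, the centered first-order statistics are stacked into a CD-dimensional supervector, and Sigma is the block-diagonal UBM covariance.

```latex
% Utterance-dependent GMM mean supervector under the total variability model
\mathbf{M} = \mathbf{m} + \mathbf{T}\mathbf{w}, \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})

% Statistics of utterance u from frame posteriors \gamma_t(c)
N_c(u) = \sum_{t} \gamma_t(c), \qquad
\tilde{\mathbf{F}}_c(u) = \sum_{t} \gamma_t(c)\,(\mathbf{x}_t - \mathbf{m}_c)

% The i-vector is the posterior mean of the latent factor w
\mathbf{w}(u) = \left(\mathbf{I} + \mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{N}(u)\,\mathbf{T}\right)^{-1}
                \mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\tilde{\mathbf{F}}(u)
```

Only the source of \gamma_t(c) changes between TVM-I-Vector (UBM posteriors) and NN-I-Vector (DNN posteriors); the extraction formula stays the same.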
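
Item 1 mentions LDA for channel compensation of i-vectors. One common form of the Fisher criterion is given below, assuming S training speakers, speaker s having n_s i-vectors with mean \bar{w}_s, and \bar{w} the global mean. Implementations differ in how the scatter matrices are weighted, and the thesis's exact variant is not stated in the abstract.

```latex
% Between-speaker and within-speaker scatter of the training i-vectors
\mathbf{S}_b = \sum_{s=1}^{S} n_s\,(\bar{\mathbf{w}}_s - \bar{\mathbf{w}})(\bar{\mathbf{w}}_s - \bar{\mathbf{w}})^{\top}, \qquad
\mathbf{S}_w = \sum_{s=1}^{S} \sum_{i=1}^{n_s} (\mathbf{w}_i^{s} - \bar{\mathbf{w}}_s)(\mathbf{w}_i^{s} - \bar{\mathbf{w}}_s)^{\top}

% Fisher criterion for a projection direction v
J(\mathbf{v}) = \frac{\mathbf{v}^{\top}\mathbf{S}_b\,\mathbf{v}}{\mathbf{v}^{\top}\mathbf{S}_w\,\mathbf{v}}

% Maximized by the leading generalized eigenvectors; A = [v_1, ..., v_k], projected i-vector w' = A^T w
\mathbf{S}_b\,\mathbf{v} = \lambda\,\mathbf{S}_w\,\mathbf{v}
```

Directions with large \lambda spread different speakers apart while keeping sessions of the same speaker (which vary mainly by channel) close together; at most S - 1 eigenvalues are nonzero.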
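
The DBF-I-Vector system in item 1 takes deep bottleneck features as input. A DBF extractor is commonly a DNN trained to classify phonetic units, with a narrow hidden layer whose activations are kept as features. The sketch below only runs the forward pass up to that layer; the tanh hidden units, the linear bottleneck, the layer sizes, and the weights are all illustrative assumptions, and the names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# (weight rows, bias): hypothetical parameters of an already-trained layer
Layer = Tuple[Matrix, List[float]]


def affine(x: Sequence[float], layer: Layer) -> List[float]:
    """Compute W x + b for one frame; W is stored as a list of output rows."""
    weight, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def extract_bottleneck(frames: Matrix, hidden: Sequence[Layer], bottleneck: Layer) -> Matrix:
    """Return the bottleneck-layer activations (the DBF) for every input frame.

    Layers after the bottleneck (up to the phonetic output layer) are only
    needed during training, so they are not evaluated here.
    """
    dbf = []
    for x in frames:
        h = list(x)
        for layer in hidden:
            h = [math.tanh(z) for z in affine(h, layer)]  # tanh hidden units (assumption)
        dbf.append(affine(h, bottleneck))  # linear bottleneck, no nonlinearity (assumption)
    return dbf


if __name__ == "__main__":
    frames = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]  # two 3-dim acoustic frames (toy data)
    hidden = [([[0.2, 0.1, -0.3], [0.0, 0.4, 0.1], [-0.2, 0.3, 0.0], [0.1, 0.1, 0.1]],
               [0.0, 0.0, 0.0, 0.0])]                # 3 -> 4 hidden layer
    bottleneck = ([[0.5, -0.5, 0.25, 0.0], [0.1, 0.2, 0.3, 0.4]], [0.0, 0.1])  # 4 -> 2
    print(extract_bottleneck(frames, hidden, bottleneck))
```

The resulting 2-dimensional frames would then replace MFCC frames as the input to the i-vector pipeline.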
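
Item 2 fuses shallow features (such as MFCC) with deep features (such as DBF) but does not say how. A simple, commonly used option is frame-level concatenation after normalizing each stream, so that neither dominates by scale. The sketch below assumes that scheme and frame-aligned streams; the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def mean_variance_normalize(frames: Matrix) -> Matrix:
    """Normalize each dimension to zero mean and unit variance over the utterance."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(math.sqrt(var) if var > 0.0 else 1.0)  # leave constant dimensions at 0
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def fuse_features(shallow: Matrix, deep: Matrix) -> Matrix:
    """Concatenate a shallow stream (e.g. MFCC) and a deep stream (e.g. DBF) frame by frame."""
    if len(shallow) != len(deep):
        raise ValueError("streams must be frame-aligned (same number of frames)")
    a = mean_variance_normalize(shallow)
    b = mean_variance_normalize(deep)
    return [fa + fb for fa, fb in zip(a, b)]


if __name__ == "__main__":
    mfcc = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]               # 3 frames x 2 dims (toy)
    dbf = [[0.0, 1.0, 2.0], [1.0, 1.0, 2.0], [2.0, 1.0, 2.0]]  # 3 frames x 3 dims (toy)
    fused = fuse_features(mfcc, dbf)
    print(len(fused), len(fused[0]))  # 3 frames, 5 fused dimensions
```

Concatenation keeps both the spectral detail of the shallow features and the phonetic information of the deep features in every frame that reaches the i-vector front end.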
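
Item 3 fuses systems at the level of the i-vectors they produce. The abstract does not specify the fusion rule, so the sketch below assumes one plausible choice: length-normalize the i-vector from each system (e.g. TVM-I-Vector and NN-I-Vector) and concatenate them, with optional per-system weights. The weights and function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def length_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [x / norm for x in v]


def fuse_ivectors(ivectors: Sequence[Sequence[float]],
                  weights: Optional[Sequence[float]] = None) -> List[float]:
    """Concatenate length-normalized i-vectors from several systems into one vector.

    `weights` (hypothetical) scales each system's contribution; equal by default.
    """
    if weights is None:
        weights = [1.0] * len(ivectors)
    if len(weights) != len(ivectors):
        raise ValueError("need one weight per system")
    fused: List[float] = []
    for w, v in zip(weights, ivectors):
        fused.extend(w * x for x in length_normalize(v))
    return fused


if __name__ == "__main__":
    tvm_ivector = [3.0, 4.0]  # toy i-vector from one system
    nn_ivector = [0.0, 2.0]   # toy i-vector from another system
    print(fuse_ivectors([tvm_ivector, nn_ivector]))  # [0.6, 0.8, 0.0, 1.0]
```

Normalizing before concatenation prevents a system whose i-vectors happen to have a larger norm from dominating the fused vector; LDA or PLDA can then be trained on the fused vectors as for a single system.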
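
The end-to-end system in item 4 relies on a statistics pooling layer to turn a variable-length sequence of frame-level network outputs into a fixed-length vector, and on cosine similarity for scoring. The sketch below shows pooling as the concatenation of per-dimension mean and standard deviation, followed by cosine scoring. In a real network, further trained layers sit between pooling and the embedding; they are omitted here, so treating the pooled vector as the embedding is a simplifying assumption. PLDA scoring is not shown.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-dimension mean and standard deviation over all frames.

    The output length is 2 * dim no matter how many frames the utterance has.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) for d in range(dim)]
    return mean + std


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embeddings; higher suggests the same speaker."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    enrol = statistics_pooling([[1.0, 2.0], [3.0, 4.0]])              # 2 frames
    test = statistics_pooling([[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]])   # 3 frames
    print(enrol)  # [2.0, 3.0, 1.0, 1.0]
    print(cosine_score(enrol, test))
```

Because both utterances map to vectors of the same length despite having different numbers of frames, they can be compared directly, which is the property that allows a speaker embedding to take the place of the i-vector.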

Keywords/Search Tags:

speaker recognition, i-vector, deep neural network, model fusion, end-to-end

Related items

1	The Research Of The Speaker Recognition System Using Low-Dimensional Vector Representations
2	Research Of Speaker Recognition Technology Based On Fusion Features
3	Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition
4	Research On Robustness Of Speaker Recognition In Noisy Environment
5	Rearch On Text-independent Speaker Identification Technology Based On SVM
6	Study On The Deception Detection Method Identified By The Automatic Speaker Verification System
7	Research On Speaker Adaptation Of Neural Network Acoustic Models For Speech Recognition
8	Research On Speaker Recognition Algorithm Based On Deep Neural Network
9	Speaker Recognition Based On Fusion Of RBPF And DNN
10	Research Of The Pattern Matching Method In Speaker Recognition System