Content-independent Speaker Verification Model and Its Application

Posted on:2020-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:J X Lu

Full Text:PDF

GTID:2428330590973918

Subject:Computer Science and Technology

Abstract/Summary:

Artificial intelligence has entered a new stage of development with the rise of deep neural networks. Deep learning can extract many kinds of information from relatively raw data, which is why it has been adopted in many fields and has advanced practical applications of artificial intelligence. Speech signal processing has benefited greatly. A speech signal carries many kinds of information, both linguistic and paralinguistic, and a deep neural network can automatically extract the information a task requires while discarding as much of the rest as possible; this is why deep neural networks have substantially advanced speech recognition.

Deep neural networks have also been applied to speaker verification. However, current deep-learning-based speaker verification is mainly content-dependent: verification uses both linguistic and paralinguistic information, and it also places demands on utterance duration. In real applications utterances are often short, so this thesis proposes models based on short utterances. Because utterances contain noise, they are pre-processed and then converted into raw speech features. Although speech is time-series data, the information that identifies a speaker is not sequential in nature, so a convolutional neural network can be used to extract feature vectors in the speaker verification model. A gated recurrent unit (GRU) module is also used for comparison with the convolutional neural network.

A multi-task verification model is proposed to extract more discriminative feature vectors from utterances. It is trained with triplet loss, an improved version of triplet loss, and cross entropy. Experiments show that the multi-task verification model outperforms the single-task verification model and that the improved triplet loss outperforms the original version. An end-to-end multi-task verification model is also proposed, with the new finding that, in this model, an intermediate layer of the network performs best on the verification task. An open-set speaker recognition model is built on the multi-task verification model. The model is also applied to text matching, where experiments show that the multi-task model outperforms the single-task model.
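
As a rough illustration of the training objective mentioned in the abstract, the sketch below combines a standard triplet loss with a softmax cross entropy term. The abstract does not describe the thesis's improved triplet loss, so only the standard form is shown; the function names, the margin of 0.2, and the weighting factor `weight` are assumptions made for this example and are not taken from the thesis.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive are embeddings from the same speaker,
    negative is an embedding from a different speaker.
    The margin value is an assumption for illustration.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def cross_entropy(logits, label):
    """Softmax cross entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multi_task_loss(anchor, positive, negative, logits, label,
                    margin=0.2, weight=1.0):
    """Hypothetical combined objective: triplet term plus weighted
    speaker-classification term. The weighting scheme is an assumption."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * cross_entropy(logits, label))


if __name__ == "__main__":
    a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
    print(multi_task_loss(a, p, n, logits=[2.0, 0.5, -1.0], label=0))
```

The triplet term pulls same-speaker embeddings together and pushes different-speaker embeddings apart, while the cross entropy term keeps the embedding useful for classifying speakers; how the thesis actually balances the two is not stated in the abstract.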

Keywords/Search Tags:

triplet loss, speaker verification, feature vector extraction, neural network, text matching

Related items

1	Triplet Loss And Manifold Dimensionality Reduction Based Method For Text-independent Speaker Recognition
2	Speaker Recognition Algorithm Based On Deep Learning
3	Research On Text-Independent Speaker Recognition
4	Text-Dependent Speaker Verification System
5	SVM Speaker Verification Based On Prosodic Feature
6	Research On Text-independent Multi-speaker Verification
7	Research On Speaker Recognition Algorithm Based On Deep Convolutional Neural Network
8	The Research On Channel Robustness Of Text-independent Speaker Verification
9	Text-Independent Speaker Verification Based On GMM And High-Level Information
10	Any Text Speaker Recognition System