Content-independent Speaker Verification Model and Its Application

Posted on: 2020-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: J X Lu
Full Text: PDF
GTID: 2428330590973918
Subject: Computer Science and Technology
Abstract/Summary:
Artificial intelligence has recently entered a new stage of development with the rise of deep neural networks. Deep learning can extract many kinds of information from relatively raw data, which is why it has been adopted in many fields and has driven practical applications of artificial intelligence. Speech signal processing in particular has advanced dramatically. A speech signal carries many kinds of information, both linguistic and paralinguistic, and a deep neural network can automatically extract the information a task requires while suppressing the rest as far as possible. This is why deep neural networks have greatly advanced speech recognition.

Deep neural networks have also been applied to speaker verification. Current deep-learning-based speaker verification, however, is mainly content-dependent: verification relies on both linguistic and paralinguistic information and also places requirements on utterance duration. In real applications utterances are usually short, so this thesis proposes models based on short utterances. Because utterances contain noise, they are first pre-processed and then converted into raw speech features. Although speech is time-series data, the information that identifies a speaker is not itself sequential, so a convolutional neural network can be used to extract feature vectors in the speaker verification model. A gated recurrent unit (GRU) module is also used for comparison with the convolutional neural network.

A multi-task verification model is proposed to extract more discriminative feature vectors from utterances. It is trained with triplet loss, an improved version of triplet loss, and cross entropy. Experiments show that the multi-task verification model outperforms the single-task verification model and that the improved triplet loss outperforms the original version. An end-to-end multi-task verification model is also proposed, with the new finding that in such a model the middle layer of the network performs best on the verification task. An open-set speaker recognition model is also built on the multi-task verification model. The model is further applied to text matching, where experiments show that the multi-task model outperforms the single-task model.
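To make the idea of combining triplet loss with cross entropy concrete, the following is a minimal sketch under stated assumptions and not the thesis's implementation. It uses the standard margin-based triplet loss on L2-normalized embeddings; the improved triplet loss mentioned in the abstract is not specified here and is not reproduced. The function names, the `margin` value, and the `weight` that balances the two terms are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull same-speaker embeddings together and
    push different-speaker embeddings at least `margin` further away.
    `margin` is an assumed value, not one taken from the thesis."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)


def cross_entropy(logits, label):
    """Softmax cross entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multi_task_loss(anchor, positive, negative, logits, label,
                    margin=0.2, weight=1.0):
    """Hypothetical combined objective: triplet term plus a weighted
    speaker-classification term. The weighting scheme is an assumption."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * cross_entropy(logits, label))
```

In this sketch the triplet term shapes the embedding space used for verification, while the cross-entropy term adds a speaker-classification signal on the training speakers; how the thesis actually balances or schedules the two tasks is not stated in the abstract.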
Keywords/Search Tags: triplet loss, speaker verification, feature vector extraction, neural network, text matching