Research On Speaker Recognition Technology Based On Speech Enhancement

Posted on: 2021-04-22; Degree: Master; Type: Thesis
Country: China; Candidate: Y Zhang; Full Text: PDF
GTID: 2518306230978259; Subject: Software engineering
Abstract/Summary:
Speaker recognition technology first appeared in the 1930s. After nearly a century of technological innovation, many problems remain unsolved, and background noise is one of the main causes of low speaker recognition rates. This thesis applies speech enhancement to speaker recognition in order to improve recognition accuracy in noisy environments.

First, a speaker recognition model based on speech enhancement is proposed. It completes the recognition task in two stages: in the first stage, speech enhancement separates the noise from the speech, and in the second stage, a neural network recognizes the speaker from the processed speech. Experiments show that the speaker recognition model with the speech enhancement algorithm is more accurate than the original model.

For the speech enhancement stage, this thesis proposes SA-Unet (Self-Attention U-Net), a speech enhancement model that differs from traditional speech enhancement models. SA-Unet uses the image semantic segmentation network U-Net as its base network and adds a self-attention mechanism to it for the speech separation task. The self-attention mechanism strengthens the model's perception of context, so the noisy part of the speech is separated more accurately. Experiments show that speech enhanced by SA-Unet achieves a higher speech quality evaluation score than speech enhanced by the traditional model. SA-Unet filters the noise out of the input speech in the first stage and reduces the computational complexity of the recognition model in the next stage.

For the speaker recognition stage, this thesis proposes SEResCNN (SEnet with ResCNN), which embeds a ResCNN network in SEnet and is trained with a triplet loss function based on cosine similarity. This design enhances the meaningful speech features of the speaker and does not cause overfitting as the number of network layers increases. Experimental results show that SEResCNN is superior to other speaker recognition models.
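
The abstract names a triplet loss based on cosine similarity but does not give its formula. As a rough illustration only, the sketch below uses one common formulation: the loss is zero once the anchor is more similar to a same-speaker embedding than to a different-speaker embedding by at least a margin. The function names, the margin value of 0.1, and the use of plain Python lists as embeddings are assumptions for this example, not details taken from the thesis.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet loss on cosine similarities.

    anchor and positive are embeddings of the same speaker; negative is an
    embedding of a different speaker. The margin (0.1 here) is an assumed
    value chosen for illustration.
    """
    sim_ap = cosine_similarity(anchor, positive)  # should be high
    sim_an = cosine_similarity(anchor, negative)  # should be low
    return max(0.0, sim_an - sim_ap + margin)


# An easy triplet: the positive matches the anchor, the negative is orthogonal.
easy = cosine_triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

# A hard triplet: the negative matches the anchor, the positive is orthogonal.
hard = cosine_triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In this formulation the easy triplet contributes no loss, while the hard triplet is penalized, which pushes a network to pull same-speaker embeddings together and push different-speaker embeddings apart.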
Keywords/Search Tags: Speaker Recognition, Speech Enhancement, Neural Networks