Research On Short-speech Speaker Verification Method Based On Multi-branch Aggregation Network

Posted on: 2022-04-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Yang  Full Text: PDF
GTID: 2518306572960029  Subject: Computer technology
Abstract/Summary:
Speaker verification is a technology for determining whether a given speech sample comes from a claimed speaker. With the rapid development of the Internet and the widespread adoption of mobile devices, it has become easier to collect a person's voice data, which greatly facilitates and promotes research on speaker verification. Although the technology has made considerable progress over decades of development, short-speech conditions remain difficult: because the data are short and carry little speaker identity information, it is hard to extract sufficiently discriminative speaker information. This in turn affects the scoring and decisions of the model and the overall recognition performance of the system. Short-speech speaker verification is therefore still a challenging task. To address it, the research in this thesis mainly covers the following aspects:

(1) A speaker embedding extraction method based on a Multi-Branch Aggregation (MBA) network is proposed. Because a single-channel system struggles to extract sufficient speaker identity information, two variants of the Time-Delay Neural Network (TDNN) are combined into a multi-branch structure so that more features are extracted: the Large-TDNN (L-TDNN), which increases the number of nodes and the delay values, and the Small-TDNN (S-TDNN), which reduces them. The features of each channel are aggregated through the pooling layer, and the multi-branch results are then combined by feature splicing. Experimental results show that this method achieves better performance than the baseline system on the test speech.

(2) A speaker embedding extraction method based on a Multi-Branch and Multi-Scale Aggregation (MBMSA) network is proposed. Within each individual channel, information is lost as features pass from the lower layers of the network to the upper layers. The lost information needs to be recovered, and as much lower-layer information as possible should be retained at each transfer. A multi-scale aggregation method that meets these requirements is therefore adopted in the multi-branch network to further improve the performance of the algorithm. Because this method must reflect the diversity of scales across network layers, the multi-branch multi-scale aggregation network is built from residual networks (ResNet) based on Convolutional Neural Networks (CNN). The experimental results show that the proposed multi-branch and multi-scale aggregation network achieves better results on short-speech speaker verification.
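
The multi-branch structure described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the layer sizes, context offsets, random weights, and every function name are hypothetical, and the pooling layer is assumed to be mean-and-standard-deviation (statistics) pooling, which the abstract does not specify. The `multi_scale` flag is only a loose stand-in for the multi-scale idea in (2): it pools every layer instead of only the last one, using the same time-delay layers rather than the ResNet/CNN design the thesis describes.

```python
import math
import random


def tdnn_layer(frames, weights, offsets):
    """One time-delay layer: splice frames at the given offsets, then linear + ReLU."""
    start = max(0, -min(offsets))
    stop = len(frames) - max(0, max(offsets))
    out = []
    for t in range(start, stop):
        spliced = [x for k in offsets for x in frames[t + k]]
        out.append([max(0.0, sum(w * x for w, x in zip(row, spliced)))
                    for row in weights])
    return out


def stats_pool(frames):
    """Collapse a variable-length sequence into per-dimension mean and std."""
    n = len(frames)
    dims = range(len(frames[0]))
    mean = [sum(f[d] for f in frames) / n for d in dims]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) for d in dims]
    return mean + std


def make_branch(rng, feat_dim, nodes, offsets, depth=2):
    """Build a branch as a list of (weights, offsets) layers with random weights."""
    layers, in_dim = [], feat_dim
    for _ in range(depth):
        n_in = in_dim * len(offsets)
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(nodes)]
        layers.append((weights, offsets))
        in_dim = nodes
    return layers


def branch_embedding(frames, layers, multi_scale=False):
    """Run one branch; pool the last layer, or every layer if multi_scale is set."""
    pooled, h = [], frames
    for weights, offsets in layers:
        h = tdnn_layer(h, weights, offsets)
        pooled.append(stats_pool(h))
    if multi_scale:
        return [x for p in pooled for x in p]
    return pooled[-1]


def mba_embedding(frames, branches, multi_scale=False):
    """Aggregate branches by splicing (concatenating) their pooled outputs."""
    return [x for layers in branches
            for x in branch_embedding(frames, layers, multi_scale)]


rng = random.Random(0)
FEAT_DIM = 8  # hypothetical acoustic feature dimension per frame
# "Large" branch: more nodes and wider delays; "small" branch: fewer and narrower.
large = make_branch(rng, FEAT_DIM, nodes=16, offsets=(-3, 0, 3))
small = make_branch(rng, FEAT_DIM, nodes=4, offsets=(-1, 0, 1))
utterance = [[rng.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(40)]
embedding = mba_embedding(utterance, [large, small])
print(len(embedding))  # 2 * (16 + 4) = 40
```

Because pooling removes the time axis, the spliced embedding has the same length for any utterance long enough to pass through both branches, which is what allows short and long recordings to be compared with one scoring back end.
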
Keywords/Search Tags:Speaker Verification, Short Speech, TDNN, MBA, MBMSA