
Research On Speaker Verification Model Based On Information Enhancement

Posted on: 2024-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Chen
GTID: 2568307115464094
Subject: Computer Science and Technology
Abstract/Summary:
Biometric technology has advanced considerably in recent years. Within it, speaker recognition has attracted extensive research attention because it is contactless and cost-effective. Despite significant progress, existing network models still make limited use of the available information, which constrains further development of the technology, so new network models that make better use of information are needed. This thesis addresses the following two issues from the perspective of information enhancement: (1) Residual neural networks are strong at capturing local frequency information, but because they are built on convolution operations they have difficulty modeling global frequency information. (2) The ECAPA-TDNN network uses a squeeze-and-excitation (SE) module to reduce the influence of noise on the model, but relying too heavily on the SE module's output features may lead to insufficient modeling of identity information. The specific work is as follows:

1. To address the difficulty residual neural networks have in using global frequency features, a speaker verification model based on coupling local and global frequency information is proposed. Global frequency information, such as the speaker's pitch, contributes significantly to performance on the speaker verification task. The proposed network consists of a global branch and a local branch: the global branch uses a multi-head attention mechanism to capture global frequency information, and the local branch uses an identity function or shift operations to obtain local frequency information. Experimental results indicate that the proposed method outperforms the compared methods on three test sets, VoxCeleb1-O, VoxCeleb1-E and VoxCeleb1-H. The thesis also investigates the impact of speech length on model performance and shows that the proposed fusion model remains effective across different speech lengths.

2. To address the insufficient modeling ability caused by excessive reliance on the output features of the squeeze-and-excitation module, a speaker verification model based on multi-view feature fusion is proposed. The model mainly consists of a multi-view feature extraction module and a feature fusion module. The multi-view feature extraction module takes the input features and output features of the SE layer as two views and reduces the redundant information between them through a feature interaction module. The feature fusion module then fully fuses the two views to generate a discriminative speaker representation. Experimental results on multiple metrics of the test set, together with visualizations, indicate that the proposed model has clear advantages over the single network model.
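
The two-view idea in the second contribution can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the function names, the tensor shapes, the reduction ratio `r`, the random weights, and the concatenation used as a stand-in for fusion are all assumptions made for illustration, and the feature interaction module is omitted because the abstract does not describe it in detail. The sketch only shows how a standard squeeze-and-excitation layer produces an output view that can be kept alongside its input view.

```python
import numpy as np

def se_layer(x, w1, b1, w2, b2):
    """Standard squeeze-and-excitation over a (channels, frames) feature map."""
    z = x.mean(axis=1)                           # squeeze: average over frames -> (C,)
    h = np.maximum(w1 @ z + b1, 0.0)             # bottleneck + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))     # sigmoid channel gates -> (C,)
    return x * s[:, None], s                     # rescale each channel by its gate

def two_view_features(x, params):
    """Return the SE input and SE output as two views (hypothetical helper)."""
    y, gates = se_layer(x, *params)
    return x, y, gates

# Illustrative sizes and random weights (assumptions, not values from the thesis).
rng = np.random.default_rng(0)
C, T, r = 8, 20, 4
x = rng.standard_normal((C, T))
params = (
    rng.standard_normal((C // r, C)), np.zeros(C // r),
    rng.standard_normal((C, C // r)), np.zeros(C),
)

view_in, view_out, gates = two_view_features(x, params)

# Placeholder fusion: stack both views along the channel axis -> (2C, T).
# The thesis's fusion module is presumably more elaborate than this.
fused = np.concatenate([view_in, view_out], axis=0)
```

Because the gates lie between 0 and 1, the output view is a channel-wise attenuated copy of the input view. Keeping the input view alongside it preserves information that the gating may have suppressed, which is the motivation the abstract gives for using both.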
Keywords/Search Tags:Speaker verification, Speaker embeddings, Information interactions, Information fusion, Attention mechanism