| Speaker verification, as a biometric technology, has been widely used in information security, public security and the judiciary, and the banking and securities industries, among others. Its convenience, non-contact nature, and relatively low privacy sensitivity give it a unique competitive advantage. Within this field, the verification task is the most important component and has been the focus of research. Owing to its strong performance and fast inference, r-vector is often selected as the backbone network of speaker verification systems. In practical applications, however, the limited representation extraction ability of its network structure and complex noise environments can result in poor performance. To address these issues, this paper proposes a frequency-channel selection attention mechanism and a speaker verification loss function based on data uncertainty learning. The main innovations are as follows: (1) Lightweight frequency-channel selection attention mechanism: This mechanism addresses the limited representation extraction ability of r-vector, which arises because channel features are mixed with the time-frequency relationships learned by the convolution kernels. An attention module selects frequency and channel information and corrects the input feature map, so that the model focuses more on the significant regions in the frequency and channel dimensions while filtering out unnecessary information. Additionally, owing to the special nature of the selection mechanism, experiments on the VoxCeleb dataset show that, compared to mainstream attention mechanisms, the proposed method achieves an average error rate improvement of 5.1% while using only one tenth of their parameters. (2) Speaker verification loss function based on data uncertainty learning: This addresses the performance degradation of speaker verification systems in noisy environments. Unlike commonly used speaker verification systems, which represent 
speaker characteristics as point estimates in the latent space, this method introduces speech uncertainty learning, which provides a distribution estimate for the speaker embedding. The mean of the distribution represents the most likely latent feature and is used to retain valid speaker information and suppress noise interference. Experiments on the VoxCeleb dataset show that, compared to the AAM-Softmax loss function most commonly used in this task, the proposed method achieves an average error rate improvement of 6.3%. (3) Design and implementation of a speaker verification system: Building on the research above, a web-based speaker verification system has been designed and implemented. The system supports voice recording and speaker verification: by comparing the recorded speaker's voice with that of the speaker seeking verification, it determines whether the two come from the same individual. |
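
The distribution-based embedding in (2) and the comparison step in (3) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: a diagonal Gaussian embedding sampled with the reparameterization trick, a KL regularizer toward a standard normal (as in common data uncertainty learning setups), cosine scoring on the distribution means, and a hypothetical decision threshold of 0.5. All function names are hypothetical.

```python
import math
import random


def sample_embedding(mu, log_var, rng):
    """Training-time sample z = mu + sigma * eps, with eps ~ N(0, I).

    Assumed form: diagonal Gaussian, reparameterization trick.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian.

    Assumed regularizer; it keeps the predicted variance from collapsing.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(mu_enroll, mu_test, threshold=0.5):
    """Test-time decision using only the distribution means.

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out trials.
    """
    return cosine_similarity(mu_enroll, mu_test) >= threshold


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.2, -0.1, 0.7], [-2.0, -2.0, -2.0]
    z = sample_embedding(mu, log_var, rng)
    print("sampled embedding:", z)
    print("KL term:", kl_to_standard_normal(mu, log_var))
    print("same speaker:", same_speaker(mu, [0.25, -0.05, 0.65]))
```

In this sketch the sampled embedding would feed the classification loss during training, while only the mean is used for scoring, which matches the abstract's statement that the mean represents the most likely latent feature.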