In recent years, biometric identification has drawn growing attention and become an active research topic, and speaker identification, one of its important branches, has drawn attention with it. Speaker recognition is easy to operate, low-cost, and more readily accepted by users than other biometric methods. This paper aims to identify a speaker as quickly and accurately as possible while using speech enhancement to reduce the effects of noise. The main work is as follows. First, we preprocess the speech and extract features, use the extracted features to generate an initial codebook by vector quantization, and apply the LBG algorithm to obtain the best codebook through iterative training. Test speech is then compared against the best codebook, which yields a system that recognizes the speaker’s identity. Recognition does not depend on the language spoken or on the text content, only on the speaker’s voice characteristics, so the system is a text-independent, voiceprint-based speaker identification system. Second, we study a generalized loss for end-to-end speaker identification, which speeds up training of the whole system through batch processing and uses a Softmax loss function to pull each embedding vector as close as possible to the centroid of its own speaker, thereby reducing the loss incurred as the number of users grows sharply. The results show that this model outperforms the traditional model. Finally, to reduce the influence of noise and other factors on recognition results, we propose a wav2vec 2.0 contrastive learning framework with a hybrid attention mechanism. In this hybrid attention model, location information is added to the original content-based attention model, and the generated attention vectors are considered jointly to reduce the influence of similar features. We find that the improved framework is indeed more robust to noise. In summary, building on speech enhancement technology, this paper implements a text-independent speaker recognition system that recognizes a speaker’s identity. Experimental results show that the system performs well in both accuracy and robustness.
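The vector-quantization stage summarized above can be illustrated with a minimal sketch. It is written under stated assumptions and is not the system's actual implementation: feature vectors are taken to be rows of a NumPy array, the codebook size is a power of two, codewords are split by a small additive perturbation, and matching uses mean squared Euclidean distortion. The names `lbg_codebook`, `average_distortion`, and `identify` are hypothetical.

```python
import numpy as np


def lbg_codebook(features, size=8, eps=0.01, tol=1e-4, max_iter=50):
    """Train a VQ codebook by LBG splitting (assumes `size` is a power of two)."""
    features = np.asarray(features, dtype=float)
    # Start from a single codeword: the centroid of all training vectors.
    codebook = features.mean(axis=0, keepdims=True)
    while codebook.shape[0] < size:
        # Split every codeword into a slightly perturbed pair (assumed additive split).
        codebook = np.vstack([codebook + eps, codebook - eps])
        prev = np.inf
        for _ in range(max_iter):
            # Squared distance from every vector to every codeword, shape (N, K).
            dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = dist.argmin(axis=1)
            distortion = dist[np.arange(len(features)), nearest].mean()
            # Move each codeword to the centroid of the vectors assigned to it.
            for k in range(codebook.shape[0]):
                members = features[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
            # Stop refining once the relative improvement becomes negligible.
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook


def average_distortion(features, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    features = np.asarray(features, dtype=float)
    dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.min(axis=1).mean()


def identify(features, codebooks):
    """Return the enrolled speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: average_distortion(features, codebooks[name]))
```

In use, one codebook would be trained per enrolled speaker and stored in a dictionary keyed by speaker name; a test utterance is then attributed to the speaker whose codebook quantizes its features with the least distortion, independent of what was said.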