
Research On An Algorithm For End-to-End Speaker Recognition And Its FPGA Implementation

Posted on: 2023-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhao
Full Text: PDF
GTID: 2558306911483154
Subject: Engineering
Abstract/Summary:
With the development of science and technology, speaker recognition has achieved good results in application scenarios such as criminal investigation, banking, and insurance. Speaker recognition extracts speaker identity information from speech and identifies the speaker by judging the similarity of the identity information extracted from different utterances. As a biometric technology, its main research focus is improving recognition accuracy. This thesis studies how to improve the accuracy of a speaker recognition system through speech feature extraction, the neural network model, and related aspects. To use fewer hardware resources and reduce system power consumption, the neural network model is quantized to low precision, and the design of the key modules of the end-to-end speaker recognition system is optimized. Speaker recognition systems with latency and power constraints require dedicated circuit designs. This thesis presents the following research on speaker recognition, focusing on improving accuracy and on FPGA implementation:

1. A residual time delay neural network model for speaker recognition, named Res TDNN, is proposed. To improve recognition accuracy, a residual design is introduced to strengthen the model's ability to model speech features. Log-FBank features, which have lower computational complexity and better recognition accuracy, are selected as the speech features. The model achieves a 2.48% equal error rate (EER) on the VoxCeleb1 test set.

2. The residual time delay neural network model is quantized to low precision. To reduce the loss of recognition accuracy caused by quantization, a per-channel strategy is used to quantize the model weights, a moving average strategy is used to calculate the parameter distribution, the batch normalization layers are fused during quantization aware training, and the quantization scheme of the model is optimized. The model parameters are quantized from 32 bits to 8 bits, and the quantized model achieves a 2.93% equal error rate.

3. The essential modules of the end-to-end speaker recognition system are designed and implemented. To use fewer hardware resources, the designs of the pre-emphasis, framing, FFT, and logarithm modules in the speech feature extraction module are optimized, and the neural network accelerator is designed with an intra-layer parallel, inter-layer serial scheme. The end-to-end speaker recognition system achieves a 3.47% equal error rate in the speaker verification task. With a 200 MHz clock, the system throughput reaches 164.44 GOPS and the latency for processing 3 s of speech is 12.71 ms. The total system power consumption is 8.6 W, giving an energy efficiency of 19.121 GOPS/W.
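
The per-channel 8-bit weight quantization and the moving average statistics mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say whether quantization is symmetric or asymmetric, so symmetric quantization is assumed here, and the names `quantize_per_channel`, `dequantize`, `ema_update`, and the `momentum` value are hypothetical.

```python
def quantize_per_channel(weights, num_bits=8):
    """Symmetric per-channel quantization (assumed scheme).

    `weights` is a list of rows, one row per output channel.
    Each row gets its own scale, so a channel with small weights
    is not crushed by a large weight in another channel.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # Round to the nearest integer level and clamp to [-qmax, qmax].
        q_rows.append([max(-qmax, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Map integer levels back to approximate floating-point weights."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def ema_update(running, observed, momentum=0.9):
    """Exponential moving average of an observed statistic (for example
    the maximum absolute activation in a batch); the momentum is assumed."""
    return momentum * running + (1 - momentum) * observed


# Two output channels with very different weight ranges.
weights = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.04]]
q, scales = quantize_per_channel(weights)
restored = dequantize(q, scales)
```

With one scale per channel, the second row still uses the full range of 8-bit levels even though its weights are about 25 times smaller than those of the first row. With a single scale for the whole tensor, the second row would collapse to a handful of levels.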
Keywords/Search Tags:Speaker Recognition, Quantization Aware Training, Speech Feature Extraction, Hardware Acceleration