Research On An Algorithm For End-to-End Speaker Recognition And Its FPGA Implementation

Posted on:2023-01-24

Degree:Master

Type:Thesis

Country:China

Candidate:S Zhao

Full Text:PDF

GTID:2558306911483154

Subject:Engineering

Abstract/Summary:

With the development of science and technology, speaker recognition has achieved good results in application scenarios such as criminal investigation, banking and insurance. Speaker recognition extracts speaker identity information from speech and identifies the speaker by judging the similarity of that identity information across different utterances. As with other biometric technologies, improving recognition accuracy is the main research goal of speaker recognition. This thesis studies how to improve the accuracy of a speaker recognition system in terms of speech feature extraction, the neural network model, and other aspects. To use fewer hardware resources and reduce system power consumption, the neural network model is quantized to low precision, and the design of the key modules of the end-to-end speaker recognition system is optimized. Speaker recognition systems with latency and power constraints require dedicated circuits. Focusing on improving accuracy and on FPGA implementation, this thesis presents the following research on speaker recognition:

1. A residual time delay neural network model for speaker recognition, named Res TDNN, is proposed. To improve recognition accuracy, a residual design is introduced that strengthens the model's ability to model speech features. The Log-FBank feature, which has lower computational complexity and better recognition accuracy, is selected as the speech feature extraction method. The model achieves a 2.48% equal error rate (EER) on the VoxCeleb1 test set.

2. The residual time delay neural network model is quantized to low precision. To reduce the loss of recognition accuracy caused by quantization, a per-channel strategy is used to quantize the model weights, a moving-average strategy is used to estimate the parameter distribution, the batch normalization layers are fused during quantization-aware training, and the quantization scheme of the model is optimized. The model parameters are quantized from 32 bits to 8 bits, achieving a 2.93% EER.

3. The essential modules of the end-to-end speaker recognition system are designed and implemented. To use fewer hardware resources, the designs of the pre-emphasis, framing, FFT and logarithm modules in the speech feature extraction stage are optimized, and the neural network accelerator is designed with an intra-layer parallel, inter-layer serial scheme. The end-to-end speaker recognition system achieves a 3.47% EER on the speaker verification task. Under a 200 MHz clock, the system throughput reaches 164.44 GOPS, the latency for processing 3 s of speech is 12.71 ms, and the total system power consumption and energy efficiency are 8.6 W and 19.121 GOPS/W, respectively.
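The abstract does not give the Res TDNN layer configuration. The minimal pure-Python sketch below only illustrates the general idea of a residual TDNN block, assuming one dilated 1-D convolution over time (the TDNN frame context), a ReLU, and an identity skip connection. The function names, kernel size and dilation are hypothetical and are not taken from the thesis.

```python
from typing import List

Matrix = List[List[float]]  # indexed as [channel][frame]


def tdnn_layer(x: Matrix, weight: List[List[List[float]]], bias: List[float],
               dilation: int) -> Matrix:
    # Dilated 1-D convolution over time with zero padding ("same" length output).
    # weight[o][i][k]: output channel o, input channel i, kernel tap k (k odd).
    in_ch, frames = len(x), len(x[0])
    taps = len(weight[0][0])
    half = (taps // 2) * dilation
    out = []
    for o in range(len(weight)):
        row = []
        for t in range(frames):
            acc = bias[o]
            for i in range(in_ch):
                for k in range(taps):
                    src = t + k * dilation - half  # covers frames t-half .. t+half
                    if 0 <= src < frames:
                        acc += weight[o][i][k] * x[i][src]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def residual_tdnn_block(x: Matrix, weight: List[List[List[float]]], bias: List[float],
                        dilation: int) -> Matrix:
    # y = x + ReLU(TDNN(x)); the identity skip requires in_channels == out_channels
    y = relu(tdnn_layer(x, weight, bias, dilation))
    return [[a + b for a, b in zip(xr, yr)] for xr, yr in zip(x, y)]
```

With a 3-tap kernel and `dilation=2`, each output frame t sees input frames t-2, t and t+2, which is how stacked TDNN layers widen their temporal context. The skip path needs matching channel counts; a layer that changes the channel count would need a projection on the skip branch.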
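The Log-FBank front end named in the abstract (pre-emphasis, framing, FFT, mel filtering, logarithm) can be sketched as a floating-point reference in pure Python. The parameter values here (pre-emphasis coefficient 0.97, 25 ms Hamming-windowed frames with a 10 ms hop at 16 kHz, a 512-point FFT, 40 mel filters) are common defaults assumed for illustration, not values reported in the thesis, and the sketch does not reflect the thesis's hardware-oriented optimizations of these modules.

```python
import cmath
import math
from typing import List


def pre_emphasis(x: List[float], alpha: float = 0.97) -> List[float]:
    # y[n] = x[n] - alpha * x[n-1]; the first sample passes through unchanged
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_signal(x: List[float], frame_len: int = 400, hop: int = 160) -> List[List[float]]:
    # Overlapping frames (25 ms / 10 ms at 16 kHz) with a Hamming window;
    # a trailing partial frame is dropped
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]


def fft(a: List[complex]) -> List[complex]:
    # Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sr: int) -> List[List[float]]:
    # Triangular filters equally spaced on the mel scale from 0 Hz to sr/2
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sr / 2)
    mel_pts = [i * top / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mel_pts]
    fb = [[0.0] * n_bins for _ in range(n_mels)]
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1][k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[m - 1][k] = (right - k) / (right - center)
    return fb


def log_fbank(x: List[float], sr: int = 16000, n_fft: int = 512,
              n_mels: int = 40, eps: float = 1e-10) -> List[List[float]]:
    fb = mel_filterbank(n_mels, n_fft, sr)
    feats = []
    for frame in frame_signal(pre_emphasis(x)):
        # Zero-pad each 400-sample frame to n_fft and keep the non-negative bins
        spec = fft([complex(v) for v in frame] + [0j] * (n_fft - len(frame)))
        power = [abs(c) ** 2 / n_fft for c in spec[: n_fft // 2 + 1]]
        # Mel-filter energies, floored at eps before the logarithm
        feats.append([math.log(max(sum(w * p for w, p in zip(row, power)), eps)) for row in fb])
    return feats
```

For one second of 16 kHz audio this yields 98 frames of 40 log-mel energies. Compared with MFCC, Log-FBank simply stops before the DCT step, which is consistent with the abstract's point about lower computational complexity.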
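The quantization steps listed in item 2 (batch normalization fusion, per-channel weight quantization, moving-average range estimation) can be illustrated with the arithmetic below. This is a minimal stdlib sketch under assumed choices (symmetric 8-bit quantization with range [-127, 127], an EMA momentum of 0.99, fusion into a linear/convolution layer with per-output-channel weights); the thesis's exact quantization scheme is not described in the abstract, and all names here are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def fold_batchnorm(weight: Matrix, bias: List[float], gamma: List[float], beta: List[float],
                   mean: List[float], var: List[float], eps: float = 1e-5) -> Tuple[Matrix, List[float]]:
    # BN(W x + b) = gamma * (W x + b - mean) / sqrt(var + eps) + beta
    #             = (scale * W) x + (b - mean) * scale + beta, per output channel o
    new_w, new_b = [], []
    for o, row in enumerate(weight):
        scale = gamma[o] / math.sqrt(var[o] + eps)
        new_w.append([w * scale for w in row])
        new_b.append((bias[o] - mean[o]) * scale + beta[o])
    return new_w, new_b


def quantize_per_channel(weight: Matrix, n_bits: int = 8) -> Tuple[List[List[int]], List[float]]:
    # Symmetric quantization with one scale per output channel (row)
    qmax = 2 ** (n_bits - 1) - 1  # 127 for 8 bits
    q, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q.append([max(-qmax, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return q, scales


def dequantize(q: List[List[int]], scales: List[float]) -> Matrix:
    return [[v * s for v in row] for row, s in zip(q, scales)]


class EmaRange:
    # Exponential moving average of observed min/max values, e.g. for activation ranges
    def __init__(self, momentum: float = 0.99) -> None:
        self.momentum = momentum
        self.min_val: Optional[float] = None
        self.max_val: Optional[float] = None

    def update(self, values: List[float]) -> None:
        lo, hi = min(values), max(values)
        if self.min_val is None or self.max_val is None:
            self.min_val, self.max_val = lo, hi
        else:
            m = self.momentum
            self.min_val = m * self.min_val + (1 - m) * lo
            self.max_val = m * self.max_val + (1 - m) * hi
```

Per-channel scales let a channel with small weights use a finer step than a channel with large weights, which a single per-tensor scale cannot do. In quantization-aware training, a quantize/dequantize pair like this is applied in the forward pass so that the network learns to tolerate the rounding error; folding BN first means the 8-bit weights already carry the BN scale, so inference needs no separate BN stage.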
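All three results are reported as equal error rates. The EER is the operating point where the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). A minimal sketch, assuming higher scores mean "same speaker" and approximating the EER by sweeping every observed score as a threshold; the function name is hypothetical, and evaluation toolkits often interpolate the ROC curve instead.

```python
from typing import List, Tuple


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    # Accept a trial when score >= threshold.
    # FRR = fraction of genuine scores rejected, FAR = fraction of impostor scores accepted.
    # Returns (eer, threshold) at the threshold where |FAR - FRR| is smallest.
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(s < thr for s in genuine) / len(genuine)
        far = sum(s >= thr for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]
```

For example, with genuine scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.1, 0.2, 0.3, 0.75]`, a threshold of 0.7 rejects one genuine trial and accepts one impostor trial, so FAR = FRR = 25%.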
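The hardware figures in item 3 are mutually consistent. Energy efficiency is throughput divided by power, and the latency can also be expressed as a real-time factor (RTF), i.e. processing time divided by audio duration. The RTF is added here for illustration and is not a metric reported in the thesis.

```latex
\eta = \frac{164.44\ \text{GOPS}}{8.6\ \text{W}} \approx 19.12\ \text{GOPS/W},
\qquad
\text{RTF} = \frac{12.71\ \text{ms}}{3\ \text{s}} = \frac{12.71}{3000} \approx 0.0042
```

An RTF of about 0.0042 means that 3 s of speech is processed roughly 236 times faster than real time.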

Keywords/Search Tags:

Speaker Recognition, Quantization Aware Training, Speech Feature Extraction, Hardware Acceleration

Related items

1	The Research Of Speaker Recognition
2	The Research Of Speaker Recognition Under Noisy Environment
3	Any Text Speaker Recognition System
4	The Research Of Front-end Processing Technology Based On The Speaker-independent Speech Recognition
5	Research Of Speaker-Recognition Technology On Vector Quantization
6	Research On Non-specific Speaker Speech Emotion Recognition Based On Deep Feature Extraction And Processing
7	Research On Speech Emotion Recognition Methods
8	The Research Of Feature Extraction Algorithm On The Speaker-Independent Speech Recognition
9	The Research And Application Of Text-Independent Speaker Recognition Technology
10	Research And Application On Simultaneous Recognition Of Both Speech And Speaker