Research on Speaker Identification Based on Linear Prediction Residual

Posted on: 2021-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Xu
Full Text: PDF
GTID: 2518306569994929
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Speaker recognition is a biometric authentication technology that identifies a speaker from the characteristics of his or her speech. It is convenient, secure, and accurate, and is widely used in fields such as national defense, finance, and public security. A speaker recognition system consists mainly of speech feature extraction and pattern matching. Feature extraction is the core of the system: whether the extracted features fully reflect the speaker's identity directly affects the performance of the whole system. This thesis extracts speech features from the residual signal produced by Linear Prediction Coding (LPC) and tests them with a text-independent speaker recognition system built on a Long Short-Term Memory (LSTM) recurrent neural network, in order to find speech features that fully characterize the speaker's identity.

The thesis proposes feature extraction based on the LPC residual signal. Linear prediction coefficients characterize the human vocal tract and are a commonly used feature in speaker recognition, but the accompanying LPC residual signal is often ignored. The thesis analyzes the LPC residual signal in the time and frequency domains and finds that it still contains information that reflects the speaker's identity. It then designs a residual-based feature extraction algorithm consisting of two parts: preprocessing and feature parameter extraction. From the residual signal it extracts the mainstream feature parameters of the field, namely Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), and Mel Frequency Cepstral Coefficients (MFCC). In addition, to describe the distribution of the residual signal, it extracts the second and third moments of the residual.

An LSTM network can learn long-term dependencies in context and can therefore better capture the long-term changes in the short-term features of speech. The thesis builds an LSTM-based speaker recognition system to test the extracted features and trains the network with an end-to-end loss function, which increases training speed and reduces model complexity.

Several sets of comparative experiments analyze the performance of the features extracted from the LPC residual signal. The first experiment measures the recognition rate of 15-dimensional LPC features extracted from the speaker's original speech signal, then adds the 2-dimensional second-order and third-order moments of the LPC residual to form combined features. Compared with the 15-dimensional LPC features alone, the combined features raise the average recognition rate by about 5%, which verifies that the LPC residual signal does contain information that characterizes the speaker's identity. The thesis also performs feature extraction and feature combination directly on the LPC residual signal and compares the result with features extracted from the original speech signal. The combined feature extracted from the LPC residual (13-dimensional MFCC plus the 2-dimensional second-order and third-order moments) reaches an average recognition rate of 94.083%, which is higher than that of MFCC features extracted directly from the original speech signal. Adding the second-order and third-order moments of the LPC residual to the MFCC features extracted from the original speech signal raises the average recognition rate by about 1%.

The thesis therefore concludes that the LPC residual signal contains speaker-specific feature information, which suggests a new approach to LPC-based feature extraction: features can be extracted directly from the LPC residual, and they can supplement the current mainstream feature parameters.
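The combination of 15 LPC coefficients with the second and third moments of the LPC residual can be illustrated with a short NumPy sketch. This is not the thesis's implementation: the function names are hypothetical, the moments are assumed to be central moments (the abstract does not say which definition is used), and preprocessing such as pre-emphasis and windowing is omitted. The coefficients are estimated with the standard autocorrelation method and Levinson-Durbin recursion, and the residual is obtained by inverse filtering the frame.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate LPC coefficients with the autocorrelation method.

    Returns a[0..order] with a[0] = 1, so the prediction error is
    e[n] = sum_k a[k] * x[n - k].
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        # Reflection coefficient for model order i (Levinson-Durbin step).
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_residual(frame, a):
    """Inverse-filter the frame with A(z) to obtain the prediction residual."""
    frame = np.asarray(frame, dtype=float)
    return np.convolve(frame, a)[:len(frame)]


def residual_moments(residual):
    """Second and third central moments of the residual (assumed definition)."""
    centered = residual - residual.mean()
    return np.mean(centered ** 2), np.mean(centered ** 3)


def combined_feature(frame, order=15):
    """Per-frame feature: `order` LPC coefficients plus two residual moments."""
    a = lpc_coefficients(frame, order)
    m2, m3 = residual_moments(lpc_residual(frame, a))
    return np.concatenate([a[1:], [m2, m3]])
```

With the default order of 15, `combined_feature` returns a 17-dimensional vector per frame, matching the "15-dimensional LPC plus 2-dimensional moments" layout described in the abstract. A sequence of such vectors would then be fed to the recognition network.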
Keywords/Search Tags: speaker recognition, linear prediction residual, feature extraction, end-to-end loss function