End To End Speaker Recognition Under Noisy Environment

Posted on:2021-09-11

Degree:Master

Type:Thesis

Country:China

Candidate:Q L Zeng

GTID:2518306128976699

Subject:Master of Engineering

Abstract/Summary:

Speaker recognition, also called voiceprint recognition, is a technique for determining a speaker's identity by analyzing and extracting the characteristics of one or more speech signals. It is an identity verification technology with broad application scenarios, following fingerprint, face, and iris recognition. Because it is widely applicable, convenient, accurate, and requires no physical contact, it has become an important research topic in the speech field. Since the mid-1990s, and especially after the Gaussian mixture model was applied to speaker recognition, the technology has continued to attract researchers' attention and has improved greatly. Current speaker recognition technology achieves high accuracy on clean speech. In practical applications, however, there is still much room for improvement in the robustness, transferability, and phrase recognition rate of speaker recognition systems. The noises common in real-life scenes are one of the important factors that degrade the performance of speaker models. How to effectively improve the performance of speaker recognition systems in real noisy environments has therefore become one of the most important research topics in the speech field.

The main contents of this dissertation are as follows:

(1) It introduces the basic knowledge of speaker recognition and the main technical difficulties currently faced, analyzes the advantages and disadvantages of common speaker recognition algorithms, and selects a baseline i-vector speaker recognition model and an LSTM speaker recognition model for a preliminary experimental comparison on part of the AISHELL open-source data.

(2) It studies the calculation of the speech signal-to-noise ratio (SNR) and the detection of amplitude clipping, and proposes methods for implementing both. Batch SNR calculation is implemented in C++, and a batch audio clipping detection tool is implemented in Python. In training, clipping detection can filter out clipped audio to improve the quality of the model's training data; in practical application scenarios, SNR calculation can classify input audio by quality, and speech noise reduction can then improve the input audio and thus the performance of the speaker recognition system.

(3) In contrast to traditional speaker recognition technologies (such as GMM-UBM, JFA, and i-vector), this dissertation focuses on end-to-end speaker recognition under the deep learning framework. End-to-end CNN-LSTM and ResNet-LSTM fusion network models are designed, and comparative experiments are conducted on data sets with different signal-to-noise ratios. The experimental results show that the proposed model achieves better recognition performance than the basic CNN-LSTM speaker recognition model on two open speech data sets, and further show that replacing the convolutional network with a deeper residual network extracts speaker spectral features better. The network is further improved by replacing the softmax cross-entropy loss function of the original network structure with Triplet Loss and GE2E Loss. Experiments show that the GE2E Loss function further improves the recognition performance of the current network model.
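The abstract does not show the thesis's SNR or clipping tools, so the following is only a minimal Python sketch of the two ideas, not the author's implementation. The function names `snr_db` and `clipping_ratio` and the parameters `full_scale`, `tolerance`, and `min_run` are hypothetical. The sketch also assumes that separate signal and noise sample sequences are already available, which a real batch tool would have to estimate from the recording itself.

```python
import math


def snr_db(signal, noise):
    """Return the signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise).

    Assumes `signal` and `noise` are non-empty sequences of float samples
    and that the noise power is non-zero.
    """
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def clipping_ratio(samples, full_scale=1.0, tolerance=1e-3, min_run=3):
    """Return the fraction of samples that lie in flat-topped runs.

    A sample counts as clipped only if it is part of at least `min_run`
    consecutive samples whose magnitude is within `tolerance` of
    `full_scale`; isolated peaks are ignored.
    """
    threshold = full_scale * (1.0 - tolerance)
    clipped = 0
    run = 0
    for x in samples:
        if abs(x) >= threshold:
            run += 1
        else:
            if run >= min_run:
                clipped += run
            run = 0
    if run >= min_run:  # close a run that reaches the end of the audio
        clipped += run
    return clipped / len(samples)


if __name__ == "__main__":
    # Synthetic data: a sine wave, a scaled copy standing in for noise,
    # and an overdriven copy that is hard-limited to [-1, 1].
    tone = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
    noise = [0.1 * x for x in tone]
    overdriven = [max(-1.0, min(1.0, 2.0 * x)) for x in tone]

    print("SNR (dB):", snr_db(tone, noise))
    print("clipping ratio, clean:", clipping_ratio(tone))
    print("clipping ratio, overdriven:", clipping_ratio(overdriven))
```

A training pipeline could, for example, discard files whose clipping ratio exceeds a chosen threshold, and a deployed system could route low-SNR inputs through noise reduction first; the thresholds themselves would need to be tuned on real data.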

Keywords/Search Tags:

speaker recognition, audio signal-to-noise ratio, CNN, LSTM

Related items

1	A stereo audio coder with a nearly constant signal-to-noise ratio
2	Modulation Recognition And Parameter Extraction Of Radar In Pulse Signal Based On Depth Learning
3	Research On The Discrimination Issue In Speaker Recognition
4	Neural Network-Based Speech Keyword Recognition Algorithm And Circuit Design For Low Signal-To-Noise Ratio
5	Research On Deep Learning Recognition Of Low Signal-To-noise Ratio And Few-shot Wireless Signals
6	Research On Technology Of Digital Audio Watermarking
7	Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Stream
8	The signal-to-noise ratio estimation in dispersive absorption spectrometry and new quantitative methods based on the signal-to-noise ratio theory
9	Research On Low Signal To Noise Ratio Signal Waveform Recovery Based On Chaos Theory