Speech processing models such as speech recognition and speaker verification are widely used in intelligent conference and customer-service conversation scenarios due to their low cost and high efficiency. However, when raw, unprocessed audio is fed to these models, environmental factors such as noise degrade recognition accuracy, and continuous speech from multiple speakers further limits model performance. These adverse effects reduce the performance of speech processing models. The speaker diarization task generates correspondingly more structured information from the audio signal, so that other speech processing models can process each speaker's information within that speaker's utterance time. This thesis studies each speaker diarization module in depth and designs a speaker diarization model based on a hybrid deep neural network. The specific work is as follows:

1. To ensure that each segment produced by speech segmentation contains only one speaker's information, this thesis takes raw speech signals and spectrograms as input and designs a speech segmentation model built on a bidirectional long short-term memory (Bi-LSTM) network, which exploits the temporal information of the speech signal to establish dependencies across the time context. Speech segments are then produced according to the frame-level classification results. Compared with an energy-based voice activity detection model, the false-alarm and miss rates were reduced by 2.42% and 1.2%, respectively, and the speaker diarization error rate was reduced by 5% on average.

2. To address the low quality of speaker voiceprint feature vectors, a model based on a deep residual alternating convolutional neural network was constructed. Residual connections reuse the original information, avoid the vanishing/exploding gradient problem, and improve training. In addition, local and global attention mechanisms perform feature recalibration, which improves the quality of the speaker voiceprint vectors and reduces the diarization error rate compared with speaker diarization models based on ResNet, ResNeXt, x-vector, and other networks. A lightweight design was also incorporated during model construction, keeping the parameter count at 7.61 million; the final diarization error rate is 4.12% on the VoxConverse dataset and 7.34% on the AMI dataset.

3. To obtain an appropriate number of speaker classes by clustering without prior knowledge of the speaker count, affinity propagation clustering is introduced into the system, and a complete speaker diarization model is constructed by combining the Bi-LSTM-based speech segmentation model with the deep residual alternating convolutional neural network. Compared with hierarchical clustering or spectral clustering, the error rate is reduced by 1.1% on average; the system removes the requirement for prior knowledge of the number of classes and enables clustering with an arbitrary number of speakers.
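The key property of the clustering stage described in item 3 is that affinity propagation infers the number of speakers from pairwise similarities between segment embeddings, rather than requiring it as an input. A minimal sketch of this idea, using scikit-learn's `AffinityPropagation` on synthetic stand-in embeddings (the variable names and synthetic data are illustrative, not from the thesis):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# Stand-in for per-segment voiceprint embeddings produced by the
# residual CNN: two synthetic "speakers", 20 segments each.
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 8)),
])

# Affinity propagation needs no preset cluster count; damping
# stabilizes the message-passing updates.
ap = AffinityPropagation(damping=0.9, random_state=0)
labels = ap.fit_predict(embeddings)

# The number of exemplars found is the inferred speaker count.
n_speakers = len(ap.cluster_centers_indices_)
print(n_speakers)  # typically 2 for these two well-separated blobs
```

In a full system, each label would be mapped back to its segment's time span to produce the "who spoke when" output; hierarchical or spectral clustering would instead need the speaker count supplied or estimated separately.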