Research And Implementation Of Key Technology In Speaker Diarization System

Posted on:2021-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Cong

GTID:2518306308973069

Subject:Electronics and Communications Engineering

Abstract/Summary:

Speaker diarization is an important speech signal processing technique that addresses the question of "who spoke when". It is mainly used for classifying conference speech recordings, as a pre-processing step for speech recognition, for voice category detection, and for speaker recognition, which makes it a significant research topic. Earlier speaker diarization algorithms focused on non-overlapping speech in relatively simple scenarios such as meeting recordings and call recordings, where they achieved good performance. In the complex "cocktail party" scenario, however, diarization when multiple speakers talk at the same time (overlap) remains one of the open difficulties in the field.

This thesis proposes an improved algorithm for estimating the speaker change probability in UIS-RNN. It further uses a speaker count estimation method, improves the UIS-RNN decision method, and adds a clustering re-segmentation (resegment) step, with the aim of reducing clustering estimation error. Finally, building on UIS-RNN, it implements a speaker clustering system for the overlap scenario.

Speaker count estimation must meet two requirements at the same time: first, estimating the number of speakers in variable-length speech data; second, detecting the number of simultaneously active speakers in each fixed-length short speech segment, i.e. overlap detection. Another contribution of this thesis is therefore a speaker count estimation method based on the GST model structure that supports variable-length speech input. Compared with CountNet, a current state-of-the-art CRNN method, the proposed method supports variable-length input and obtains a lower MAE when the number of speakers is greater than or equal to 5. It achieves an MAE below 0.2 at a speech length of 240 ms and an average MAE below 0.4 on variable-length speech data, which demonstrates the validity of the speaker count estimates for variable-length speech.

The experimental results show that the improved speaker change probability estimation effectively improves diarization, with DER dropping by 2.6%. Adding the resegment mechanism to UIS-RNN lowers DER by about 6%, giving a DER of 6.18% in non-overlap scenarios, whereas the baseline system obtains a DER of 14.84%. In addition, the proposed speaker diarization method can effectively perform multi-label clustering of speakers when speech overlaps, obtaining a DER of 9.76% on manually synthesized overlapped speech data.
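The two metrics quoted above, MAE for speaker count estimation and DER for diarization, can be illustrated with a minimal Python sketch. This is not the thesis's evaluation code. It assumes frame-level labels, assumes hypothesis speaker names have already been mapped to reference names, and ignores the scoring collar that standard tools apply. The function names and the toy data are hypothetical.

```python
def mean_absolute_error(true_counts, predicted_counts):
    """MAE between true and estimated speaker counts, one pair per segment."""
    if not true_counts or len(true_counts) != len(predicted_counts):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_counts, predicted_counts))
    return total / len(true_counts)


def frame_level_der(reference, hypothesis):
    """Simplified frame-level diarization error rate.

    Each argument is a list with one set of active speaker labels per frame,
    so overlapped frames simply contain more than one label.
    Assumption: hypothesis labels are already mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")
    total = missed = false_alarm = confusion = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        total += n_ref                                # reference speaker time
        missed += max(0, n_ref - n_hyp)               # speakers not detected
        false_alarm += max(0, n_hyp - n_ref)          # extra speakers reported
        confusion += min(n_ref, n_hyp) - n_correct    # wrong speaker assigned
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Toy data: four segments for MAE, four frames (one overlapped) for DER.
mae = mean_absolute_error([1, 2, 3, 5], [1, 2, 4, 3])
der = frame_level_der(
    [{"A"}, {"A", "B"}, {"B"}, set()],
    [{"A"}, {"A"}, {"A"}, {"B"}],
)
print(f"MAE = {mae:.2f}, DER = {der:.2%}")
```

In the toy DER example, the second frame contributes one missed speaker, the third one confusion, and the fourth one false alarm, against four frames of reference speaker time.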

Keywords/Search Tags:

Speaker Diarization, Cocktail Party, Estimating the Number of Concurrent Speakers

Related items

1	Design And Implementation Of Speaker Diarization System
2	Research On Speaker Diarization In Multi-person Scenarios
3	Research On Speaker Diarization Based On Microphone Array
4	Research On Non-concurrent Speaker Separation Technology For Corpus Acquisition System
5	Research On Speaker Diarization Based On Deep Learning
6	Research On Speaker Diarization In Multi-person Conversation Scenarios
7	Speaker Diarization: Current Limitations and New Directions
8	Speaker Diarization Based On Deep Neural Network With Hybrid Structural
9	The Modeling Research In Speaker Diarization
10	A Study On Speaker Diarization Based On Multiple Features