Research And Implementation Of Key Technology In Speaker Diarization System

Posted on: 2021-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Cong
GTID: 2518306308973069
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speaker diarization is an important speech signal processing technique that answers the question of "who spoke when". It is mainly used for classifying conference recordings, as pre-processing for speech recognition, for voice category detection, and for speaker recognition, so it has considerable research significance. Earlier speaker diarization algorithms focused mainly on non-overlapping speech in simple scenarios such as meeting recordings and call recordings, where they achieve good performance. In the complex "cocktail party" scenario, however, diarization when multiple speakers talk at the same time (overlap) remains one of the open difficulties in the field.

This thesis proposes an improved algorithm for estimating the speaker change probability in UIS-RNN. It also proposes using a speaker-count estimation method to improve the UIS-RNN decision method, and adds a resegmentation (resegment) stage to reduce clustering estimation error. Finally, building on UIS-RNN, the thesis implements a speaker clustering system for the overlap scenario.

The task of estimating the number of speakers must meet two requirements at the same time: first, estimating the number of speakers in variable-length speech data; and second, detecting how many speakers are talking simultaneously in each fixed-length short speech segment, i.e. overlap detection. Another contribution of this thesis is therefore a speaker-count estimation method based on the structure of the GST model, which supports variable-length speech input. Compared with count-net, a current advanced CRNN method, the proposed method supports variable-length input and obtains a lower MAE when the number of speakers is greater than or equal to 5. It also obtains an MAE below 0.2 at a speech length of 240 ms and an average MAE below 0.4 on variable-length speech data, which demonstrates that the speaker-count estimate is valid for variable-length speech.

The experimental results show that improving the speaker change probability estimation effectively improves the diarization results, with DER dropping by 2.6%. After the resegment mechanism is added to UIS-RNN, DER drops by about 6%, reaching 6.18% in non-overlap scenarios, compared with 14.84% for the baseline system. In addition, the proposed speaker diarization method can effectively perform multi-label clustering of speakers under overlap, obtaining 9.76% DER on manually synthesized overlapped speech data.
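For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how MAE (for speaker-count estimation) and DER (for diarization) are commonly computed. It is not the thesis's evaluation code: the function names `count_mae` and `frame_der` are hypothetical, the DER is computed per frame rather than over timed segments, no scoring collar is applied, and hypothesis speaker labels are assumed to be already mapped to reference labels.

```python
from typing import List, Set


def count_mae(true_counts: List[int], predicted_counts: List[int]) -> float:
    """Mean absolute error between true and predicted speaker counts."""
    if len(true_counts) != len(predicted_counts) or not true_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_counts, predicted_counts))
    return total / len(true_counts)


def frame_der(reference: List[Set[str]], hypothesis: List[Set[str]]) -> float:
    """Frame-level diarization error rate with overlap support.

    Each frame holds the set of active speaker labels, so overlapped
    speech is a frame with more than one label (multi-label frames).
    Assumption: hypothesis labels are already mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        correct = len(ref & hyp)
        missed += max(0, n_ref - n_hyp)            # reference speakers not covered
        false_alarm += max(0, n_hyp - n_ref)       # extra hypothesized speakers
        confusion += min(n_ref, n_hyp) - correct   # covered but wrong speaker
        total += n_ref                             # total reference speaker frames
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


if __name__ == "__main__":
    print(count_mae([1, 2, 3, 5], [1, 2, 4, 5]))
    ref = [{"A"}, {"A", "B"}, {"B"}, set()]
    hyp = [{"A"}, {"A"}, {"A"}, {"B"}]
    print(frame_der(ref, hyp))
```

In the toy example, the second frame contributes one missed speaker, the third one confusion, and the fourth one false alarm, against four reference speaker frames in total.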
Keywords/Search Tags: Speaker Diarization, Cocktail Party, Concurrent Speaker Count Estimation