
Key Speaker Identification For Cooperative Problem Solving Scenarios

Posted on: 2024-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Gao
GTID: 2568307091991219
Subject: Applied Statistics
Abstract/Summary:
As cooperation problems grow more complex and uncertain, interpersonal communication skills are becoming more and more important. Many problems cannot be solved by one person applying a single skill and instead require several people working together, so the ability to solve problems cooperatively is becoming an essential skill in everyday life and work. This also places new demands on how participants' abilities are measured during cooperative problem solving.

In the first study, audio recorded during cooperative problem solving is processed with speaker recognition techniques. A bidirectional long short-term memory (BiLSTM) network detects speech activity and speaker change points. A variational Bayesian hidden Markov model then detects and segments overlapping speech. Finally, a hidden Markov model (HMM)-based clustering algorithm clusters the speakers. This answers the question of "who said what when" in conversational speech and lays the foundation for the subsequent key speaker identification. The HMM-based clustering algorithm used in this thesis is compared with a K-Means clustering algorithm using orthogonal distance and a Spectral Clustering (SC) algorithm using oblique clustering, each trained on 10-minute and 20-minute excerpts and on the complete audio files. The amount of audio used has an important effect on the diarization error rate (DER) of the speaker diarization system, and DER decreases as more audio is used: the mean DER of speaker segmentation and clustering on the first 10 minutes is 1.98 times that obtained with the complete audio, and the accuracy rate drops by 6.38%. Comparing the three clustering algorithms, the mean DER on the complete audio of the distance-based K-Means and SC algorithms is 2.18 times and 2 times, respectively, that of the HMM-based clustering algorithm. This indicates that the HMM-based algorithm used in this thesis is more robust than traditional distance-based clustering algorithms.

Based on the speaker segmentation and clustering results, 10 features characterizing each speaker are extracted, and regularized logistic regression, Random Forest (RF), and XGBoost models are used to identify key speakers in multi-person conversational speech. Total speaking time is the most important factor in whether a speaker is a key speaker. The features that reflect a speaker's share of the floor (average speaking time, number of turns, and maximum duration of a single turn) also play an important role, and the features that reflect the speaker's degree of participation (the average, maximum, and total interval between the speaker's adjacent turns) also affect key speaker identification.

All three models distinguish key speakers from other speakers well. XGBoost performs best, with an accuracy of 98.82% and an F1 score of 97.30%; RF is next, with an accuracy of 96.94% and an F1 score of 92.74%; regularized logistic regression performs worst, with an accuracy of 93.45% and an F1 score of 87.23%. The three models differ little in accuracy but considerably in F1 score: the F1 score of XGBoost is 11.54% higher (relative) than that of regularized logistic regression and 4.92% higher than that of RF.
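The diarization error rate (DER) used to compare the clustering algorithms is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time divided by the total reference speech time. The sketch below computes it at frame level under simplifying assumptions that are not taken from the thesis: labels are given per frame, each frame carries at most one speaker (so the overlapping speech that the thesis handles explicitly is not scored), no forgiveness collar is applied, and the best speaker mapping is found by brute force. The function name `diarization_error_rate` is hypothetical.

```python
from itertools import permutations


def diarization_error_rate(reference, hypothesis):
    """Frame-level DER sketch (simplified, for illustration only).

    reference, hypothesis: equal-length sequences of per-frame speaker labels,
    with None marking non-speech. Assumes at most one speaker per frame, so
    overlapping speech is not scored, and applies no forgiveness collar.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    # Reference speech that the system labelled as non-speech.
    missed = sum(r is not None and h is None for r, h in pairs)
    # Reference non-speech that the system labelled as speech.
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})
    # Pad with None so every system cluster can be assigned; a cluster mapped
    # to None matches no reference speaker, so all its speech counts as confusion.
    candidates = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))

    # Try every one-to-one mapping of system clusters to reference speakers
    # and keep the one with the least confused speech.
    best_confusion = total_speech
    for assignment in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, assignment))
        confusion = sum(
            r is not None and h is not None and mapping[h] != r for r, h in pairs
        )
        best_confusion = min(best_confusion, confusion)

    return (missed + false_alarm + best_confusion) / total_speech


reference = ["A", "A", "B", "B", None, None]
hypothesis = ["s1", "s1", "s1", "s2", "s2", None]
print(diarization_error_rate(reference, hypothesis))  # 0.5
```

In this example there are four reference speech frames, one false-alarm frame and, under the best mapping (s1 to A, s2 to B), one confused frame, so DER = (0 + 1 + 1) / 4 = 0.5. Brute-force mapping is only practical for the handful of speakers in a small group; larger problems are usually solved with the Hungarian algorithm.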
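The turn-taking features named above can be derived directly from the segment list produced by diarization. The sketch below is a hypothetical illustration, not the thesis's feature extractor: it assumes segments arrive as `(speaker, start_sec, end_sec)` tuples, treats each segment as one speaking turn, and computes only the seven features the abstract names (the other three of the ten features are not specified). The function name and feature keys are illustrative.

```python
from collections import defaultdict


def speaker_features(segments):
    """Per-speaker turn-taking features from diarization output (illustrative).

    segments: iterable of (speaker, start_sec, end_sec) tuples. Each segment is
    treated as one speaking turn, which is an assumption, not the thesis's rule.
    """
    turns = defaultdict(list)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(speaker, start, end)}")
        turns[speaker].append((start, end))

    features = {}
    for speaker, spans in turns.items():
        spans.sort()
        durations = [end - start for start, end in spans]
        # Silence between the end of one turn and the start of the same
        # speaker's next turn; clamped at 0 in case segments overlap.
        gaps = [max(0.0, nxt_start - cur_end)
                for (_, cur_end), (nxt_start, _) in zip(spans, spans[1:])]
        features[speaker] = {
            # Features reflecting the speaker's share of the floor.
            "total_speaking_time": sum(durations),
            "average_speaking_time": sum(durations) / len(durations),
            "num_turns": len(spans),
            "max_turn_duration": max(durations),
            # Features reflecting the speaker's degree of participation.
            "average_interval": sum(gaps) / len(gaps) if gaps else 0.0,
            "max_interval": max(gaps) if gaps else 0.0,
            "total_interval": sum(gaps),
        }
    return features


segments = [("A", 0.0, 4.0), ("B", 4.0, 6.0), ("A", 6.5, 8.5),
            ("C", 9.0, 10.0), ("A", 12.0, 13.0)]
print(speaker_features(segments)["A"])
```

For speaker A, turns of 4 s, 2 s and 1 s give a total speaking time of 7 s, and the gaps of 2.5 s and 3.5 s between A's consecutive turns give a total interval of 6 s and a maximum interval of 3.5 s. A per-speaker table of such features is the kind of input a classifier such as XGBoost would then consume.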
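The 11.54% and 4.92% gaps quoted for XGBoost are consistent with relative differences in F1 score rather than absolute percentage-point differences (the absolute gaps are 10.07 and 4.56 points). The short calculation below uses only the F1 values reported in the abstract to make that distinction explicit; `relative_improvement` is a hypothetical helper.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a percentage of the baseline."""
    return (new - baseline) / baseline * 100


# F1 scores (%) as reported in the abstract.
f1 = {"XGBoost": 97.30, "RF": 92.74, "regularized logistic regression": 87.23}

for baseline in ("regularized logistic regression", "RF"):
    absolute = f1["XGBoost"] - f1[baseline]
    relative = relative_improvement(f1["XGBoost"], f1[baseline])
    print(f"XGBoost vs {baseline}: +{absolute:.2f} points, +{relative:.2f}% relative")
```

Running it prints gains of 10.07 points (11.54% relative) over regularized logistic regression and 4.56 points (4.92% relative) over RF, matching the figures in the text.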
Keywords/Search Tags: Cooperative problem solving, Multi-speaker recognition, Key speaker recognition