Speaker Segmentation For Mixed Speech In Multi-person Conversations

Posted on: 2020-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Li
Full Text: PDF
GTID: 2438330623964260
Subject: Computer technology
Abstract/Summary:
Speaker segmentation has been an active research topic in speech signal processing in recent years. Current speaker segmentation techniques perform well on long utterances, but their performance drops sharply in real multi-person conversations, because speakers alternate frequently and each turn is short. This thesis studies speaker segmentation for short utterances, based on the Gaussian mixture model and multi-scale analysis. The specific research contents are as follows:

(1) Speaker recognition for short utterances is studied in order to improve the recognition rate under short-utterance conditions. The Mel-frequency cepstral coefficients and their first-order differences are combined into a feature vector that reflects the speakers' individual characteristics, and normalization is applied so that the different feature parameters have comparable influence on the models. The Gaussian mixture model is then used to train the speaker models and to match and classify the speech to be recognized.

(2) When speakers switch, there is usually an obvious silent interval between the two speakers' speech. In order to detect these silent segments reliably, endpoint detection is studied, and existing speech corpora are used to compare the traditional double-threshold endpoint detection method with the multi-threshold endpoint detection method.

(3) Building on the improved recognition rate for short utterances, speaker segmentation under short-utterance conditions is studied. Using the continuous voiced segments in the mixed speech as cues, the speaker switching points are located by multi-scale analysis, and the frame-division probability is adopted to refine the segmentation result during the segmentation process.

The speaker segmentation system is tested first on Chinese mixed speech and then on English mixed speech. The experimental results show that, for multi-person mixed speech made up of a series of short utterances (within 3 seconds each), the method based on the Gaussian mixture model and multi-scale analysis segments the speech well, and that the method is language independent. Even when the speakers alternate frequently, when there is no significant interval between two speakers' speech, or when one speaker's utterance is very short, the method can still correctly detect the speaker switching points in the mixed speech.
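As a rough illustration of the double-threshold endpoint detection mentioned in item (2), the following Python sketch marks voiced segments from short-time energy alone. It is not the thesis's implementation: the abstract gives no parameters, so the frame length, hop size, threshold values, the synthetic test signal, and the function names `frame_energies` and `double_threshold_segments` are all assumptions made for this example. Classic double-threshold detectors usually also use the zero-crossing rate, which is omitted here for brevity.

```python
import math


def frame_energies(samples, frame_len, hop):
    """Short-time energy (sum of squared samples) of each frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def double_threshold_segments(energies, high, low):
    """Return (first_frame, last_frame) pairs of voiced segments.

    A segment is seeded by a frame whose energy exceeds `high`, then
    extended in both directions while the energy stays above `low`.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    segments = []
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > high:
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1
            end = i
            while end + 1 < n and energies[end + 1] > low:
                end += 1
            segments.append((start, end))
            i = end + 1  # continue searching after this segment
        else:
            i += 1
    return segments


def demo_signal():
    """Synthetic signal (assumed 8 kHz): silence, tone, silence, tone, silence."""
    silence = [0.0] * 400
    tone = [0.5 * math.sin(2 * math.pi * 200 * t / 8000) for t in range(800)]
    return silence + tone + silence + tone + silence


if __name__ == "__main__":
    energies = frame_energies(demo_signal(), frame_len=200, hop=200)
    # Hypothetical thresholds chosen for this synthetic signal only.
    print(double_threshold_segments(energies, high=10.0, low=1.0))
    # prints [(2, 5), (8, 11)]
```

The gaps between the returned segments are the candidate silent intervals in which a speaker switch may have occurred. In practice the thresholds would be estimated from the noise level of the recording rather than fixed by hand.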
Keywords/Search Tags:Speaker segmentation, Gaussian mixture model, Multi-scale analysis, Short utterance, Speaker recognition, Endpoint detection