
Analysis Of Speaker Roles For Multi-speaker Conversational Speech

Posted on: 2016-09-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Li
GTID: 1108330479495086
Subject: Communication and Information System
Abstract/Summary:
With the development of speech processing technology, researchers' attention has been gradually shifting from single-speaker monologue speech to multi-speaker speech. Multi-speaker speech carries important cues that monologue speech lacks, e.g. the number of speakers, speaker roles, the importance (key degree) of each speaker, and overlapped speech. These cues are an important basis for semantic understanding and retrieval of multi-speaker speech. How to analyze large volumes of multi-speaker speech effectively and extract these cues has become a hot research topic in speech processing.
We focus on multi-speaker speech, e.g. multi-participant discussions, summits, leaders' press conferences, and lectures, and investigate several important problems in speaker role analysis: spectral clustering of speakers, key speaker estimation, key speaker verification, speaker role clustering, and overlapped speech detection. The aim is to extract more speaker information from large volumes of multi-speaker speech and to extend the functionality of current multi-participant speech processing systems. The main contributions of this dissertation are as follows:
(1) An algorithm for spectral speaker clustering based on model distance is proposed to overcome the weakness of general spectral clustering algorithms in describing the distribution of the signal source. First, a Universal Background Model (UBM) is trained on speech from a large number of independent speakers. Then, a Gaussian Mixture Model (GMM) is derived from the UBM for every speech segment. Finally, the probability distance between the GMMs of the speech segments is used to build an affinity matrix, and spectral clustering of speakers is performed on this matrix.
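As an illustration only, the final step of contribution (1), spectral clustering on an affinity matrix built from pairwise model distances, might be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the distance matrix `dist` is assumed to be precomputed (the abstract does not say which probability distance between GMMs is used), and the Gaussian kernel, the median-based `sigma`, the function name `spectral_cluster`, and the simple k-means step are all hypothetical choices.

```python
import numpy as np


def spectral_cluster(dist, n_clusters, sigma=None, n_iter=50):
    """Cluster segments from a symmetric matrix of pairwise model distances.

    `dist[i, j]` is assumed to hold some distance between the GMMs of
    segments i and j; how that distance is computed is outside this sketch.
    """
    dist = np.asarray(dist, dtype=float)
    n = len(dist)
    if sigma is None:
        # Assumed heuristic: kernel width = median off-diagonal distance.
        sigma = np.median(dist[~np.eye(n, dtype=bool)])

    # Gaussian kernel turns distances into affinities (assumed choice).
    affinity = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalisation: D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    norm_aff = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Embed each segment using the eigenvectors of the largest eigenvalues
    # (eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(norm_aff)
    emb = vecs[:, -n_clusters:]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

    # Plain k-means with a deterministic farthest-point initialisation.
    centers = [emb[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = emb[labels == k].mean(axis=0)
    return labels


# Toy data: two groups of three segments, close within a group, far across.
dist = np.full((6, 6), 5.0)
dist[:3, :3] = 0.1
dist[3:, 3:] = 0.1
np.fill_diagonal(dist, 0.0)
print(spectral_cluster(dist, n_clusters=2))
```

In this sketch the number of speakers is passed in as `n_clusters`; the abstract does not say how the number of clusters is chosen.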
Experimental results on news and conference datasets show an average improvement of 6.38% in F-measure over the algorithm based on feature-vector distance. In addition, the proposed algorithm is 11.72 times faster than the previous method.
(2) Building on (1), a method for estimating the key speaker in meeting speech based on multi-feature optimization is proposed, which analyzes each speaker's speech after speaker clustering. First, each feature is defined and its differences between the key speaker and the other speakers are analyzed. Then, four effective audio features are extracted and a decision function that weights these features is constructed for estimating the key speaker in meeting speech. Finally, a genetic algorithm is used to optimize the feature weights. The method does not require training a complex classifier and estimates the key speaker effectively. It is evaluated on three different meeting speech datasets. Experimental results show that the proposed optimization method achieves an average accuracy of 93.3% for key speaker estimation, an average improvement of 9.7% and 4.1% over the previous method and the feature weighting method without optimization, respectively.
(3) Building on the results of both (1) and (2), we further refine the speech segments of the key speaker, with the aim of removing utterances of non-key speakers that were incorrectly assigned to the key speaker and recovering utterances of the key speaker that were wrongly assigned to non-key speakers in the results of (2). To this end, a deep speaker vector (DSV) is proposed and constructed, and is then used for speaker verification of the same source. The refinement procedure consists of three steps. First, shallow features are used to find, among the utterances attributed to the key speaker, those with a high probability of belonging to the key speaker.
Second, these key speaker utterances are used in deep learning to train the deep features and the DSV. Third, the DSV is used to verify the key speaker. Experimental results show that the false acceptance rate (FAR) and false rejection rate (FRR) of key speaker verification are 1.28% and 4.79%, respectively. The proposed method not only recovers utterances of the key speaker but also removes utterances of non-key speakers.
(4) Building on the results of (2), in order to analyze effectively the number of speaker roles and the speech of each role in different multi-participant conversational speech, the features of speaker roles are first defined and then extracted from the speaker clustering results. A graph model is built from the meeting data to be clustered together with other meeting data, and the similarity between speaker samples is measured by geodesic distance in this graph, which improves the performance of unsupervised clustering. To overcome the disadvantage of hierarchical clustering, a speaker role clustering algorithm that controls class merging is proposed. Finally, the proposed method is tested on four different meeting corpora. The experimental results show that it solves the role clustering problem effectively. This work thus lays a solid foundation for further speaker search and extraction of high-level speech information.
(5) In order to remove the negative impact of overlapping speech on speaker segmentation and clustering, and to overcome the disadvantages of traditional features for overlapping speech detection, a method for detecting overlapping speech based on fractal dimension is proposed. Methods for extracting fractal dimension features from short-time speech are summarized, and the differences in fractal dimension between overlapping speech and single-speaker speech are analyzed.
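To make the idea of a short-time fractal dimension feature concrete, the sketch below computes one value per frame with Katz's estimator. The abstract does not name the estimators it summarizes, so Katz's method is a swapped-in assumption, and the function names `katz_fd` and `framewise_fd` as well as the frame length and hop (400 and 160 samples) are hypothetical.

```python
import numpy as np


def katz_fd(frame):
    """Katz fractal dimension of a 1-D frame (assumed estimator)."""
    frame = np.asarray(frame, dtype=float)
    if len(frame) < 3:
        return 1.0  # too short to estimate; treat as a straight line
    steps = np.abs(np.diff(frame))
    total = steps.sum()  # total "length" of the waveform
    if total == 0.0:
        return 1.0  # flat frame: treat as a straight line
    extent = np.max(np.abs(frame - frame[0]))  # farthest excursion from start
    log_n = np.log10(len(steps))  # log of the number of steps
    return log_n / (log_n + np.log10(extent / total))


def framewise_fd(signal, frame_len=400, hop=160):
    """One fractal dimension value per short-time frame of `signal`."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([katz_fd(signal[s:s + frame_len]) for s in starts])


# Usage on synthetic data: a smooth tone versus the same tone plus noise.
rng = np.random.default_rng(0)
t = np.arange(1600) / 16000.0
tone = np.sin(2 * np.pi * 200.0 * t)
noisy = tone + 0.5 * rng.standard_normal(len(t))
print(framewise_fd(tone).mean(), framewise_fd(noisy).mean())
```

Such per-frame values could be appended to a conventional feature vector before classification. Note that Katz's estimate is not bounded to the range 1 to 2 for very irregular frames, so it is best treated as a relative measure of waveform complexity.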
The experimental results show that combining Mel-Frequency Cepstral Coefficients with fractal dimension achieves a correct discrimination rate of 81%, which is better than the results obtained with other traditional features.
To sum up, multi-speaker conversational speech is taken as the object of study, and the problems of speaker role analysis, namely spectral clustering of speakers, estimation and same-source verification of key speakers, speaker role clustering, and overlapping speech detection, are investigated in depth in this dissertation. The results obtained lay a solid foundation for further improving the performance of multi-participant conversational speech analysis and speaker retrieval.
Keywords/Search Tags: Speaker clustering, Spectral clustering, Key speaker, Verification of the same source, Overlapping speech