Research On Automatic Detection Of Non-lexical Events In Spontaneous Speech

Posted on: 2010-10-07  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y X Li
GTID: 1118360302473774  Subject: Communication and Information System
Abstract/Summary:
Non-lexical events (such as filled pauses, laughter, and applause) occur frequently in spontaneous speech and can indicate, to some extent, aspects of the speech such as the speaker's emotional and mental state and the topic and atmosphere of the talk. Detecting non-lexical events can therefore improve the performance of speech retrieval systems, and it is also helpful for speech emotion recognition, speaker recognition, and highlight extraction. We statistically analyze the differences in time-frequency features between audio events (speech, filled pause, laughter, applause, and other sounds), then propose a genetic algorithm based approach to the simultaneous optimization of feature subsets and HMM (Hidden Markov Model) parameters, and finally propose an effective approach for detecting non-lexical events in spontaneous speech. The main contributions of this thesis are as follows:

(1) Based on the experimental dataset, we statistically analyze the differences between audio events in duration, pitch (fundamental frequency), spectral stability, syllable repetition, and occurrence location. These differences form the basis for non-lexical event detection.

(2) To avoid the difficulty of modeling applause and to further reduce boundary errors in the detected applause, a fast rule-based approach is proposed for applause detection. It detects applause in meeting speech using only the differences in duration and pitch between applause and non-applause events, without any complex statistical models. Compared with the sliding window based approach adopted in the literature, the F1-measure is improved by 3.62%, about 35.78% of the computational time is saved, and the boundary errors of the detected applause are reduced. In addition, the proposed approach can extract applause sub-segments from mixed segments, whereas the sliding window based approach can only decide whether a mixed segment as a whole is applause (if applause is dominant) or non-applause (if non-applause is dominant).

(3) Because the settings of both the feature subsets and the HMM parameters directly influence the discrimination of audio events, we propose a genetic algorithm based approach that optimizes both simultaneously. The experimental results show that the proposed approach achieves the highest discrimination accuracy, 90.2%, and obtains the optimal combination of feature subsets and HMM parameters. Compared with three other approaches adopted in the literature (optimizing feature subsets only, optimizing HMM parameters only, and optimizing neither), the discrimination accuracy is improved by 5.05%, 3.53%, and 8.08%, respectively.

(4) Based on the analysis of the characteristic differences between audio events, an effective approach is proposed for detecting non-lexical events in spontaneous speech. The longer applause events are first detected with a rule-based method, and a model-based method is then used to detect the other non-lexical events. The experimental results show that the proposed approach obtains an average precision of 87.3%, an average recall of 93.77%, and an average F1-measure of 90.42% for detecting three non-lexical events (laughter, applause, and filled pause). Compared with the sliding window based approach, the average F1-measure for the three events is improved by 7.52%. Moreover, the proposed approach determines the boundaries of non-lexical events in spontaneous speech more accurately.

In conclusion, this thesis focuses on the automatic discrimination and detection of non-lexical events in spontaneous speech and obtains some useful results, which are helpful for further improving the performance of speech retrieval systems.
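As an illustration of the rule-based idea in contribution (2), the following is a minimal sketch and not the thesis's actual rules. It assumes that applause is noise-like, so a pitch tracker reports no fundamental frequency (encoded here as 0) for applause frames, and that applause lasts longer than short unvoiced stretches inside speech. The function name `detect_applause`, the frame length, and the minimum-duration threshold are all hypothetical.

```python
from typing import List, Tuple


def detect_applause(
    pitch_hz: List[float],
    frame_s: float = 0.01,        # hypothetical frame hop in seconds
    min_duration_s: float = 1.0,  # hypothetical minimum applause length
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for every run of unpitched frames that
    lasts at least min_duration_s.

    Assumption (not from the thesis): a frame with pitch <= 0 is
    unpitched, and long unpitched runs are treated as applause.
    """
    segments: List[Tuple[float, float]] = []
    run_start = None
    # A trailing pitched sentinel frame closes a run that reaches the end.
    for i, f0 in enumerate(list(pitch_hz) + [1.0]):
        if f0 <= 0.0:
            if run_start is None:
                run_start = i  # an unpitched run begins here
        elif run_start is not None:
            # The run ended at frame i; keep it only if it is long enough.
            if (i - run_start) * frame_s >= min_duration_s:
                segments.append((run_start * frame_s, i * frame_s))
            run_start = None
    return segments
```

Because the rule works on runs of frames rather than on fixed windows, it returns the boundaries of each applause-like sub-segment inside a mixed segment, which is the property the abstract contrasts with the sliding window based approach.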
Keywords/Search Tags: Non-lexical events detection, Genetic algorithm, Hidden Markov model, Speech retrieval, Spontaneous speech processing